issues: 785329941
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
785329941 | MDU6SXNzdWU3ODUzMjk5NDE= | 4804 | Improve performance of xarray.corr() on big datasets | 37177103 | open | 0 | | | 9 | 2021-01-13T18:18:12Z | 2021-06-05T18:23:47Z | | NONE | | | | (see body below) | (see reactions below) | | | 13221727 | issue |

body:

**Is your feature request related to a problem? Please describe.**

I calculated correlation coefficients based on datasets with sizes between 90 and 180 GB using xarray and Dask distributed, and experienced very low performance from the `xr.corr()` function.

**Describe the solution you'd like**

The problem became so annoying that I implemented my own function to calculate the correlation coefficient (thanks @willirath!), which is considerably more performant (especially for the big datasets!) because it only touches the full data once. I have uploaded a Jupyter notebook that shows the equivalence of the `xr.corr()` function and my implementation.

At the moment, I think a considerable improvement in big-data performance could be achieved by removing the […]

reactions:

{ "url": "https://api.github.com/repos/pydata/xarray/issues/4804/reactions", "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
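For context on the "only touches the full data once" claim, below is a minimal sketch of how a single-pass Pearson correlation can be written with xarray. It expresses everything as plain moment reductions (E[xy], E[x], E[y], E[x²], E[y²]), so a Dask-backed computation can evaluate all terms while reading each chunk once. This is not the author's notebook code: the function name `one_pass_corr` and the variable names in the usage comment are hypothetical, and the sketch ignores the pairwise missing-value handling that `xr.corr()` performs.

```python
# Hedged sketch of a single-pass Pearson correlation along one dimension.
# Assumes x and y are xarray DataArrays (possibly dask-backed) sharing `dim`
# and containing no NaNs; names are illustrative, not xarray's internals.
import xarray as xr


def one_pass_corr(x: xr.DataArray, y: xr.DataArray, dim: str) -> xr.DataArray:
    """Pearson correlation of x and y along `dim`, built only from moment
    reductions so Dask can compute them together in one pass over the data."""
    # First and second moments; with dask arrays these stay lazy and are
    # evaluated jointly when the final result is computed.
    mx = x.mean(dim)
    my = y.mean(dim)
    mxy = (x * y).mean(dim)
    mx2 = (x * x).mean(dim)
    my2 = (y * y).mean(dim)

    # corr = (E[xy] - E[x]E[y]) / (std(x) * std(y))
    cov = mxy - mx * my
    std_x = (mx2 - mx * mx) ** 0.5
    std_y = (my2 - my * my) ** 0.5
    return cov / (std_x * std_y)


# Hypothetical usage, e.g. r = one_pass_corr(ds["sst"], ds["slp"], dim="time")
```

Because every term is a simple reduction over `dim`, the whole expression stays lazy and collapses into a single traversal of the chunks, whereas an implementation that first materializes anomalies (x minus its mean) or triggers an eager validity check effectively walks the data more than once.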