issues: 785329941
| field | value |
|---|---|
| id | 785329941 |
| node_id | MDU6SXNzdWU3ODUzMjk5NDE= |
| number | 4804 |
| title | Improve performance of xarray.corr() on big datasets |
| user | 37177103 |
| state | open |
| locked | 0 |
| assignee | |
| milestone | |
| comments | 9 |
| created_at | 2021-01-13T18:18:12Z |
| updated_at | 2021-06-05T18:23:47Z |
| closed_at | |
| author_association | NONE |
| active_lock_reason | |
| draft | |
| pull_request | |
| performed_via_github_app | |
| state_reason | |
| repo | 13221727 |
| type | issue |

body:

**Is your feature request related to a problem? Please describe.**

I calculated correlation coefficients on datasets with sizes between 90 and 180 GB using xarray and Dask distributed and experienced very low performance with the built-in `xarray.corr()`.

**Describe the solution you'd like**

The problem became so annoying that I implemented my own function to calculate the correlation coefficient (thanks @willirath!), which is considerably more performant, especially for the big datasets, because it only touches the full data once. I have uploaded a Jupyter notebook that shows the equivalence of the two implementations.

At the moment, I think, in terms of improving big-data performance, a considerable improvement could be achieved by removing the

reactions:

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4804/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
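The "only touches the full data once" idea from the issue body can be sketched as below. This is a minimal illustration, not the author's actual implementation: the function name `single_pass_corr`, the file names, and the variable names are hypothetical, and it assumes two aligned DataArrays sharing the reduction dimension with no mismatched NaN patterns.

```python
import xarray as xr


def single_pass_corr(x: xr.DataArray, y: xr.DataArray, dim: str) -> xr.DataArray:
    """Pearson correlation along `dim` via the E[xy] - E[x]E[y] identity.

    Every term is a plain reduction over the inputs, so with dask-backed
    arrays the graph can load each chunk of x and y once and feed all
    reductions from it, instead of materialising demeaned intermediates.
    """
    x_mean = x.mean(dim)
    y_mean = y.mean(dim)
    xy_mean = (x * y).mean(dim)
    cov = xy_mean - x_mean * y_mean          # E[xy] - E[x]E[y]
    return cov / (x.std(dim) * y.std(dim))   # normalise by the standard deviations


# Hypothetical usage with chunked (lazy) inputs:
# sst = xr.open_dataset("sst.nc", chunks={"time": 120})["sst"]
# slp = xr.open_dataset("slp.nc", chunks={"time": 120})["slp"]
# r = single_pass_corr(sst, slp, dim="time").compute()
```

Note that `xr.DataArray.std` defaults to `ddof=0`, which matches the population covariance computed here, and that the means skip NaNs independently per array, hence the assumption of matching NaN patterns.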