issues: 68759727
id: 68759727
node_id: MDU6SXNzdWU2ODc1OTcyNw==
number: 392
title: Non-aggregating grouped operations on dask arrays are painfully slow to construct
user: 1217238
state: closed
locked: 0
comments: 7
created_at: 2015-04-15T18:45:28Z
updated_at: 2019-02-01T23:06:35Z
closed_at: 2019-02-01T23:06:35Z
author_association: MEMBER
body:

These are both entirely lazy operations:
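The original code snippet was not preserved in this export. As a hedged sketch of the kind of non-aggregating grouped operations the title refers to, assuming a dask-backed dataset (all variable names and sizes below are illustrative, and the modern `xarray` import stands in for the original `xray`):

```python
import numpy as np
import pandas as pd
import xarray as xr  # the report predates the xray -> xarray rename

# Hypothetical dask-backed dataset; names and sizes are illustrative only.
time = pd.date_range("2000-01-01", periods=1000, freq="D")
ds = xr.Dataset(
    {"t": ("time", np.random.randn(1000))},
    coords={"time": time},
).chunk({"time": 100})

# Non-aggregating grouped operations: each group is transformed and the
# results are stitched back together in the original order.  Both lines
# are entirely lazy (they only build a dask graph), yet constructing
# that graph is what the issue reports as painfully slow.
climatology = ds.groupby("time.month").mean()
anomalies = ds.groupby("time.month") - climatology
scaled = ds.groupby("time.month") / ds.groupby("time.month").std()
```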
I suspect the issue (in part) is that `_interleaved_concat_slow` indexes out single elements from each dask array along the grouped axis before concatenating them back together (unit tests for `interleaved_concat` can be found here). So we end up creating far too many small dask arrays. Profiling results on slightly smaller data are in this gist. It would be great if we could figure out a way to make this faster, because these sorts of operations are a really nice showcase for xray + dask. CC @mrocklin in case you have any ideas.
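As a rough illustration of why that pattern is expensive, here is a sketch using plain dask.array rather than xray's actual `_interleaved_concat_slow` (the permutation and sizes are hypothetical):

```python
import dask.array as da
import numpy as np

x = da.from_array(np.arange(10_000), chunks=1_000)

# Stand-in for the pattern described above: pull out a length-1 slice for
# every position along the grouped axis, then concatenate them back in
# interleaved order.  Each slice is its own tiny dask array.
order = np.random.permutation(x.shape[0])
interleaved = da.concatenate([x[i : i + 1] for i in order])

# Concatenating a handful of contiguous slices instead keeps the graph small.
contiguous = da.concatenate([x[:5_000], x[5_000:]])

# The task graph for the interleaved version scales with the number of
# elements, not the number of chunks.
print(len(interleaved.__dask_graph__()))  # tens of thousands of tasks
print(len(contiguous.__dask_graph__()))   # a handful of tasks
```

Building ten thousand intermediate collections before the final concatenate is where the time goes, which matches the report that graph construction, not execution, is the slow part.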
{ "url": "https://api.github.com/repos/pydata/xarray/issues/392/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |