
issues: 68759727


| field | value |
| --- | --- |
| id | 68759727 |
| node_id | MDU6SXNzdWU2ODc1OTcyNw== |
| number | 392 |
| title | Non-aggregating grouped operations on dask arrays are painfully slow to construct |
| user | 1217238 |
| state | closed |
| locked | 0 |
| comments | 7 |
| created_at | 2015-04-15T18:45:28Z |
| updated_at | 2019-02-01T23:06:35Z |
| closed_at | 2019-02-01T23:06:35Z |
| author_association | MEMBER |

These are both entirely lazy operations:

```
%time res = ds.groupby('time.month').mean('time')
CPU times: user 142 ms, sys: 20.3 ms, total: 162 ms
Wall time: 159 ms

%time res = ds.groupby('time.month').apply(lambda x: x - x.mean())
CPU times: user 46.1 s, sys: 4.9 s, total: 51 s
Wall time: 50.4 s
```

I suspect the issue (in part) is that `_interleaved_concat_slow` indexes out single elements from each dask array along the grouped axis prior to concatenating them together (unit tests for `interleaved_concat` can be found here). So we end up creating way too many small dask arrays.
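To see why that pattern is expensive, here is a minimal plain-NumPy sketch. The function names echo the helpers mentioned above, but the bodies are illustrative assumptions, not the actual xray implementation: the slow path pulls out one element at a time and concatenates, creating one tiny intermediate array per element, while a vectorized inverse-permutation take touches each input array only once.

```python
import numpy as np

def interleaved_concat_slow(arrays, indices):
    # Suspected pattern: slice out each element individually along the
    # grouped axis, then concatenate all the one-element pieces. With dask
    # arrays, every tiny slice adds tasks to the graph.
    n = sum(len(idx) for idx in indices)
    pieces = [None] * n
    for arr, idx in zip(arrays, indices):
        for src_pos, dest_pos in enumerate(idx):
            pieces[dest_pos] = arr[src_pos:src_pos + 1]
    return np.concatenate(pieces)

def interleaved_concat_fast(arrays, indices):
    # Vectorized alternative: concatenate whole groups once, then apply a
    # single inverse-permutation take to restore the original order.
    order = np.concatenate(indices)
    inverse = np.empty(len(order), dtype=np.intp)
    inverse[order] = np.arange(len(order))
    return np.concatenate(arrays)[inverse]
```

Both return the same result; the difference is that the fast version builds O(number of groups) intermediates instead of O(number of elements).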

Profiling results on slightly smaller data are in this gist.

It would be great if we could figure out a way to make this faster, because these sorts of operations are a really nice showcase for xray + dask.

CC @mrocklin in case you have any ideas.

reactions: 0 (https://api.github.com/repos/pydata/xarray/issues/392/reactions)
state_reason: completed · repo: 13221727 · type: issue

Links from other tables

  • 2 rows from issues_id in issues_labels
  • 7 rows from issue in issue_comments