
issues: 711626733


id: 711626733
node_id: MDU6SXNzdWU3MTE2MjY3MzM=
number: 4473
title: Wrap numpy-groupies to speed up Xarray's groupby aggregations
user: 1217238
state: closed
locked: 0
comments: 8
created_at: 2020-09-30T04:43:04Z
updated_at: 2022-05-15T02:38:29Z
closed_at: 2022-05-15T02:38:29Z
author_association: MEMBER

Is your feature request related to a problem? Please describe.

Xarray's groupby aggregations (e.g., groupby(..).sum()) are very slow compared to pandas, as described in https://github.com/pydata/xarray/issues/659.

Describe the solution you'd like

We could speed things up considerably (easily 100x) by wrapping the numpy-groupies package.
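To illustrate where the speedup would come from, here is a minimal NumPy-only sketch. It contrasts a per-group selection loop (similar in spirit to the indexing + concatenate approach) with a single-pass scatter-add, which is the kind of grouped reduction numpy-groupies provides. The function names and sample data are hypothetical, and `np.bincount` stands in for numpy-groupies so the example has no extra dependency.

```python
import numpy as np


def grouped_sum_loop(values, codes, n_groups):
    """Select each group with a boolean mask, reduce it, then combine the results."""
    return np.array([values[codes == g].sum() for g in range(n_groups)])


def grouped_sum_vectorized(values, codes, n_groups):
    """Accumulate every value into its group's slot in a single pass."""
    return np.bincount(codes, weights=values, minlength=n_groups)


# Hypothetical data: `codes` holds the integer group label of each value.
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
codes = np.array([0, 1, 0, 2, 1, 0])

loop_result = grouped_sum_loop(values, codes, 3)
vec_result = grouped_sum_vectorized(values, codes, 3)
```

Both functions return the same per-group sums. The loop performs one full scan of `codes` per group, while the vectorized version scans the data once regardless of the number of groups.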

Additional context

One challenge is how to handle dask arrays (and other duck arrays). In some cases it might make sense to apply the numpy-groupies function (using apply_ufunc), but in other cases it might be better to stick with the current indexing + concatenate solution. We could either pick some simple heuristics for choosing which algorithm to use on dask arrays, or just stick with the current algorithm for dask arrays for now.

In particular, it might make sense to stick with the current algorithm if there are many chunks in the arrays to be aggregated along the "grouped" dimension (depending on the number of unique group values).
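One possible shape for such a heuristic is sketched below. Everything in it is an assumption for illustration: the function name, the returned labels, the default budget, and the rule itself (comparing chunks times groups against a budget, on the reading that each chunk would produce one intermediate value per group). It is not an existing xarray or dask API.

```python
def choose_groupby_algorithm(n_chunks, n_groups, max_intermediate=1_000_000):
    """Pick a groupby strategy for a chunked array (hypothetical heuristic).

    n_chunks: number of chunks along the "grouped" dimension.
    n_groups: number of unique group values.
    max_intermediate: assumed budget for per-chunk intermediate results.
    """
    # Assumption: a per-chunk grouped reduction yields one result per group
    # in every chunk, so the intermediates scale with n_chunks * n_groups.
    if n_chunks * n_groups > max_intermediate:
        return "index_concatenate"  # the current algorithm
    return "numpy_groupies"  # the wrapped, vectorized reduction


few_chunks = choose_groupby_algorithm(n_chunks=4, n_groups=12)
many_chunks = choose_groupby_algorithm(n_chunks=10_000, n_groups=365)
```

With these assumed defaults, the small case selects the vectorized path and the heavily chunked case falls back to the current algorithm. Any real threshold would need benchmarking.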

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4473/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
