
issue_comments: 711458608

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/4473#issuecomment-711458608 https://api.github.com/repos/pydata/xarray/issues/4473 711458608 MDEyOklzc3VlQ29tbWVudDcxMTQ1ODYwOA== 5635139 2020-10-19T01:19:08Z 2020-10-19T01:19:08Z MEMBER

Here's a very quick POC:

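```python
import numpy as np
import pandas as pd
import xarray as xr
from numpy_groupies.aggregate_numba import aggregate


def npg_groupby(da: xr.DataArray, dim, func='sum'):
    # Map each label along `dim` to an integer group index
    group_idx, labels = pd.factorize(da.indexes[dim])
    axis = da.get_axis_num(dim)
    # Returns a plain numpy array; coords are not reconstructed here
    array = aggregate(group_idx=group_idx, a=da, func=func, axis=axis)
    return array
```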
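For readers unfamiliar with the factorize-then-aggregate pattern the POC relies on, here is a minimal pure-Python sketch of the idea. The names `factorize_labels` and `grouped_sum` are hypothetical and only illustrate the technique; they are not part of pandas, numpy_groupies, or xarray.

```python
def factorize_labels(labels):
    """Return (codes, uniques), with codes assigned in order of first appearance."""
    uniques = []
    index = {}
    codes = []
    for label in labels:
        if label not in index:
            index[label] = len(uniques)
            uniques.append(label)
        codes.append(index[label])
    return codes, uniques


def grouped_sum(values, labels):
    """Sum values per group in one pass, without looping over groups."""
    codes, uniques = factorize_labels(labels)
    totals = [0] * len(uniques)
    for code, value in zip(codes, values):
        totals[code] += value
    return uniques, totals


uniques, totals = grouped_sum([1, 2, 3, 4, 5], ["a", "b", "a", "c", "b"])
```

The point of the pattern is that the reduction is a single pass over the data indexed by integer codes, rather than a Python-level loop over each group.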

Run on this array:

```python
size_factor = 1000

da = xr.DataArray(
    np.arange(1440 * size_factor).reshape(45 * size_factor, 8, 4),
    dims=("x", "y", "z"),
    coords=dict(
        x=list(range(45)) * size_factor,
        y=[1, 2, 3, 4] * 2,
        z=[1, 2] * 2,
    ),
)
```

It's about 2.5x as fast, though it only generates the numpy array:

```python
%%timeit
npg_groupby(da, 'x')

15 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

```python
%%timeit
da.groupby('x').sum()

37.6 ms ± 244 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

Any thoughts on any of:

- What's the best way of reconstituting the coords etc., after npg produces the array?
- Presumably we're going to have a fairly different design for this than the existing groupby operations. That design is very nested: it wraps functions and eventually calls `.map` to loop over each group in Python.
- Presumably we're going to need to keep the existing logic around for dask. Is it reasonable for an initial version to defer to the existing logic for all dask arrays? (+ @shoyer's thoughts above on this)
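On the first question, one possible approach is sketched below. It is an assumption for discussion, not a proposed design: the helper name `grouped_sum_with_coords` is hypothetical, `np.add.at` stands in for the npg call so the example has no extra dependency, and `sort=True` is passed to `pd.factorize` on the assumption that sorted group labels are wanted. Coords that do not depend on the grouped dimension are carried over, and the grouped dimension gets the unique labels as its new coord.

```python
import numpy as np
import pandas as pd
import xarray as xr


def grouped_sum_with_coords(da, dim):
    """Grouped sum along `dim`, returned as a DataArray (hypothetical helper)."""
    group_idx, labels = pd.factorize(da.indexes[dim], sort=True)
    axis = da.get_axis_num(dim)
    # Move the grouped axis to the front so np.add.at can index it directly
    values = np.moveaxis(da.values, axis, 0)
    out = np.zeros((len(labels),) + values.shape[1:], dtype=values.dtype)
    np.add.at(out, group_idx, values)  # stand-in for the npg aggregate call
    reduced = np.moveaxis(out, 0, axis)
    # Keep coords that do not depend on the grouped dimension
    coords = {k: v for k, v in da.coords.items() if dim not in v.dims}
    coords[dim] = labels
    return xr.DataArray(reduced, dims=da.dims, coords=coords, name=da.name)


small = xr.DataArray(
    np.arange(8).reshape(4, 2),
    dims=("x", "y"),
    coords=dict(x=[0, 1, 0, 1], y=[10, 20]),
)
result = grouped_sum_with_coords(small, "x")
```

This leaves open how to handle non-index coords that do lie along the grouped dimension; the sketch simply drops them.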

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  711626733