html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4473#issuecomment-1046935567,https://api.github.com/repos/pydata/xarray/issues/4473,1046935567,IC_kwDOAMm_X84-ZvgP,3497584,2022-02-21T14:25:47Z,2022-02-21T14:25:47Z,NONE,"Hi @shoyer, thanks for this neat trick! What happens when `bins` is a sequence of bin edges, rather than a number of bins? Your example seems to break and I'm not sure how to fix it. Thanks again!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,711626733
https://github.com/pydata/xarray/issues/4473#issuecomment-711461331,https://api.github.com/repos/pydata/xarray/issues/4473,711461331,MDEyOklzc3VlQ29tbWVudDcxMTQ2MTMzMQ==,1217238,2020-10-19T01:30:48Z,2020-10-19T01:30:48Z,MEMBER,"> * What's the best way of reconstituting the coords etc, after npg produces the array?

I think we can reuse the existing logic from the `_combine` method here: https://github.com/pydata/xarray/blob/97e26257e81b0ba35af4a34be43a3e9cc666b9bc/xarray/core/groupby.py#L830

This just gives us an alternative way to calculate `applied`.

> * Presumably we're going to have a fairly different design for this than the existing groupby operations — that design is very nested — wrapping functions and eventually calling `.map` to loop over each group in python.

Agreed. Hopefully this can live alongside in the GroupBy objects.

> * Presumably we're going to need to keep the existing logic around for dask — is it reasonable for an initial version to defer to the existing logic for all dask arrays? 
(+ @shoyer 's thoughts above on this)

Yes, I agree that we should do this incrementally.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,711626733
https://github.com/pydata/xarray/issues/4473#issuecomment-711460703,https://api.github.com/repos/pydata/xarray/issues/4473,711460703,MDEyOklzc3VlQ29tbWVudDcxMTQ2MDcwMw==,1217238,2020-10-19T01:27:50Z,2020-10-19T01:27:50Z,MEMBER,Something like the resample test case from https://github.com/pydata/xarray/issues/4498 might be a good example for finding 100x speed-ups. The main feature of that case is that there are a _very_ large number of groups (only slightly fewer groups than original data points).,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,711626733
https://github.com/pydata/xarray/issues/4473#issuecomment-711458608,https://api.github.com/repos/pydata/xarray/issues/4473,711458608,MDEyOklzc3VlQ29tbWVudDcxMTQ1ODYwOA==,5635139,2020-10-19T01:19:08Z,2020-10-19T01:19:08Z,MEMBER,"Here's a very quick POC:

```python
import numpy as np
import pandas as pd
import xarray as xr
from numpy_groupies.aggregate_numba import aggregate

def npg_groupby(da: xr.DataArray, dim, func='sum'):
    # Map each label along `dim` to an integer group index
    group_idx, labels = pd.factorize(da.indexes[dim])
    axis = da.get_axis_num(dim)
    # Call the imported numba-backed aggregate on the underlying numpy array
    array = aggregate(group_idx=group_idx, a=da.values, func=func, axis=axis)
    return array
```

Run on this array:

```python
size_factor = 1000
da = xr.DataArray(
    np.arange(1440 * size_factor).reshape(45 * size_factor, 8, 4),
    dims=(""x"", ""y"", ""z""),
    coords=dict(x=list(range(45)) * size_factor, y=[1, 2, 3, 4] * 2, z=[1, 2] * 2),
)
```

It's about 2x as fast, though only generates the numpy array:

```python
%%timeit
npg_groupby(da, 'x')
# 15 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

```python
%%timeit
da.groupby('x').sum()
# 37.6 ms ± 244 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

Any thoughts on any of:

- What's the best way of reconstituting the coords etc, after npg produces the array?
- Presumably we're going to have a fairly different design for this than the existing groupby operations — that design is very nested — wrapping functions and eventually calling `.map` to loop over each group in python.
- Presumably we're going to need to keep the existing logic around for dask — is it reasonable for an initial version to defer to the existing logic for all dask arrays? (+ @shoyer 's thoughts above on this)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,711626733
https://github.com/pydata/xarray/issues/4473#issuecomment-702339435,https://api.github.com/repos/pydata/xarray/issues/4473,702339435,MDEyOklzc3VlQ29tbWVudDcwMjMzOTQzNQ==,5635139,2020-10-01T19:07:39Z,2020-10-01T19:07:39Z,MEMBER,"> I'm not entirely sure, but I suspect something like the approach in #4184 might be more directly relevant for speeding up `unstack` (at least with NumPy arrays).

Great. I need to think through how to do that — the approach of using MultiIndex `codes` to index the array directly is very elegant — I'll try applying it to stack / unstack as a project.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,711626733
https://github.com/pydata/xarray/issues/4473#issuecomment-701961598,https://api.github.com/repos/pydata/xarray/issues/4473,701961598,MDEyOklzc3VlQ29tbWVudDcwMTk2MTU5OA==,1217238,2020-10-01T07:57:58Z,2020-10-01T07:57:58Z,MEMBER,"> Highly speculative, but would this also be a faster approach to stacking & unstacking? ""Form ~5~ 4"" in the [readme](https://github.com/ml31415/numpy-groupies). 
I'm not entirely sure, but I suspect something like the approach in https://github.com/pydata/xarray/pull/4184 might be more directly relevant for speeding up `unstack` (at least with NumPy arrays).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,711626733
https://github.com/pydata/xarray/issues/4473#issuecomment-701653835,https://api.github.com/repos/pydata/xarray/issues/4473,701653835,MDEyOklzc3VlQ29tbWVudDcwMTY1MzgzNQ==,5635139,2020-09-30T21:22:40Z,2020-10-01T05:56:13Z,MEMBER,"This looks amazing! Thanks for finding it. Highly speculative, but would this also be a faster approach to stacking & unstacking? ""Form ~5~ 4"" in the [readme](https://github.com/ml31415/numpy-groupies). ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,711626733
https://github.com/pydata/xarray/issues/4473#issuecomment-701609035,https://api.github.com/repos/pydata/xarray/issues/4473,701609035,MDEyOklzc3VlQ29tbWVudDcwMTYwOTAzNQ==,1217238,2020-09-30T19:52:05Z,2020-09-30T19:52:05Z,MEMBER,A prototype implementation of the core functionality here can be found in: https://nbviewer.jupyter.org/gist/shoyer/6d6c82bbf383fb717cc8631869678737,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,711626733