# vectorized groupby binary ops (#5804)

By switching to `numpy_groupies`, we are vectorizing our groupby reductions. I think we can do the same for groupby's binary ops.

Here's an example array:

``` python
import numpy as np
import xarray as xr

%load_ext memory_profiler

N = 4 * 2000
da = xr.DataArray(
    np.random.random((N, N)),
    dims=("x", "y"),
    coords={"labels": ("x", np.repeat(["a", "b", "c", "d", "e", "f", "g", "h"], repeats=N // 8))},
)
```

Consider this "anomaly" calculation, where the anomaly is defined relative to the group mean:

``` python
def anom_current(da):
    grouped = da.groupby("labels")
    mean = grouped.mean()
    anom = grouped - mean
    return anom
```

With this approach, we loop over each group and apply the binary operation:
https://github.com/pydata/xarray/blob/a1635d324753588e353e4e747f6058936fa8cf1e/xarray/core/computation.py#L502-L525

This saves some memory but becomes slow for a large number of groups.

We could instead do:

``` python
def anom_vectorized(da):
    mean = da.groupby("labels").mean()
    mean_expanded = mean.sel(labels=da.labels)
    anom = da - mean_expanded
    return anom
```

Now we are faster, but we construct an extra array as big as the original (I think this is an acceptable tradeoff):

```
%timeit anom_current(da)
# 1.4 s ± 20.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit anom_vectorized(da)
# 937 ms ± 5.26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

------

(I haven't experimented with dask yet, so the following is just a theory.)

I think the real benefit comes with dask. Depending on where the groups are located relative to the chunking, the current approach could end up creating a lot of tiny chunks by splitting up existing chunks. With the vectorized approach we can do better. Ideally we would reindex the "mean" dask array with a numpy-array-of-repeated-ints such that the chunking of `mean_expanded` exactly matches the chunking of `da` along the grouped dimension (rough sketches of this idea are appended at the end of this issue).

~~In practice, [dask.array.take](https://docs.dask.org/en/latest/_modules/dask/array/routines.html#take) doesn't allow specifying "output chunks", so we'd end up chunking `mean_expanded` based on dask's automatic heuristics, and then rechunking again for the binary operation.~~

Thoughts? cc @rabernat
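
------

For concreteness, here is a minimal numpy-level sketch of the "reindex with a numpy-array-of-repeated-ints" idea. All names here (`codes`, `means`, etc.) are illustrative, not an existing xarray API:

``` python
import numpy as np

values = np.random.random((16, 4))
labels = np.repeat(["a", "b"], repeats=8)

# codes[i] is the integer position of row i's group among the sorted unique labels
uniques, codes = np.unique(labels, return_inverse=True)

# one mean per group, stacked in the order of `uniques`
means = np.stack([values[labels == u].mean(axis=0) for u in uniques])

# expand the means back to the original shape with a single take,
# then subtract -- no per-group loop
anom = values - means[codes]
```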
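
And a rough dask version of the chunk-matching idea (untested, per the caveat above; whether `mean_expanded` inherits `da`'s chunking along `x`, rather than getting dask's automatic chunking for the take, is exactly the open question):

``` python
import dask.array
import numpy as np
import xarray as xr

N = 4 * 2000
labels = np.repeat(["a", "b", "c", "d", "e", "f", "g", "h"], repeats=N // 8)
da = xr.DataArray(
    dask.array.random.random((N, N), chunks=(N // 8, N)),
    dims=("x", "y"),
    coords={"labels": ("x", labels)},
)

mean = da.groupby("labels").mean()  # 8 rows, one per unique label

# integer positions of each row's group in `mean` (groupby sorts its
# groups by label, matching np.unique's ordering)
_, codes = np.unique(labels, return_inverse=True)

# vectorized take along the grouped dimension; indexing with a DataArray
# whose dimension is "x" puts the result back on dimension "x"
mean_expanded = mean.isel(labels=xr.DataArray(codes, dims="x"))
anom = da - mean_expanded
```

Comparing `mean_expanded.chunks` with `da.chunks` here would show whether a rechunk is still needed before the subtraction.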