id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 1236174701,I_kwDOAMm_X85Jrodt,6610,"Update GroupBy constructor for grouping by multiple variables, dask arrays",2448579,open,0,,,6,2022-05-15T03:17:54Z,2023-04-26T16:06:17Z,,MEMBER,,,,"### What is your issue? `flox` supports grouping by multiple variables (would fix #324, #1056) and grouping by dask variables (would fix #2852). To enable this in GroupBy we need to update the constructor's signature to 1. Accept multiple ""by"" variables. 2. Accept ""expected group labels"" for grouping by dask variables (like `bins` for `groupby_bins` which already supports grouping by dask variables). This lets us construct the output coordinate without evaluating the dask variable. 3. We may also want to simultaneously group by a categorical variable (season) and bin by a continuous variable (air temperature). So we also need a way to indicate whether the ""expected group labels"" are ""bin edges"" or categories. ----- The signature in flox is (may be errors!) ``` python xarray_reduce( obj: Dataset | DataArray, *by: DataArray | str, func: str | Aggregation, expected_groups: Sequence | np.ndarray | None = None, isbin: bool | Sequence[bool] = False, ... ) ``` You would calculate that last example using flox as ``` python xarray_reduce( ds, ""season"", ""air_temperature"", expected_groups=[None, np.arange(21, 30, 1)], isbin=[False, True], ... ) ``` The use of `expected_groups` and `isbin` seems ugly to me (the names could also be better!) ------- I propose we update [groupby's signature](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.groupby.html) to 1. change `group: DataArray | str` to `group: DataArray | str | Iterable[str] | Iterable[DataArray]` 2. 
We could add a top-level `xr.Bins` object that wraps bin edges + any kwargs to be passed to `pandas.cut`. Note our current [groupby_bins](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.groupby_bins.html) signature has a bunch of kwargs passed directly to `pandas.cut`. 3. Finally, add `groups: None | ArrayLike | xarray.Bins | Iterable[None | ArrayLike | xarray.Bins]` to pass the ""expected group labels"". 1. If `None`, then groups will be auto-detected from non-dask `group` arrays (if `None` for a dask `group`, raise an error). 1. If `xarray.Bins`, bin by the corresponding variable. 1. If `ArrayLike`, treat as categorical. 1. `groups` is a little too similar to `group` so we should choose a better name. 1. The ordering of `ArrayLike` would let us fix #757 (pass the seasons in the order you want them in the output). So then that example becomes ``` python ds.groupby( [""season"", ""air_temperature""], # season is numpy, air_temperature is dask groups=[None, xr.Bins(np.arange(21, 30, 1), closed=""right"")], ) ``` Thoughts? ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6610/reactions"", ""total_count"": 7, ""+1"": 7, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue