issues: 1236174701

id: 1236174701
node_id: I_kwDOAMm_X85Jrodt
number: 6610
title: Update GroupBy constructor for grouping by multiple variables, dask arrays
user: 2448579
state: open
locked: 0
comments: 6
created_at: 2022-05-15T03:17:54Z
updated_at: 2023-04-26T16:06:17Z
author_association: MEMBER

What is your issue?

flox supports grouping by multiple variables (would fix #324, #1056) and grouping by dask variables (would fix #2852).

To enable this in GroupBy we need to update the constructor's signature to:

1. Accept multiple "by" variables.
2. Accept "expected group labels" for grouping by dask variables (like bins for groupby_bins, which already supports grouping by dask variables). This lets us construct the output coordinate without evaluating the dask variable.
3. We may also want to simultaneously group by a categorical variable (season) and bin by a continuous variable (air temperature). So we also need a way to indicate whether the "expected group labels" are "bin edges" or categories.
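As a rough illustration of item 2, the sketch below shows why expected labels are enough to build the output coordinate without touching the data. It uses only the Python standard library; `output_labels` is a hypothetical helper invented for this example and is not part of flox or xarray.

```python
def output_labels(expected, isbin):
    """Build output labels from user-supplied expectations only (hypothetical helper)."""
    expected = list(expected)
    if isbin:
        # Consecutive edges define the bins: n edges give n - 1 bins.
        return list(zip(expected[:-1], expected[1:]))
    # Categorical: use the labels as given, in the given order.
    return expected


season_labels = output_labels(["DJF", "MAM", "JJA", "SON"], isbin=False)
temperature_labels = output_labels(range(21, 30), isbin=True)

# The output shape is known before any (possibly lazy) data is evaluated.
shape = (len(season_labels), len(temperature_labels))
print(shape)  # (4, 8)
```

Nothing here inspects the grouped variable itself, which is the property that matters for dask-backed arrays.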


The signature in flox is (may be errors!):

    xarray_reduce(
        obj: Dataset | DataArray,
        *by: DataArray | str,
        func: str | Aggregation,
        expected_groups: Sequence | np.ndarray | None = None,
        isbin: bool | Sequence[bool] = False,
        ...
    )

You would calculate that last example using flox as:

    xarray_reduce(
        ds,
        "season",
        "air_temperature",
        expected_groups=[None, np.arange(21, 30, 1)],
        isbin=[False, True],
        ...
    )

The use of expected_groups and isbin seems ugly to me (the names could also be better!)


I propose we update groupby's signature as follows:

1. Change group: DataArray | str to group: DataArray | str | Iterable[str] | Iterable[DataArray].
2. We could add a top-level xr.Bins object that wraps bin edges + any kwargs to be passed to pandas.cut. Note that our current groupby_bins signature has a bunch of kwargs passed directly to pandas.cut.
3. Finally, add groups: None | ArrayLike | xarray.Bins | Iterable[None | ArrayLike | xarray.Bins] to pass the "expected group labels".
    1. If None, then groups will be auto-detected from non-dask group arrays (if None for a dask group, then raise an error).
    2. If xarray.Bins, bin the corresponding variable using those bin edges.
    3. If ArrayLike, treat as categorical.
    4. groups is a little too similar to group, so we should choose a better name.
    5. The ordering of ArrayLike would let us fix #757 (pass the seasons in the order you want them in the output).
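To make the dispatch rules above concrete, here is a minimal standard-library sketch. Everything in it is an assumption for illustration: the `Bins` class is a stand-in for the proposed xr.Bins, and `resolve_group` / `resolve_groups` are hypothetical helpers. The `lazy` set stands in for "this variable is a dask array". The stored keyword is named `right` because that is the parameter pandas.cut uses for right-closed intervals.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Sequence


@dataclass
class Bins:
    """Hypothetical stand-in for the proposed xr.Bins: edges plus pandas.cut kwargs."""
    edges: Sequence[float]
    cut_kwargs: Dict[str, Any] = field(default_factory=dict)


def resolve_group(name, labels, is_lazy):
    """Classify one grouping variable as autodetect, bin, or categorical."""
    if labels is None:
        if is_lazy:
            # Labels cannot be discovered without computing the lazy array.
            raise ValueError(f"{name!r} is lazy; expected group labels are required")
        return ("autodetect", None)
    if isinstance(labels, Bins):
        return ("bin", labels)
    # Any other array-like is treated as categorical, keeping the given order.
    return ("categorical", list(labels))


def resolve_groups(names, groups, lazy):
    """Apply resolve_group to each (name, labels) pair."""
    if groups is None:
        groups = [None] * len(names)
    if len(groups) != len(names):
        raise ValueError("need one entry in `groups` per grouping variable")
    return {n: resolve_group(n, g, n in lazy) for n, g in zip(names, groups)}


spec = resolve_groups(
    ["season", "air_temperature"],
    [None, Bins(range(21, 30), {"right": True})],
    lazy={"air_temperature"},
)
print(spec["season"][0], spec["air_temperature"][0])  # autodetect bin
```

Passing an explicit list such as ["JJA", "SON", "DJF", "MAM"] for season would take the categorical branch and keep that order, which is the behaviour the last sub-item relies on.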

So then that example becomes:

    ds.groupby(
        ["season", "air_temperature"],  # season is numpy, air_temperature is dask
        groups=[None, xr.Bins(np.arange(21, 30, 1), closed="right")],
    )
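For readers who want to see what such a combined grouping would compute, the following is a plain-Python emulation, not xarray or flox code. It assumes right-closed bins (matching closed="right" above), takes a mean as the reduction, and uses made-up sample data; `bin_index` and `grouped_mean` are hypothetical names. Unlike a real groupby, it simply omits empty groups instead of filling them.

```python
from bisect import bisect_left
from collections import defaultdict


def bin_index(value, edges):
    """Index i of the right-closed bin (edges[i], edges[i + 1]] containing value, else None."""
    i = bisect_left(edges, value) - 1
    return i if 0 <= i < len(edges) - 1 else None


def grouped_mean(seasons, temperatures, values, edges):
    """Mean of values per (season, temperature bin); out-of-range samples are dropped."""
    acc = defaultdict(lambda: [0.0, 0])
    for season, temp, value in zip(seasons, temperatures, values):
        i = bin_index(temp, edges)
        if i is None:
            continue
        cell = acc[(season, (edges[i], edges[i + 1]))]
        cell[0] += value
        cell[1] += 1
    return {key: total / count for key, (total, count) in acc.items()}


edges = list(range(21, 30))  # same edges as np.arange(21, 30, 1)
result = grouped_mean(
    seasons=["DJF", "DJF", "JJA", "JJA", "JJA"],
    temperatures=[21.5, 22.0, 25.2, 25.9, 30.0],  # 30.0 falls outside all bins
    values=[1.0, 3.0, 10.0, 20.0, 99.0],
    edges=edges,
)
print(result)  # {('DJF', (21, 22)): 2.0, ('JJA', (25, 26)): 15.0}
```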

Thoughts?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6610/reactions",
    "total_count": 7,
    "+1": 7,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue
