issue_comments

6 rows where issue = 1236174701 sorted by updated_at descending

user 4

  • dcherian 3
  • shoyer 1
  • malmans2 1
  • TomNicholas 1

author_association 2

  • MEMBER 5
  • CONTRIBUTOR 1

issue 1

  • Update GroupBy constructor for grouping by multiple variables, dask arrays · 6
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1523666774 https://github.com/pydata/xarray/issues/6610#issuecomment-1523666774 https://api.github.com/repos/pydata/xarray/issues/6610 IC_kwDOAMm_X85a0U9W dcherian 2448579 2023-04-26T15:59:06Z 2023-04-26T16:06:17Z MEMBER

We voted to move forward with this API:

    data.groupby(
        {
            "x0": xr.BinGrouper(bins=pd.IntervalIndex.from_breaks(coords["x_vertices"])),  # binning
            "y": xr.UniqueGrouper(labels=["a", "b", "c"]),  # categorical, data.y is dask-backed
            "time": xr.TimeResampleGrouper(freq="MS"),
        },
    )

We won't break backwards compatibility for `da.groupby(other_data_array)`, but for any complicated use cases with `Grouper` the user must add the `by` variable to the xarray object and refer to it by name in the dictionary, as above.
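
To make the dictionary form concrete, here is a minimal pure-Python sketch of the idea that the dictionary key, not the grouper, names the variable. `ToyBinGrouper`, `ToyUniqueGrouper`, `factorize` and `toy_groupby` are hypothetical names invented for this illustration; they are not xarray's classes, and the left-closed bins are a simplification.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class ToyUniqueGrouper:
    """Hypothetical stand-in: groups by distinct labels; knows nothing about the variable."""

    labels: list

    def factorize(self, values):
        # Map each value to the position of its label, or None if it has no label.
        index = {label: i for i, label in enumerate(self.labels)}
        return [index.get(v) for v in values]


@dataclass
class ToyBinGrouper:
    """Hypothetical stand-in: groups by left-closed bins [lo, hi) between sorted edges."""

    edges: list

    def factorize(self, values):
        codes = []
        for v in values:
            i = bisect_right(self.edges, v) - 1
            # Values outside the outermost edges belong to no bin.
            codes.append(i if 0 <= i < len(self.edges) - 1 else None)
        return codes


def toy_groupby(data, groupers):
    # The dictionary key says which variable each grouper applies to.
    return {name: grouper.factorize(data[name]) for name, grouper in groupers.items()}


data = {"x0": [0.5, 1.5, 2.5, 9.0], "y": ["a", "c", "b", "a"]}
codes = toy_groupby(
    data,
    {
        "x0": ToyBinGrouper(edges=[0, 1, 2, 3]),
        "y": ToyUniqueGrouper(labels=["a", "b", "c"]),
    },
)
print(codes)
```

Because the groupers carry no variable reference, the same grouper object could be reused under a different key.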

{
    "total_count": 4,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 1,
    "eyes": 1
}
  Update GroupBy constructor for grouping by multiple variables, dask arrays 1236174701
1498463195 https://github.com/pydata/xarray/issues/6610#issuecomment-1498463195 https://api.github.com/repos/pydata/xarray/issues/6610 IC_kwDOAMm_X85ZULvb dcherian 2448579 2023-04-06T04:07:05Z 2023-04-26T15:52:21Z MEMBER

Here's a question.

In #7561, I implement Grouper objects that don't have any information about the variable we're grouping by. So the future API would be:

    data.groupby(
        {
            "x0": xr.BinGrouper(bins=pd.IntervalIndex.from_breaks(coords["x_vertices"])),  # binning
            "y": xr.UniqueGrouper(labels=["a", "b", "c"]),  # categorical, data.y is dask-backed
            "time": xr.TimeResampleGrouper(freq="MS"),
        },
    )

Does this look OK, or do we want to support passing the DataArray or variable name as a `by` kwarg:

    xr.BinGrouper(by="x0", bins=pd.IntervalIndex.from_breaks(coords["x_vertices"]))

This syntax would support passing a DataArray as `by`, for example `xr.UniqueGrouper(by=data.y)`. Is that an important use case to support? In #7561, I create new `ResolvedGrouper` objects that always contain `by` as a DataArray, so it's really a question of exposing that to the user.

PS: pandas has a `key` kwarg for a column name, so following that would mean:

    data.groupby(
        [
            xr.BinGrouper("x0", bins=pd.IntervalIndex.from_breaks(coords["x_vertices"])),  # binning
            xr.UniqueGrouper("y", labels=["a", "b", "c"]),  # categorical, data.y is dask-backed
            xr.TimeResampleGrouper("time", freq="MS"),
        ],
    )
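
The two spellings discussed here (a dictionary keyed by variable name, or a list of groupers that each carry a key) could both be normalized into resolved objects that always hold the actual values. The sketch below is only an illustration of that idea in plain Python; `ToyGrouper`, `ToyResolvedGrouper` and `resolve` are hypothetical names, not the objects implemented in #7561.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyGrouper:
    """Hypothetical grouper; `key` is only needed for the list-based spelling."""

    kind: str
    key: Optional[str] = None


@dataclass
class ToyResolvedGrouper:
    """Hypothetical resolved form: the grouper plus the values it groups by."""

    grouper: ToyGrouper
    by: list


def resolve(data, spec):
    # Accept either {name: grouper} or [grouper_with_key, ...].
    if isinstance(spec, dict):
        pairs = list(spec.items())
    else:
        pairs = [(g.key, g) for g in spec]

    resolved = []
    for name, grouper in pairs:
        if name is None:
            raise ValueError("list-style groupers need a key")
        # Look the variable up by name so `by` always holds real values.
        resolved.append(ToyResolvedGrouper(grouper=grouper, by=data[name]))
    return resolved


data = {"x0": [1, 2], "time": [10, 20]}
from_dict = resolve(data, {"x0": ToyGrouper("bin"), "time": ToyGrouper("resample")})
from_list = resolve(
    data, [ToyGrouper("bin", key="x0"), ToyGrouper("resample", key="time")]
)
print([r.by for r in from_dict] == [r.by for r in from_list])
```

In this toy version the choice between the two spellings only affects the user-facing signature; the resolved objects end up holding the same values.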

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Update GroupBy constructor for grouping by multiple variables, dask arrays 1236174701
1341296800 https://github.com/pydata/xarray/issues/6610#issuecomment-1341296800 https://api.github.com/repos/pydata/xarray/issues/6610 IC_kwDOAMm_X85P8pCg shoyer 1217238 2022-12-07T17:12:05Z 2022-12-07T17:12:05Z MEMBER

I also like the idea of creating specific Grouper objects for different types of selection, e.g., UniqueGrouper (the default), BinGrouper, TimeResampleGrouper, etc.

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Update GroupBy constructor for grouping by multiple variables, dask arrays 1236174701
1341289782 https://github.com/pydata/xarray/issues/6610#issuecomment-1341289782 https://api.github.com/repos/pydata/xarray/issues/6610 IC_kwDOAMm_X85P8nU2 TomNicholas 35968931 2022-12-07T17:07:08Z 2022-12-07T17:07:08Z MEMBER

Using xr.Grouper has the advantage that you don't have to start guessing about whether or not the user wanted some complicated behaviour (especially if their input is slightly wrong somehow and you have to raise an informative error). Simple defaults would get left as is and complex use cases can be explicit and opt-in.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Update GroupBy constructor for grouping by multiple variables, dask arrays 1236174701
1329680642 https://github.com/pydata/xarray/issues/6610#issuecomment-1329680642 https://api.github.com/repos/pydata/xarray/issues/6610 IC_kwDOAMm_X85PQVEC dcherian 2448579 2022-11-28T19:58:29Z 2022-11-28T23:23:42Z MEMBER

In https://github.com/xarray-contrib/flox/issues/191 @keewis proposes a much nicer API for multiple variables:

    data.groupby(
        xr.Grouper(by="x", bins=pd.IntervalIndex.from_breaks(coords["x_vertices"])),  # binning
        xr.Grouper(by=data.y, labels=["a", "b", "c"]),  # categorical, data.y is dask-backed
        xr.Grouper(by="time", freq="MS"),  # resample
    )

Note that `pd.Grouper` uses `key` instead of `by`, so that's a possibility too.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Update GroupBy constructor for grouping by multiple variables, dask arrays 1236174701
1128588208 https://github.com/pydata/xarray/issues/6610#issuecomment-1128588208 https://api.github.com/repos/pydata/xarray/issues/6610 IC_kwDOAMm_X85DROOw malmans2 22245117 2022-05-17T08:40:04Z 2022-05-17T15:04:04Z CONTRIBUTOR

I'm getting errors with multi-indexes and flox. Is this expected and related to this issue, or should I open a separate issue?

```python
import numpy as np

import xarray as xr

ds = xr.Dataset(
    dict(a=(("z",), np.ones(10))),
    coords=dict(b=(("z"), np.arange(2).repeat(5)), c=(("z"), np.arange(5).repeat(2))),
).set_index(bc=["b", "c"])
grouped = ds.groupby("bc")

with xr.set_options(use_flox=False):
    grouped.sum()  # OK

with xr.set_options(use_flox=True):
    grouped.sum()  # Error
```

```
Traceback (most recent call last):
  File "/Users/mattia/MyGit/test.py", line 15, in <module>
    grouped.sum()
  File "/Users/mattia/MyGit/xarray/xarray/core/_reductions.py", line 2763, in sum
    return self._flox_reduce(
  File "/Users/mattia/MyGit/xarray/xarray/core/groupby.py", line 661, in _flox_reduce
    result = xarray_reduce(
  File "/Users/mattia/mambaforge/envs/sarsen_dev/lib/python3.10/site-packages/flox/xarray.py", line 373, in xarray_reduce
    actual[k] = v.expand_dims(missing_group_dims)
  File "/Users/mattia/MyGit/xarray/xarray/core/dataset.py", line 1427, in __setitem__
    self.update({key: value})
  File "/Users/mattia/MyGit/xarray/xarray/core/dataset.py", line 4432, in update
    merge_result = dataset_update_method(self, other)
  File "/Users/mattia/MyGit/xarray/xarray/core/merge.py", line 1070, in dataset_update_method
    return merge_core(
  File "/Users/mattia/MyGit/xarray/xarray/core/merge.py", line 722, in merge_core
    aligned = deep_align(
  File "/Users/mattia/MyGit/xarray/xarray/core/alignment.py", line 824, in deep_align
    aligned = align(
  File "/Users/mattia/MyGit/xarray/xarray/core/alignment.py", line 761, in align
    aligner.align()
  File "/Users/mattia/MyGit/xarray/xarray/core/alignment.py", line 550, in align
    self.assert_unindexed_dim_sizes_equal()
  File "/Users/mattia/MyGit/xarray/xarray/core/alignment.py", line 450, in assert_unindexed_dim_sizes_equal
    raise ValueError(
ValueError: cannot reindex or align along dimension 'bc' because of conflicting dimension sizes: {10, 6} (note: an index is found along that dimension with size=10)
```
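
The two sizes in the error message, 10 and 6, can be reproduced by hand from the coordinates in the example. The sketch below uses plain lists instead of NumPy and only suggests where the numbers may come from (10 entries along `z`, 6 distinct `(b, c)` pairs); it does not diagnose the flox code path.

```python
# Same values as np.arange(2).repeat(5) and np.arange(5).repeat(2).
b = [i for i in range(2) for _ in range(5)]
c = [i for i in range(5) for _ in range(2)]

# Each position along "z" has one (b, c) pair; the multi-index "bc" holds these.
pairs = list(zip(b, c))

# Grouping by "bc" yields one group per distinct pair.
unique_pairs = sorted(set(pairs))

print(len(pairs), len(unique_pairs))  # prints: 10 6
```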

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Update GroupBy constructor for grouping by multiple variables, dask arrays 1236174701

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
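
As a rough illustration of how the view at the top of this page ("rows where issue = 1236174701 sorted by updated_at descending") relates to this schema, the sketch below loads the schema into an in-memory SQLite database with three of the rows listed above and runs an equivalent query. The exact SQL that Datasette generated is not shown on this page, so the `SELECT` statement here is an assumption.

```python
import sqlite3

SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
"""

conn = sqlite3.connect(":memory:")
# Foreign keys are not enforced by default, so [users] and [issues] can be absent.
conn.executescript(SCHEMA)

# (id, user, created_at, updated_at, author_association, issue) from the rows above.
rows = [
    (1128588208, 22245117, "2022-05-17T08:40:04Z", "2022-05-17T15:04:04Z", "CONTRIBUTOR", 1236174701),
    (1523666774, 2448579, "2023-04-26T15:59:06Z", "2023-04-26T16:06:17Z", "MEMBER", 1236174701),
    (1498463195, 2448579, "2023-04-06T04:07:05Z", "2023-04-26T15:52:21Z", "MEMBER", 1236174701),
]
conn.executemany(
    "INSERT INTO [issue_comments] "
    "([id], [user], [created_at], [updated_at], [author_association], [issue]) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)

# ISO 8601 timestamps in the same format sort correctly as text.
result = conn.execute(
    "SELECT [id], [updated_at] FROM [issue_comments] "
    "WHERE [issue] = ? ORDER BY [updated_at] DESC",
    (1236174701,),
).fetchall()

for comment_id, updated_at in result:
    print(comment_id, updated_at)
```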
Powered by Datasette · Queries took 719.625ms · About: xarray-datasette