issues

3 rows where state = "closed" and user = 25231875 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1445905299 I_kwDOAMm_X85WLsOT 7282 groupby and mean on a MultiIndex level raises ValueError jjpr-mit 25231875 closed 0     4 2022-11-11T19:15:58Z 2023-10-30T09:18:54Z 2023-08-31T03:50:33Z NONE      

What happened?

After using set_index to create a MultiIndex, calling groupby on a MultiIndex level and then mean raises an error.

What did you expect to happen?

Apply mean to groups, no error.

Minimal Complete Verifiable Example

from xarray import DataArray

# 3 x 7 array with two coordinates on each dimension
d = DataArray(
    data=[
        [0, 1, 2, 3, 4, 5, 6],
        [7, 8, 9, 10, 11, 12, 13],
        [14, 15, 16, 17, 18, 19, 20],
    ],
    coords={
        "greek": ("a", ['alpha', 'beta', 'gamma']),
        "colors": ("a", ['red', 'green', 'blue']),
        "compass": ("b", ['north', 'south', 'east', 'west', 'northeast', 'southeast', 'southwest']),
        "integer": ("b", [0, 1, 2, 3, 4, 5, 6]),
    },
    dims=("a", "b"),
)
# Combine the coordinates on each dimension into a MultiIndex
d = d.set_index(a=['greek', 'colors'], b=['compass', 'integer'])
g = d.groupby('greek')
m = g.mean(...)  # raises ValueError

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/site-packages/xarray/core/_aggregations.py", line 5698, in mean
    return self.reduce(
  File "/usr/local/lib/python3.10/site-packages/xarray/core/groupby.py", line 1201, in reduce
    return self.map(reduce_array, shortcut=shortcut)
  File "/usr/local/lib/python3.10/site-packages/xarray/core/groupby.py", line 1104, in map
    return self._combine(applied, shortcut=shortcut)
  File "/usr/local/lib/python3.10/site-packages/xarray/core/groupby.py", line 1136, in _combine
    index, index_vars = create_default_index_implicit(coord)
  File "/usr/local/lib/python3.10/site-packages/xarray/core/indexes.py", line 1045, in create_default_index_implicit
    index = PandasMultiIndex(array, name)
  File "/usr/local/lib/python3.10/site-packages/xarray/core/indexes.py", line 615, in __init__
    raise ValueError(
ValueError: conflicting multi-index level name 'greek' with dimension 'greek'

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.7 (main, Sep 13 2022, 14:31:33) [GCC 10.2.1 20210110]
python-bits: 64
OS: Linux
OS-release: 5.15.49-linuxkit
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2022.11.0
pandas: 1.5.1
numpy: 1.23.4
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.2.0
pip: 22.2.2
conda: None
pytest: None
IPython: None
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7282/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
257070215 MDU6SXNzdWUyNTcwNzAyMTU= 1569 Grouping with multiple levels jjpr-mit 25231875 closed 0     6 2017-09-12T14:46:12Z 2022-04-09T15:25:07Z 2022-04-09T15:25:06Z NONE      

http://xarray.pydata.org/en/stable/groupby.html says:

xarray supports “group by” operations with the same API as pandas

but when I supply the level keyword argument as described at https://pandas.pydata.org/pandas-docs/stable/groupby.html#groupby-with-multiindex, I get:
```
TypeError                                 Traceback (most recent call last)
<ipython-input-12-566fc67c0151> in <module>()
----> 1 hvm_it_v6_obj = hvm_it_v6.groupby(level=["category","obj"]).mean(dim="presentation")
      2 hvm_it_v6_obj

TypeError: groupby() got an unexpected keyword argument 'level'
```
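
For comparison, the pandas behaviour the report refers to can be sketched as follows. This is only an illustration: the data values and variable names are made up, and only the level names mirror the call in the traceback above.

```python
import pandas as pd

# Hypothetical data: three index levels, named after the levels in the report.
index = pd.MultiIndex.from_tuples(
    [
        ("animal", "cat", 0),
        ("animal", "cat", 1),
        ("animal", "dog", 0),
        ("vehicle", "car", 0),
    ],
    names=["category", "obj", "presentation"],
)
responses = pd.Series([1.0, 3.0, 5.0, 7.0], index=index)

# pandas accepts a list of level names and averages over the remaining level.
per_object = responses.groupby(level=["category", "obj"]).mean()
print(per_object)
```

Here the `presentation` level is averaged away, leaving one value per (`category`, `obj`) pair, which appears to be the result the report was after.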

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1569/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
329438885 MDU6SXNzdWUzMjk0Mzg4ODU= 2215 align() outer join returns DataArrays that are all NaNs jjpr-mit 25231875 closed 0     10 2018-06-05T12:42:53Z 2018-06-13T21:02:45Z 2018-06-13T21:02:44Z NONE      

Code Sample, a copy-pastable example if possible

The problem occurs for me in the midst of a data-processing pipeline that starts with some ~40MB netCDF files. I've tried to create pasteable code that reproduces the behavior from scratch, but I haven't succeeded.

Problem description

I pass two DataArrays to xr.align() with join="outer". The DataArrays are dtype float64, and contain a mix of NaNs and floats. They are 2D and have MultiIndexes with some numeric and some string levels.

The DataArrays in the tuple returned by align() have the correct shape and the expected indexes, but their contents are all NaNs. The original float values are gone. np.nonzero(~np.isnan(da)) returns an empty array.

I've set breakpoints and delved into the code. On line 656 in xarray.core.variable.Variable._getitem_with_mask, self contains non-NaN values, but the data returned by as_indexable(self._data)[actual_indexer] evaluates as all NaNs. However, data.array at that point (which is xarray.backends.netCDF4_.NetCDF4ArrayWrapper) has non-NaNs. So it's some sort of masking caused by the indexing that makes it look like data is all NaNs.

Expected Output

A tuple of DataArrays which contain some non-NaN values.

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-116-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.4
pandas: 0.22.0
numpy: 1.14.0
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: None
distributed: None
matplotlib: 2.1.2
cartopy: None
seaborn: None
setuptools: 38.4.0
pip: 9.0.1
conda: None
pytest: 3.3.2
IPython: 6.2.1
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2215/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
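
The row filter stated at the top of this page (state = "closed", user = 25231875, sorted by updated_at descending) can be reproduced against this schema. Below is a minimal sketch using Python's built-in sqlite3 module. It assumes an in-memory database and only a subset of the columns. The three closed rows are copied from the table above; the single open row is a hypothetical placeholder, added only so the filter has something to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced version of the [issues] table: only the columns the query touches.
conn.execute(
    """
    CREATE TABLE [issues] (
       [id] INTEGER PRIMARY KEY,
       [number] INTEGER,
       [title] TEXT,
       [user] INTEGER,
       [state] TEXT,
       [updated_at] TEXT
    )
    """
)
conn.execute("CREATE INDEX [idx_issues_user] ON [issues] ([user])")

conn.executemany(
    "INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1445905299, 7282, "groupby and mean on a MultiIndex level raises ValueError",
         25231875, "closed", "2023-10-30T09:18:54Z"),
        (257070215, 1569, "Grouping with multiple levels",
         25231875, "closed", "2022-04-09T15:25:07Z"),
        (329438885, 2215, "align() outer join returns DataArrays that are all NaNs",
         25231875, "closed", "2018-06-13T21:02:45Z"),
        # Hypothetical placeholder row, not taken from the real data.
        (1, 0, "hypothetical open issue", 25231875, "open", "2024-01-01T00:00:00Z"),
    ],
)

# ISO 8601 timestamps stored as TEXT sort chronologically as plain strings.
rows = conn.execute(
    """
    SELECT [number], [title], [updated_at]
    FROM [issues]
    WHERE [state] = ? AND [user] = ?
    ORDER BY [updated_at] DESC
    """,
    ("closed", 25231875),
).fetchall()

for number, title, updated_at in rows:
    print(number, updated_at, title)
```

Passing the state and user id as bound parameters rather than formatting them into the SQL string keeps the query safe to reuse with other values.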
Powered by Datasette · Queries took 20.52ms · About: xarray-datasette