issues

2 rows where state = "closed", type = "issue" and user = 21131639 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1400949778 I_kwDOAMm_X85TgMwS 7139 xarray.open_dataset has issues if the dataset returned by the backend contains a multiindex lukasbindreiter 21131639 closed 0     5 2022-10-07T10:19:36Z 2022-10-12T20:17:28Z 2022-10-12T20:17:28Z CONTRIBUTOR      

What happened?

As a follow up of this comment: https://github.com/pydata/xarray/issues/6752#issuecomment-1236756285 I'm currently trying to implement a custom NetCDF4 backend that allows me to also handle multiindices when loading a NetCDF dataset using xr.open_dataset.

I'm using the following two functions to convert the dataset to a NetCDF compatible version and back again: https://github.com/pydata/xarray/issues/1077#issuecomment-1101505074.

Here is a small code example:

Creating the dataset

```python
import xarray as xr
import pandas


def create_multiindex(**kwargs):
    return pandas.MultiIndex.from_arrays(list(kwargs.values()), names=kwargs.keys())


dataset = xr.Dataset()
dataset.coords["observation"] = ["A", "B"]
dataset.coords["wavelength"] = [0.4, 0.5, 0.6, 0.7]
dataset.coords["stokes"] = ["I", "Q"]
dataset["measurement"] = create_multiindex(
    observation=["A", "A", "B", "B"],
    wavelength=[0.4, 0.5, 0.6, 0.7],
    stokes=["I", "Q", "I", "I"],
)
```

Saving as NetCDF

```python
from cf_xarray import encode_multi_index_as_compress

patched = encode_multi_index_as_compress(dataset)
patched.to_netcdf("multiindex.nc")
```

And loading again

```python
from cf_xarray import decode_compress_to_multi_index

loaded = xr.open_dataset("multiindex.nc")
loaded = decode_compress_to_multi_index(loaded)
assert loaded.equals(dataset)  # works
```

Custom Backend

While the manual patching for saving is currently still required, I tried to at least work around the added function call in open_dataset by creating a custom NetCDF Backend:

```python
from cf_xarray import decode_compress_to_multi_index
from xarray.backends import NetCDF4BackendEntrypoint


# registered as netcdf4-multiindex backend in setup.py
class MultiindexNetCDF4BackendEntrypoint(NetCDF4BackendEntrypoint):
    def open_dataset(self, *args, handle_multiindex=True, **kwargs):
        ds = super().open_dataset(*args, **kwargs)

        if handle_multiindex:  # here is where the restore operation happens:
            ds = decode_compress_to_multi_index(ds)

        return ds
```
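For readers unfamiliar with the registration step mentioned in the comment above, here is a minimal sketch of what the `setup.py` side might look like. xarray discovers third-party engines through the `xarray.backends` entry point group; the module path `my_package.backends` is a hypothetical placeholder, since the issue does not say where the class lives.

```python
# Sketch of the entry point mapping for setup.py.
# "my_package.backends" is a hypothetical module path (not from the issue);
# the engine name and class name are the ones used in the issue.
ENTRY_POINTS = {
    "xarray.backends": [
        "netcdf4-multiindex = my_package.backends:MultiindexNetCDF4BackendEntrypoint",
    ],
}

# In a real setup.py this mapping would be handed to setuptools:
#     from setuptools import setup
#     setup(name="my_package", entry_points=ENTRY_POINTS)
```

Once the package is installed, the name on the left of the `=` is what gets passed as `engine=` to `xr.open_dataset`.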

The error

```python
loaded = xr.open_dataset("multiindex.nc", engine="netcdf4-multiindex", handle_multiindex=True)  # fails

File ~/.local/share/virtualenvs/test-oePfdNug/lib/python3.8/site-packages/xarray/core/variable.py:2795, in IndexVariable.data(self, data)
   2793 @Variable.data.setter  # type: ignore[attr-defined]
   2794 def data(self, data):
-> 2795     raise ValueError(
   2796         f"Cannot assign to the .data attribute of dimension coordinate a.k.a IndexVariable {self.name!r}. "
   2797         f"Please use DataArray.assign_coords, Dataset.assign_coords or Dataset.assign as appropriate."
   2798     )

ValueError: Cannot assign to the .data attribute of dimension coordinate a.k.a IndexVariable 'measurement'. Please use DataArray.assign_coords, Dataset.assign_coords or Dataset.assign as appropriate.
```

but this works:

```python
loaded = xr.open_dataset("multiindex.nc", engine="netcdf4-multiindex", handle_multiindex=False)
loaded = decode_compress_to_multi_index(loaded)
assert loaded.equals(dataset)
```

So I'm guessing xarray is performing some operation on the dataset returned by the backend, and one of those leads to a failure if there is a multiindex already contained.

What did you expect to happen?

I expected that it doesn't matter whether decode_compress_to_multi_index is called inside the backend or afterwards, and that the same dataset would be returned each time.

Minimal Complete Verifiable Example

See above.

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

I'm also open to other suggestions on how I could simplify the usage of multiindices. Maybe there is an approach that doesn't require a custom backend at all?

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 (default, Jan 28 2022, 09:41:12) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.102.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.5
libnetcdf: 4.6.3
xarray: 2022.9.0
pandas: 1.5.0
numpy: 1.23.3
scipy: 1.9.1
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.2
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.6.0
cartopy: 0.19.0.post1
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: 0.13.0
flox: None
numpy_groupies: None
setuptools: 65.3.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: 8.5.0
sphinx: 4.5.0
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7139/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1293460108 I_kwDOAMm_X85NGKKM 6752 MultiIndex listed multiple times in Dataset.indexes property lukasbindreiter 21131639 closed 0     8 2022-07-04T18:12:25Z 2022-09-05T12:21:54Z 2022-09-05T12:21:54Z CONTRIBUTOR      

What happened?

When upgrading to 2022.6.0.rc0 from 2022.3.0 I noticed a possible unexpected breaking change in the Dataset.indexes property. MultiIndices are now listed for each dimension they apply for as well as once for the multi index itself when accessing dataset.indexes.

What did you expect to happen?

Same behaviour as before, see example below.

Minimal Complete Verifiable Example

```Python
# execute with 2022.3.0 and 2022.6.0.rc0 to see the differences
import pandas
import xarray as xr


def _create_multiindex(**kwargs):
    return pandas.MultiIndex.from_arrays(list(kwargs.values()), names=kwargs.keys())


ds = xr.Dataset()
ds.coords["measurement"] = _create_multiindex(
    observation=["A", "A", "B", "B"],
    wavelength=[0.4, 0.5, 0.6, 0.7],
    stokes=["I", "Q", "I", "I"],
)

for name, idx in ds.indexes.items():
    print(name, idx)
```

MVCE confirmation

  • [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [x] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
# Output with version 2022.3.0:
measurement MultiIndex([('A', 0.4, 'I'), ('A', 0.5, 'Q'), ('B', 0.6, 'I'), ('B', 0.7, 'I')], names=['observation', 'wavelength', 'stokes'])

# Output with version 2022.6.0.rc0:
measurement MultiIndex([('A', 0.4, 'I'), ('A', 0.5, 'Q'), ('B', 0.6, 'I'), ('B', 0.7, 'I')], name='measurement')
observation MultiIndex([('A', 0.4, 'I'), ('A', 0.5, 'Q'), ('B', 0.6, 'I'), ('B', 0.7, 'I')], name='measurement')
wavelength MultiIndex([('A', 0.4, 'I'), ('A', 0.5, 'Q'), ('B', 0.6, 'I'), ('B', 0.7, 'I')], name='measurement')
stokes MultiIndex([('A', 0.4, 'I'), ('A', 0.5, 'Q'), ('B', 0.6, 'I'), ('B', 0.7, 'I')], name='measurement')
```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 (default, Jan 28 2022, 09:41:12) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.102.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.5
libnetcdf: 4.6.3
xarray: 2022.3.0
pandas: 1.4.3
numpy: 1.23.0
scipy: 1.9.0rc1
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: None
cftime: 1.6.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3b3
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.5.2
cartopy: 0.19.0.post1
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
setuptools: 56.0.0
pip: 21.3.1
conda: None
pytest: 7.1.2
IPython: 8.4.0
sphinx: 4.5.0

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 (default, Jan 28 2022, 09:41:12) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.102.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.5
libnetcdf: 4.6.3
xarray: 2022.6.0rc0
pandas: 1.4.3
numpy: 1.23.0
scipy: 1.9.0rc1
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: None
cftime: 1.6.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3b3
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.5.2
cartopy: 0.19.0.post1
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 56.0.0
pip: 21.3.1
conda: None
pytest: 7.1.2
IPython: 8.4.0
sphinx: 4.5.0
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6752/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
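
The filter described at the top of this page (state = "closed", type = "issue", user = 21131639, sorted by updated_at descending) can be reproduced against this schema with a plain SQL query. The following is a rough sketch using Python's built-in `sqlite3` module and an in-memory database; it keeps only a subset of the columns, and the two rows are copied from the listing above.

```python
import sqlite3

# In-memory stand-in for the issues table; only a subset of the columns is kept.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE [issues] (
       [id] INTEGER PRIMARY KEY,
       [number] INTEGER,
       [title] TEXT,
       [user] INTEGER,
       [state] TEXT,
       [updated_at] TEXT,
       [type] TEXT
    )
    """
)
conn.execute("CREATE INDEX [idx_issues_user] ON [issues] ([user])")

# The two rows shown on this page.
conn.executemany(
    "INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1293460108, 6752, "MultiIndex listed multiple times in Dataset.indexes property",
         21131639, "closed", "2022-09-05T12:21:54Z", "issue"),
        (1400949778, 7139, "xarray.open_dataset has issues if the dataset returned by the backend contains a multiindex",
         21131639, "closed", "2022-10-12T20:17:28Z", "issue"),
    ],
)

# ISO 8601 timestamps in the same format sort correctly as plain text.
query = """
    SELECT [number], [title], [updated_at]
    FROM [issues]
    WHERE [state] = :state AND [type] = :type AND [user] = :user
    ORDER BY [updated_at] DESC
"""
result = conn.execute(
    query, {"state": "closed", "type": "issue", "user": 21131639}
).fetchall()
numbers = [row[0] for row in result]
```

Named parameters keep the filter values out of the SQL string, which matters once the values come from user input rather than constants.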