issue #7139: xarray.open_dataset has issues if the dataset returned by the backend contains a multiindex

  • id: 1400949778
  • node_id: I_kwDOAMm_X85TgMwS
  • user: 21131639
  • state: closed (completed)
  • locked: 0
  • comments: 5
  • created_at: 2022-10-07T10:19:36Z
  • updated_at: 2022-10-12T20:17:28Z
  • closed_at: 2022-10-12T20:17:28Z
  • author_association: CONTRIBUTOR
  • repo: 13221727 (pydata/xarray)
  • type: issue

What happened?

As a follow-up to this comment: https://github.com/pydata/xarray/issues/6752#issuecomment-1236756285 I'm currently trying to implement a custom NetCDF4 backend that also handles multi-indices when loading a NetCDF dataset using xr.open_dataset.

I'm using the following two functions to convert the dataset to a NetCDF-compatible version and back again: https://github.com/pydata/xarray/issues/1077#issuecomment-1101505074.

Here is a small code example:

Creating the dataset

```python
import xarray as xr
import pandas


def create_multiindex(**kwargs):
    return pandas.MultiIndex.from_arrays(list(kwargs.values()), names=kwargs.keys())


dataset = xr.Dataset()
dataset.coords["observation"] = ["A", "B"]
dataset.coords["wavelength"] = [0.4, 0.5, 0.6, 0.7]
dataset.coords["stokes"] = ["I", "Q"]
dataset["measurement"] = create_multiindex(
    observation=["A", "A", "B", "B"],
    wavelength=[0.4, 0.5, 0.6, 0.7],
    stokes=["I", "Q", "I", "I"],
)
```

Saving as NetCDF

```python
from cf_xarray import encode_multi_index_as_compress

patched = encode_multi_index_as_compress(dataset)
patched.to_netcdf("multiindex.nc")
```

And loading again

```python
from cf_xarray import decode_compress_to_multi_index

loaded = xr.open_dataset("multiindex.nc")
loaded = decode_compress_to_multi_index(loaded)
assert loaded.equals(dataset)  # works
```

Custom Backend

While the manual patching is still required when saving, I tried to at least avoid the extra function call in open_dataset by creating a custom NetCDF backend:

```python
# registered as netcdf4-multiindex backend in setup.py

from cf_xarray import decode_compress_to_multi_index
from xarray.backends import NetCDF4BackendEntrypoint  # import path may vary between xarray versions


class MultiindexNetCDF4BackendEntrypoint(NetCDF4BackendEntrypoint):
    def open_dataset(self, *args, handle_multiindex=True, **kwargs):
        ds = super().open_dataset(*args, **kwargs)

        if handle_multiindex:  # here is where the restore operation happens:
            ds = decode_compress_to_multi_index(ds)

        return ds
```
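
For context, the backend is registered through xarray's entry-point mechanism; a rough sketch of the corresponding setup.py is below (the package name and module path are placeholders, not the actual project layout):

```python
# setup.py -- sketch only; "my_package.backend" is a placeholder module path
from setuptools import setup

setup(
    name="my-package",
    entry_points={
        # xarray discovers third-party backends via the "xarray.backends" entry-point group
        "xarray.backends": [
            "netcdf4-multiindex = my_package.backend:MultiindexNetCDF4BackendEntrypoint",
        ],
    },
)
```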

The error

```python
loaded = xr.open_dataset("multiindex.nc", engine="netcdf4-multiindex", handle_multiindex=True)  # fails

File ~/.local/share/virtualenvs/test-oePfdNug/lib/python3.8/site-packages/xarray/core/variable.py:2795, in IndexVariable.data(self, data)
   2793 @Variable.data.setter  # type: ignore[attr-defined]
   2794 def data(self, data):
-> 2795     raise ValueError(
   2796         f"Cannot assign to the .data attribute of dimension coordinate a.k.a IndexVariable {self.name!r}. "
   2797         f"Please use DataArray.assign_coords, Dataset.assign_coords or Dataset.assign as appropriate."
   2798     )

ValueError: Cannot assign to the .data attribute of dimension coordinate a.k.a IndexVariable 'measurement'. Please use DataArray.assign_coords, Dataset.assign_coords or Dataset.assign as appropriate.
```

but this works:

```python
loaded = xr.open_dataset("multiindex.nc", engine="netcdf4-multiindex", handle_multiindex=False)
loaded = decode_compress_to_multi_index(loaded)
assert loaded.equals(dataset)
```

So I'm guessing xarray performs some operations on the dataset returned by the backend, and one of them fails if a MultiIndex is already present.
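
For reference, the error message itself is easy to trigger without any backend at all; assigning to .data of a dimension-coordinate IndexVariable raises the same ValueError (a minimal sketch using only public xarray API, independent of the backend machinery above):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(coords={"x": [1, 2, 3]})
index_var = ds.variables["x"]  # an IndexVariable backing the dimension coordinate

try:
    index_var.data = np.array([4, 5, 6])  # same kind of assignment the traceback points to
except ValueError as err:
    print(err)  # "Cannot assign to the .data attribute of dimension coordinate ..."
```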

What did you expect to happen?

I expected that it doesn't matter whether decode_compress_to_multi_index is called inside the backend or afterwards; the same dataset should be returned in both cases.

Minimal Complete Verifiable Example

See above.

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

I'm also open to other suggestions for how I could simplify working with multi-indices; maybe there is an approach that doesn't require a custom backend at all?
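
For instance, a thin wrapper function would already remove the extra call at every use site without touching the backend machinery at all (just a sketch of one possible alternative, not something I've settled on; the function name is made up):

```python
import xarray as xr
from cf_xarray import decode_compress_to_multi_index


def open_multiindex_dataset(filename, **kwargs):
    """Open a NetCDF file and restore any compressed multi-index coordinates."""
    ds = xr.open_dataset(filename, **kwargs)
    return decode_compress_to_multi_index(ds)
```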

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 (default, Jan 28 2022, 09:41:12) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.102.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.5
libnetcdf: 4.6.3

xarray: 2022.9.0
pandas: 1.5.0
numpy: 1.23.3
scipy: 1.9.1
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.2
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.6.0
cartopy: 0.19.0.post1
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: 0.13.0
flox: None
numpy_groupies: None
setuptools: 65.3.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: 8.5.0
sphinx: 4.5.0
```
