id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1400949778,I_kwDOAMm_X85TgMwS,7139,xarray.open_dataset has issues if the dataset returned by the backend contains a multiindex,21131639,closed,0,,,5,2022-10-07T10:19:36Z,2022-10-12T20:17:28Z,2022-10-12T20:17:28Z,CONTRIBUTOR,,,,"### What happened?
As a follow-up to this comment: https://github.com/pydata/xarray/issues/6752#issuecomment-1236756285, I'm currently trying to implement a custom `NetCDF4` backend that also handles multiindices when loading a NetCDF dataset with `xr.open_dataset`.
I'm using the following two functions to convert the dataset to a NetCDF-compatible version and back again:
https://github.com/pydata/xarray/issues/1077#issuecomment-1101505074.
Here is a small code example:
### Creating the dataset
```python
import xarray as xr
import pandas

def create_multiindex(**kwargs):
    return pandas.MultiIndex.from_arrays(list(kwargs.values()), names=kwargs.keys())

dataset = xr.Dataset()
dataset.coords[""observation""] = [""A"", ""B""]
dataset.coords[""wavelength""] = [0.4, 0.5, 0.6, 0.7]
dataset.coords[""stokes""] = [""I"", ""Q""]
dataset[""measurement""] = create_multiindex(
    observation=[""A"", ""A"", ""B"", ""B""],
    wavelength=[0.4, 0.5, 0.6, 0.7],
    stokes=[""I"", ""Q"", ""I"", ""I""],
)
```
### Saving as NetCDF
```python
from cf_xarray import encode_multi_index_as_compress
patched = encode_multi_index_as_compress(dataset)
patched.to_netcdf(""multiindex.nc"")
```
### And loading again
```python
from cf_xarray import decode_compress_to_multi_index
loaded = xr.open_dataset(""multiindex.nc"")
loaded = decode_compress_to_multi_index(loaded)
assert loaded.equals(dataset) # works
```
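To make the round trip above more concrete: as far as I understand it, the encoding follows the CF idea of compression by gathering, where each row of the multiindex is stored as a single flat index into the product of its level coordinates. A pure-Python sketch of that idea, not the actual `cf_xarray` implementation; `encode_rows` and `decode_rows` are hypothetical helper names:

```python
def encode_rows(rows, levels):
    """Map each row tuple to a flat row-major index into the product of the levels."""
    flat = []
    for row in rows:
        index = 0
        for value, level in zip(row, levels):
            # Shift by the size of this level, then add the position of the value.
            index = index * len(level) + level.index(value)
        flat.append(index)
    return flat


def decode_rows(flat, levels):
    """Invert encode_rows: recover the row tuples from the flat indices."""
    rows = []
    for index in flat:
        row = []
        # Peel off the last level first, since it varies fastest.
        for level in reversed(levels):
            index, code = divmod(index, len(level))
            row.append(level[code])
        rows.append(tuple(reversed(row)))
    return rows


# Level coordinates and rows taken from the example dataset above.
levels = [['A', 'B'], [0.4, 0.5, 0.6, 0.7], ['I', 'Q']]
rows = [('A', 0.4, 'I'), ('A', 0.5, 'Q'), ('B', 0.6, 'I'), ('B', 0.7, 'I')]

encoded = encode_rows(rows, levels)
decoded = decode_rows(encoded, levels)
print(encoded)
print(decoded == rows)
```

The flat integers are plain data that NetCDF can store, which is why the decode step is needed after loading.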
### Custom Backend
While the manual patching for saving is currently still required, I tried to at least work around the added function call in `open_dataset` by creating a custom NetCDF Backend:
```python
from cf_xarray import decode_compress_to_multi_index
from xarray.backends.netCDF4_ import NetCDF4BackendEntrypoint


# registered as netcdf4-multiindex backend in setup.py
class MultiindexNetCDF4BackendEntrypoint(NetCDF4BackendEntrypoint):
    def open_dataset(self, *args, handle_multiindex=True, **kwargs):
        ds = super().open_dataset(*args, **kwargs)
        if handle_multiindex:  # here is where the restore operation happens:
            ds = decode_compress_to_multi_index(ds)
        return ds
```
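For completeness, the registration mentioned in the comment above goes through the `xarray.backends` entry point group. A minimal sketch of the relevant `setup.py` fragment; the module path `my_package.backends` is a placeholder and not my real package layout:

```python
# Entry point declaration for setup.py. The group name 'xarray.backends' is the
# one xarray scans for engines; 'my_package.backends' is a hypothetical path.
ENTRY_POINTS = {
    'xarray.backends': [
        'netcdf4-multiindex = my_package.backends:MultiindexNetCDF4BackendEntrypoint',
    ],
}

# In setup.py this would be passed along as:
# setup(name='my_package', ..., entry_points=ENTRY_POINTS)
```

The name to the left of the equals sign is what gets passed as `engine=` to `xr.open_dataset`.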
### The error
```python
>>> loaded = xr.open_dataset(""multiindex.nc"", engine=""netcdf4-multiindex"", handle_multiindex=True) # fails
File ~/.local/share/virtualenvs/test-oePfdNug/lib/python3.8/site-packages/xarray/core/variable.py:2795, in IndexVariable.data(self, data)
2793 @Variable.data.setter # type: ignore[attr-defined]
2794 def data(self, data):
-> 2795 raise ValueError(
2796 f""Cannot assign to the .data attribute of dimension coordinate a.k.a IndexVariable {self.name!r}. ""
2797 f""Please use DataArray.assign_coords, Dataset.assign_coords or Dataset.assign as appropriate.""
2798 )
ValueError: Cannot assign to the .data attribute of dimension coordinate a.k.a IndexVariable 'measurement'. Please use DataArray.assign_coords, Dataset.assign_coords or Dataset.assign as appropriate.
```
but this works:
```python
>>> loaded = xr.open_dataset(""multiindex.nc"", engine=""netcdf4-multiindex"", handle_multiindex=False)
>>> loaded = decode_compress_to_multi_index(loaded)
>>> assert loaded.equals(dataset)
```
So I'm guessing `xarray` post-processes the dataset returned by the backend, and one of those steps fails if the dataset already contains a multiindex.
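To illustrate the guess, here is a toy model of the failure mode. It is not xarray code, and I have not confirmed which step in `open_dataset` it would correspond to. `PlainVariable`, `IndexLikeVariable`, `wrap_all` and `wrap_non_index` are hypothetical stand-ins:

```python
class PlainVariable:
    """Stand-in for a regular variable: .data can be reassigned."""

    def __init__(self, data):
        self.data = data


class IndexLikeVariable:
    """Stand-in for an index variable: .data is read-only, as in the traceback."""

    def __init__(self, data):
        self._data = data

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, value):
        raise ValueError('Cannot assign to the .data attribute of an index variable')


def wrap_all(variables):
    """Hypothetical post-processing step that rewraps the data of every variable."""
    for variable in variables.values():
        variable.data = list(variable.data)


def wrap_non_index(variables):
    """Same step, but skipping variables whose data is read-only."""
    for variable in variables.values():
        if not isinstance(variable, IndexLikeVariable):
            variable.data = list(variable.data)


def make_variables():
    return {
        'values': PlainVariable((1.0, 2.0)),
        'measurement': IndexLikeVariable((0, 1)),
    }


# The naive step fails as soon as it reaches the index-like variable.
try:
    wrap_all(make_variables())
    failed = False
except ValueError:
    failed = True

# The guarded step leaves the index-like variable untouched.
safe = make_variables()
wrap_non_index(safe)
print(failed, safe['values'].data, safe['measurement'].data)
```

If something like this happens after the backend returns, it would explain why decoding afterwards works while decoding inside the backend does not.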
### What did you expect to happen?
I expected that it doesn't matter whether `decode_compress_to_multi_index` is called inside the backend or afterwards, and that the same dataset is returned either way.
### Minimal Complete Verifiable Example
```Python
See above.
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
I'm also open to other suggestions on how I could simplify the usage of multiindices. Maybe there is an approach that doesn't require a custom backend at all?
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 (default, Jan 28 2022, 09:41:12)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.102.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.5
libnetcdf: 4.6.3
xarray: 2022.9.0
pandas: 1.5.0
numpy: 1.23.3
scipy: 1.9.1
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.2
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.6.0
cartopy: 0.19.0.post1
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: 0.13.0
flox: None
numpy_groupies: None
setuptools: 65.3.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: 8.5.0
sphinx: 4.5.0
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7139/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1293460108,I_kwDOAMm_X85NGKKM,6752,MultiIndex listed multiple times in Dataset.indexes property,21131639,closed,0,,,8,2022-07-04T18:12:25Z,2022-09-05T12:21:54Z,2022-09-05T12:21:54Z,CONTRIBUTOR,,,,"### What happened?
When upgrading from 2022.3.0 to 2022.6.0.rc0 I noticed a possibly unintended breaking change in the `Dataset.indexes` property. When accessing `dataset.indexes`, a MultiIndex is now listed once under its own name and once more for each of its levels.
### What did you expect to happen?
Same behaviour as before, see example below.
### Minimal Complete Verifiable Example
```Python
# execute with 2022.3.0 and 2022.6.0.rc0 to see the differences
import pandas
import xarray as xr

def _create_multiindex(**kwargs):
    return pandas.MultiIndex.from_arrays(list(kwargs.values()), names=kwargs.keys())

ds = xr.Dataset()
ds.coords[""measurement""] = _create_multiindex(
    observation=[""A"", ""A"", ""B"", ""B""],
    wavelength=[0.4, 0.5, 0.6, 0.7],
    stokes=[""I"", ""Q"", ""I"", ""I""],
)

for name, idx in ds.indexes.items():
    print(name, idx)
```
### MVCE confirmation
- [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [x] Complete example — the example is self-contained, including all data and the text of any traceback.
- [x] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
```Python
Output with version 2022.3.0:
measurement MultiIndex([('A', 0.4, 'I'),
('A', 0.5, 'Q'),
('B', 0.6, 'I'),
('B', 0.7, 'I')],
names=['observation', 'wavelength', 'stokes'])
Output with version 2022.6.0.rc0:
measurement MultiIndex([('A', 0.4, 'I'),
('A', 0.5, 'Q'),
('B', 0.6, 'I'),
('B', 0.7, 'I')],
name='measurement')
observation MultiIndex([('A', 0.4, 'I'),
('A', 0.5, 'Q'),
('B', 0.6, 'I'),
('B', 0.7, 'I')],
name='measurement')
wavelength MultiIndex([('A', 0.4, 'I'),
('A', 0.5, 'Q'),
('B', 0.6, 'I'),
('B', 0.7, 'I')],
name='measurement')
stokes MultiIndex([('A', 0.4, 'I'),
('A', 0.5, 'Q'),
('B', 0.6, 'I'),
('B', 0.7, 'I')],
name='measurement')
```
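The repeated entries could be collapsed on the caller side by grouping names that map to the same object. A minimal sketch, assuming all four entries refer to one and the same index object (I have not verified this). `unique_by_identity` is a hypothetical helper, not part of xarray, and `shared` is a stand-in for the real MultiIndex:

```python
def unique_by_identity(mapping):
    """Group the keys of a mapping by the identity of the object they point at."""
    groups = {}
    for name, value in mapping.items():
        # Use id() rather than equality so that only the very same object is merged.
        names, _ = groups.setdefault(id(value), ([], value))
        names.append(name)
    return list(groups.values())


shared = object()  # stand-in for the single MultiIndex
indexes = {
    'measurement': shared,
    'observation': shared,
    'wavelength': shared,
    'stokes': shared,
}

grouped = unique_by_identity(indexes)
for names, index in grouped:
    print(names)
```

With this grouping the loop body runs once per distinct index, which matches the 2022.3.0 behaviour for this example.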
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 (default, Jan 28 2022, 09:41:12)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.102.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.5
libnetcdf: 4.6.3
xarray: 2022.3.0
pandas: 1.4.3
numpy: 1.23.0
scipy: 1.9.0rc1
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: None
cftime: 1.6.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3b3
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.5.2
cartopy: 0.19.0.post1
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
setuptools: 56.0.0
pip: 21.3.1
conda: None
pytest: 7.1.2
IPython: 8.4.0
sphinx: 4.5.0
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 (default, Jan 28 2022, 09:41:12)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.102.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.5
libnetcdf: 4.6.3
xarray: 2022.6.0rc0
pandas: 1.4.3
numpy: 1.23.0
scipy: 1.9.0rc1
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: None
cftime: 1.6.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3b3
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.5.2
cartopy: 0.19.0.post1
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 56.0.0
pip: 21.3.1
conda: None
pytest: 7.1.2
IPython: 8.4.0
sphinx: 4.5.0
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6752/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue