id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
2090281639,I_kwDOAMm_X858lyqn,8628,objects remain unserializable after reset_index ,16033750,closed,0,4160723,,1,2024-01-19T11:03:56Z,2024-01-31T17:42:30Z,2024-01-31T17:42:30Z,NONE,,,,"### What happened?
With the 2024.1 release, I am unable to write objects to netCDF after having stacked dimensions with `.stack()` and called `.reset_index()` to get rid of the multi-index.
### What did you expect to happen?
_No response_
### Minimal Complete Verifiable Example
```Python
import numpy as np
import xarray as xr
da = xr.DataArray(np.zeros([2, 3]), dims=[""x"", ""y""])
da = da.stack(point=(""x"", ""y""))
da = da.reset_index(""point"")
da.to_netcdf(""test.nc"")
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
```Python
86 def ensure_not_multiindex(var: Variable, name: T_Name = None) -> None:
87 if isinstance(var._data, indexing.PandasMultiIndexingAdapter):
---> 88 raise NotImplementedError(
89 f""variable {name!r} is a MultiIndex, which cannot yet be ""
90 ""serialized. Instead, either use reset_index() ""
91 ""to convert MultiIndex levels into coordinate variables instead ""
92 ""or use https://cf-xarray.readthedocs.io/en/latest/coding.html.""
93 )
NotImplementedError: variable 'x' is a MultiIndex, which cannot yet be serialized. Instead, either use reset_index() to convert MultiIndex levels into coordinate variables instead or use https://cf-xarray.readthedocs.io/en/latest/coding.html.
```
### Anything else we need to know?
Creating the stacked object from scratch and saving it to netCDF works fine. The difference is that `type(da.x.variable._data)` is `xarray.core.indexing.PandasMultiIndexingAdapter` if it was stacked and reset, and `numpy.ndarray` if it's created from scratch.
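For comparison, here is a minimal sketch of the from-scratch construction mentioned above (the coordinate layout is an assumption for illustration, not taken from a real workflow): the coordinates are backed by plain numpy arrays, so no `PandasMultiIndexingAdapter` is involved and serialization succeeds.

```python
import numpy as np
import xarray as xr

# Build the already-stacked layout directly: a 1-D 'point' dimension with
# plain x/y coordinate arrays instead of reset MultiIndex levels.
x, y = np.meshgrid(np.arange(2), np.arange(3), indexing='ij')
da = xr.DataArray(
    np.zeros(6),
    dims='point',
    coords={'x': ('point', x.ravel()), 'y': ('point', y.ravel())},
)
print(type(da.x.variable._data))  # numpy.ndarray, not PandasMultiIndexingAdapter
# da.to_netcdf('test.nc')  # succeeds, since no MultiIndex adapter is involved
```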
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.7 | packaged by conda-forge | (main, Dec 23 2023, 14:43:09) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.133.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2024.1.0
pandas: 2.1.4
numpy: 1.26.3
scipy: 1.11.4
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.2.0
h5py: 3.10.0
Nio: None
zarr: 2.16.1
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: 1.3.7
dask: 2024.1.0
distributed: None
matplotlib: 3.8.2
cartopy: 0.22.0
seaborn: 0.13.1
numbagg: None
fsspec: 2023.12.2
cupy: None
pint: 0.23
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.0.3
pip: 23.3.2
conda: None
pytest: 7.4.4
mypy: None
IPython: 8.20.0
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8628/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1318992926,I_kwDOAMm_X85Onjwe,6836,groupby(multi-index level) not working correctly on a multi-indexed DataArray or DataSet,5643062,closed,0,4160723,,9,2022-07-27T04:06:59Z,2023-06-06T00:21:32Z,2023-06-06T00:21:32Z,NONE,,,,"### What happened?
Run the code block below with `2022.6.0`:
```
import numpy as np
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([list(""abc""), [0, 1]], names=(""one"", ""two""))
mda = xr.DataArray(np.random.rand(6, 3), [(""x"", midx), (""y"", range(3))])
mda.groupby(""one"").groups
```
output:
```
In [15]: mda.groupby(""one"").groups
Out[15]:
{('a', 0): [0],
('a', 1): [1],
('b', 0): [2],
('b', 1): [3],
('c', 0): [4],
('c', 1): [5]}
```
### What did you expect to happen?
The same output as with `2022.3.0`:
```
In [6]: mda.groupby(""one"").groups
Out[6]: {'a': [0, 1], 'b': [2, 3], 'c': [4, 5]}
```
### Minimal Complete Verifiable Example
```Python
import pandas as pd
import numpy as np
import xarray as xr
midx = pd.MultiIndex.from_product([list(""abc""), [0, 1]], names=(""one"", ""two""))
mda = xr.DataArray(np.random.rand(6, 3), [(""x"", midx), (""y"", range(3))])
mda.groupby(""one"").groups
```
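For reference, the expected (2022.3.0-style) positional grouping can be reproduced at the pandas level; the helper series below is an assumption for illustration, since `GroupBy.indices` maps each level label to integer positions in the flat index.

```python
import pandas as pd

midx = pd.MultiIndex.from_product([list('abc'), [0, 1]], names=('one', 'two'))
# Grouping by a single level collapses over the other level; .indices maps
# each 'one' label to the integer positions of its rows.
s = pd.Series(range(6), index=midx)
groups = {k: list(v) for k, v in s.groupby(level='one').indices.items()}
print(groups)  # {'a': [0, 1], 'b': [2, 3], 'c': [4, 5]}
```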
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
```Python
N/A
```
### Anything else we need to know?
N/A
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 (default, Mar 15 2022, 12:22:08)
[GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.11.0-1025-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.22.4
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2022.04.1
distributed: 2022.4.1
matplotlib: 3.5.1
cartopy: 0.20.3
seaborn: 0.11.2
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 45.2.0
pip: 22.2
conda: None
pytest: 7.1.2
IPython: 7.31.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6836/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1389148779,I_kwDOAMm_X85SzLpr,7097,Broken state when using assign_coords with multiindex,114576287,closed,0,4160723,,2,2022-09-28T10:51:34Z,2022-09-29T00:27:38Z,2022-09-28T18:02:17Z,NONE,,,,"### What happened?
I was trying to assign coordinates on a dataset that had been created using `stack`. After assigning the coordinates, the dataset was in a state where its length was coming out as less than zero, which caused all sorts of issues.
### What did you expect to happen?
I think the issue is with the updating of `_coord_names`, perhaps in https://github.com/pydata/xarray/blob/18454c218002e48e1643ce8e25654262e5f592ad/xarray/core/coordinates.py#L389.
I expected to just be able to assign the coords and then print the array to see the result.
### Minimal Complete Verifiable Example
```Python
import xarray as xr
ds = xr.DataArray(
[[[1, 1], [0, 0]], [[2, 2], [1, 1]]],
dims=(""lat"", ""year"", ""month""),
coords={""lat"": [-60, 60], ""year"": [2010, 2020], ""month"": [3, 6]},
name=""test"",
).to_dataset()
stacked = ds.stack(time=(""year"", ""month""))
stacked = stacked.assign_coords(
{""time"": [y + m / 12 for y, m in stacked[""time""].values]}
)
# Both these fail with ValueError: __len__() should return >= 0
len(stacked)
print(stacked)
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [x] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
```Python
Traceback (most recent call last):
File ""mre.py"", line 17, in
len(stacked)
File "".../xarray-tests/xarray/core/dataset.py"", line 1364, in __len__
return len(self.data_vars)
ValueError: __len__() should return >= 0
```
### Anything else we need to know?
Here's a test (I put it in `test_dataarray.py` but maybe there is a better spot)
```python
def test_assign_coords_drop_coord_names(self) -> None:
ds = DataArray(
[[[1, 1], [0, 0]], [[2, 2], [1, 1]]],
dims=(""lat"", ""year"", ""month""),
coords={""lat"": [-60, 60], ""year"": [2010, 2020], ""month"": [3, 6]},
name=""test"",
).to_dataset()
stacked = ds.stack(time=(""year"", ""month""))
stacked = stacked.assign_coords(
{""time"": [y + m / 12 for y, m in stacked[""time""].values]}
)
# this seems to be handled correctly
assert set(stacked._variables.keys()) == {""test"", ""time"", ""lat""}
# however, _coord_names doesn't seem to update as expected
# the below fails
assert set(stacked._coord_names) == {""time"", ""lat""}
# the incorrect value of _coord_names means that all the below fails too
# The failure is because the length of a dataset is calculated as (via len(data_vars))
# len(dataset._variables) - len(dataset._coord_names). For the situation
# above, where len(dataset._coord_names) is greater than len(dataset._variables),
# you get a length less than zero which then fails because length must return
# a value greater than zero
# Both these fail with ValueError: __len__() should return >= 0
len(stacked)
print(stacked)
```
### Environment
INSTALLED VERSIONS
------------------
commit: e678a1d7884a3c24dba22d41b2eef5d7fe5258e7
python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14)
[Clang 12.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 21.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: ('en_AU', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 0.1.dev4312+ge678a1d.d20220928
pandas: 1.5.0
numpy: 1.22.4
scipy: 1.9.1
netCDF4: 1.6.1
pydap: installed
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.13.2
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: 3.2.2
rasterio: 1.3.1
cfgrib: 0.9.10.1
iris: 3.3.0
bottleneck: 1.3.5
dask: 2022.9.1
distributed: 2022.9.1
matplotlib: 3.6.0
cartopy: 0.21.0
seaborn: 0.12.0
numbagg: 0.2.1
fsspec: 2022.8.2
cupy: None
pint: 0.19.2
sparse: 0.13.0
flox: 0.5.9
numpy_groupies: 0.9.19
setuptools: 65.4.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: None
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7097/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1355770800,I_kwDOAMm_X85Qz2uw,6969,"Regression on DataArray.unstack on v2022.06.0 : ""ValueError: IndexVariable objects must be 1-dimensional""",112489422,closed,0,4160723,,1,2022-08-30T13:25:16Z,2022-09-27T10:35:40Z,2022-09-27T10:35:40Z,NONE,,,,"### What happened?
**Please see code below**
With **xarray 2022.06.0**, `DataArray.unstack` raises a `ValueError`:
```python-traceback
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [2], in ()
21 y = y.assign_coords(day=y.j + y.last_j)
22 y = y.set_index(multi=['sub_id', 'last_j'])
---> 24 y = y.unstack()
File /opt/conda/lib/python3.9/site-packages/xarray/core/dataarray.py:2402, in DataArray.unstack(self, dim, fill_value, sparse)
2342 def unstack(
2343 self,
2344 dim: Hashable | Sequence[Hashable] | None = None,
2345 fill_value: Any = dtypes.NA,
2346 sparse: bool = False,
2347 ) -> DataArray:
2348 """"""
2349 Unstack existing dimensions corresponding to MultiIndexes into
2350 multiple new dimensions.
(...)
2400 DataArray.stack
2401 """"""
-> 2402 ds = self._to_temp_dataset().unstack(dim, fill_value, sparse)
2403 return self._from_temp_dataset(ds)
File /opt/conda/lib/python3.9/site-packages/xarray/core/dataset.py:4656, in Dataset.unstack(self, dim, fill_value, sparse)
4652 result = result._unstack_full_reindex(
4653 dim, stacked_indexes[dim], fill_value, sparse
4654 )
4655 else:
-> 4656 result = result._unstack_once(
4657 dim, stacked_indexes[dim], fill_value, sparse
4658 )
4659 return result
File /opt/conda/lib/python3.9/site-packages/xarray/core/dataset.py:4492, in Dataset._unstack_once(self, dim, index_and_vars, fill_value, sparse)
4489 else:
4490 fill_value_ = fill_value
-> 4492 variables[name] = var._unstack_once(
4493 index=clean_index,
4494 dim=dim,
4495 fill_value=fill_value_,
4496 sparse=sparse,
4497 )
4498 else:
4499 variables[name] = var
File /opt/conda/lib/python3.9/site-packages/xarray/core/variable.py:1732, in Variable._unstack_once(self, index, dim, fill_value, sparse)
1727 # Indexer is a list of lists of locations. Each list is the locations
1728 # on the new dimension. This is robust to the data being sparse; in that
1729 # case the destinations will be NaN / zero.
1730 data[(..., *indexer)] = reordered
-> 1732 return self._replace(dims=new_dims, data=data)
File /opt/conda/lib/python3.9/site-packages/xarray/core/variable.py:985, in Variable._replace(self, dims, data, attrs, encoding)
983 if encoding is _default:
984 encoding = copy.copy(self._encoding)
--> 985 return type(self)(dims, data, attrs, encoding, fastpath=True)
File /opt/conda/lib/python3.9/site-packages/xarray/core/variable.py:2720, in IndexVariable.__init__(self, dims, data, attrs, encoding, fastpath)
2718 super().__init__(dims, data, attrs, encoding, fastpath)
2719 if self.ndim != 1:
-> 2720 raise ValueError(f""{type(self).__name__} objects must be 1-dimensional"")
2722 # Unlike in Variable, always eagerly load values into memory
2723 if not isinstance(self._data, PandasIndexingAdapter):
ValueError: IndexVariable objects must be 1-dimensional
```
### What did you expect to happen?
**Please see code below**
With **xarray 2022.03.0**, the code runs fine.
### Minimal Complete Verifiable Example
```Python
import xarray as xr
import numpy as np
x = np.concatenate((np.repeat(np.nan,4), np.repeat(1,2))).reshape(3, 2).transpose()
x = xr.DataArray(
x,
coords = {
'composite_id': ['s00', 's10'],
'sub_id': ('composite_id', ['0', '1']),
'last_j': ('composite_id', [100, 111]),
'j': [-2,-1,0]
},
dims= ['composite_id', 'j']
)
y = x
y = y.stack({'multi': ['composite_id', 'j']})
y = y.dropna('multi')
y = y.assign_coords(day=y.j + y.last_j)
y = y.set_index(multi=['sub_id', 'last_j'])
y = y.unstack()
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
**Not working environment with xarray 2022.06.0**
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:51:20)
[GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.104-linuxkit
machine: aarch64
processor: aarch64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.23.2
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 62.1.0
pip: 22.0.4
conda: 4.12.0
pytest: None
IPython: 8.3.0
sphinx: None
/opt/conda/lib/python3.9/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils.
warnings.warn(""Setuptools is replacing distutils."")
**Working environment with xarray 2022.03.0**
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:51:20)
[GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.104-linuxkit
machine: aarch64
processor: aarch64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2022.3.0
pandas: 1.4.3
numpy: 1.23.2
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
setuptools: 62.1.0
pip: 22.0.4
conda: 4.12.0
pytest: None
IPython: 8.3.0
sphinx: None
/opt/conda/lib/python3.9/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils.
warnings.warn(""Setuptools is replacing distutils."")
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6969/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1347026292,I_kwDOAMm_X85QSf10,6946,reset_index not resetting levels of MultiIndex,20629530,closed,0,4160723,,3,2022-08-22T21:47:04Z,2022-09-27T10:35:39Z,2022-09-27T10:35:39Z,CONTRIBUTOR,,,,"### What happened?
I'm not sure my use case is the simplest way to demonstrate the issue, but let's try anyway.
I have a DataArray with two coordinates and I stack them into a new multi-index. I want to pass the levels of that new multi-index into a function, but as dask arrays. Turns out, it is not straightforward to chunk these variables because they act like `IndexVariable` objects and refuse to be chunked.
Thus, I reset the multi-index, drop it, but the variables still don't want to be chunked!
### What did you expect to happen?
I expected the levels to be chunkable after the sequence : stack, reset_index.
### Minimal Complete Verifiable Example
```Python
import xarray as xr
ds = xr.tutorial.open_dataset('air_temperature')
ds = ds.stack(spatial=['lon', 'lat'])
ds = ds.reset_index('spatial', drop=True) # I don't think the drop is important here.
lon_chunked = ds.lon.chunk() # whoops, doesn't do anything!
type(ds.lon.variable) # xarray.core.variable.IndexVariable # I assumed either the stack or the reset_index would have modified this type into a normal variable.
```
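A possible workaround sketch (hedged; the stand-in data below replaces the tutorial dataset, which needs a download): rebuild the level values as a plain `Variable` backed by a numpy array, which no longer acts like an `IndexVariable` and can then be chunked normally, assuming rebuilding the coordinate is acceptable for the use case.

```python
import numpy as np
import xarray as xr

# Stand-in for the tutorial dataset: a tiny 2-D field stacked the same way.
ds = xr.Dataset(
    {'air': (('lon', 'lat'), np.zeros((2, 3)))},
    coords={'lon': [10, 20], 'lat': [1, 2, 3]},
).stack(spatial=['lon', 'lat'])

# Rebuild the level values as a plain Variable so .chunk() is not a no-op.
lon_plain = xr.Variable(ds.lon.dims, np.asarray(ds.lon.values))
print(type(lon_plain))  # xarray.core.variable.Variable, not IndexVariable
# lon_plain.chunk() now returns a dask-backed variable (requires dask).
```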
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
Seems kinda related to the issues around `reset_index`. I think this is related to (but not a duplicate of) #4366.
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.49.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: ('en_CA', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.22.4
scipy: 1.9.0
netCDF4: 1.6.0
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: 2.12.0
cftime: 1.6.1
nc_time_axis: 1.4.1
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.8.0
distributed: 2022.8.0
matplotlib: 3.5.2
cartopy: 0.20.3
seaborn: None
numbagg: None
fsspec: 2022.7.1
cupy: None
pint: 0.19.2
sparse: 0.13.0
flox: 0.5.9
numpy_groupies: 0.9.19
setuptools: 63.4.2
pip: 22.2.2
conda: None
pytest: None
IPython: 8.4.0
sphinx: 5.1.1
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6946/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1352621981,I_kwDOAMm_X85Qn1-d,6959,Assigning coordinate level to MultiIndex fails if MultiIndex only has one level,20617032,closed,0,4160723,,0,2022-08-26T18:48:18Z,2022-09-27T10:35:39Z,2022-09-27T10:35:39Z,NONE,,,,"### What happened?
This issue originates from [this discussion](https://github.com/pydata/xarray/discussions/6936) where I was trying to figure out the best way to replace coordinate values in a MultiIndex level. I found that removing the level with `reset_index` and replacing the coordinate level with `assign_coords` works, except when removing a level leaves you with a MultiIndex with only one level. In this case a ValueError is thrown.
### What did you expect to happen?
I expect that removing and replacing a coordinate level would work the same independent of the number of levels in the MultiIndex.
### Minimal Complete Verifiable Example
```Python
import numpy as np
import pandas as pd
import xarray as xr
# Replace the coordinates in level 'one'. This works as expected.
midx = pd.MultiIndex.from_product([[0,1,2], [3, 4], [5,6]], names=(""one"", ""two"",""three""))
mda = xr.DataArray(np.random.rand(12, 3), [(""x"", midx), (""y"", range(3))])
new_coords = mda.coords['one'].values*2
mda.reset_index('one', drop=True).assign_coords(one= ('x',new_coords)).set_index(x='one',append=True) #Works
# Drop the 'two' level beforehand so that the intermediate state has a multi-index
# with only the 'three' level; this throws a ValueError
mda.reset_index('two',drop=True).reset_index('one', drop=True).assign_coords(one= ('x',new_coords)) #ValueError
# We can initialize a data array with only two levels and only drop the 'one'
# level, which gives the same ValueError. This shows that the problem is not
# due to something with dropping the 'two' level above, but something inherent
# to dropping to a state with only one multi-index level
midx = pd.MultiIndex.from_product([[0,1,2], [3, 4]], names=(""one"", ""two""))
mda = xr.DataArray(np.random.rand(6, 2), [(""x"", midx), (""y"", range(2))])
new_coords = mda.coords['one'].values*2
mda.reset_index('one', drop=True).assign_coords(one= ('x',new_coords)).set_index(x='one',append=True) #ValueError
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
```Python
# First example, starting from 3 level multiindex and dropping two levels
ValueError Traceback (most recent call last)
c:\Users\aspit\Git\Learn\xarray\replace_coord_issue.py in
15 # Drop the two level before had such that the intermediate state has a multindex
16 # with only the 'three' level, this throws a ValueError
---> 17 mda.reset_index('two',drop=True).reset_index('one', drop=True).assign_coords(one= ('x',new_coords))
c:\Users\aspit\anaconda3\envs\dataanalysis\lib\site-packages\xarray\core\common.py in assign_coords(self, coords, **coords_kwargs)
590 data = self.copy(deep=False)
591 results: dict[Hashable, Any] = self._calc_assign_results(coords_combined)
--> 592 data.coords.update(results)
593 return data
594
c:\Users\aspit\anaconda3\envs\dataanalysis\lib\site-packages\xarray\core\coordinates.py in update(self, other)
160 other_vars = getattr(other, ""variables"", other)
161 self._maybe_drop_multiindex_coords(set(other_vars))
--> 162 coords, indexes = merge_coords(
163 [self.variables, other_vars], priority_arg=1, indexes=self.xindexes
164 )
c:\Users\aspit\anaconda3\envs\dataanalysis\lib\site-packages\xarray\core\merge.py in merge_coords(objects, compat, join, priority_arg, indexes, fill_value)
564 collected = collect_variables_and_indexes(aligned)
565 prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)
--> 566 variables, out_indexes = merge_collected(collected, prioritized, compat=compat)
567 return variables, out_indexes
568
c:\Users\aspit\anaconda3\envs\dataanalysis\lib\site-packages\xarray\core\merge.py in merge_collected(grouped, prioritized, compat, combine_attrs, equals)
252
253 _assert_compat_valid(compat)
--> 254 _assert_prioritized_valid(grouped, prioritized)
255
256 merged_vars: dict[Hashable, Variable] = {}
c:\Users\aspit\anaconda3\envs\dataanalysis\lib\site-packages\xarray\core\merge.py in _assert_prioritized_valid(grouped, prioritized)
199 common_names_str = "", "".join(f""{k!r}"" for k in common_names)
200 index_names_str = "", "".join(f""{k!r}"" for k in index_coord_names)
--> 201 raise ValueError(
202 f""cannot set or update variable(s) {common_names_str}, which would corrupt ""
203 f""the following index built from coordinates {index_names_str}:\n""
ValueError: cannot set or update variable(s) 'one', which would corrupt the following index built from coordinates 'x', 'one', 'three':
# Second Example Starting from two level multindex and dropping one level
ValueError Traceback (most recent call last)
c:\Users\aspit\Git\Learn\xarray\replace_coord_issue.py in
11
12 new_coords = mda.coords['one'].values*2
---> 13 mda.reset_index('one', drop=True).assign_coords(one= ('x',new_coords)).set_index(x='one',append=True)
c:\Users\aspit\anaconda3\envs\dataanalysis\lib\site-packages\xarray\core\common.py in assign_coords(self, coords, **coords_kwargs)
590 data = self.copy(deep=False)
591 results: dict[Hashable, Any] = self._calc_assign_results(coords_combined)
--> 592 data.coords.update(results)
593 return data
594
c:\Users\aspit\anaconda3\envs\dataanalysis\lib\site-packages\xarray\core\coordinates.py in update(self, other)
160 other_vars = getattr(other, ""variables"", other)
161 self._maybe_drop_multiindex_coords(set(other_vars))
--> 162 coords, indexes = merge_coords(
163 [self.variables, other_vars], priority_arg=1, indexes=self.xindexes
164 )
c:\Users\aspit\anaconda3\envs\dataanalysis\lib\site-packages\xarray\core\merge.py in merge_coords(objects, compat, join, priority_arg, indexes, fill_value)
564 collected = collect_variables_and_indexes(aligned)
565 prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)
--> 566 variables, out_indexes = merge_collected(collected, prioritized, compat=compat)
567 return variables, out_indexes
568
c:\Users\aspit\anaconda3\envs\dataanalysis\lib\site-packages\xarray\core\merge.py in merge_collected(grouped, prioritized, compat, combine_attrs, equals)
252
253 _assert_compat_valid(compat)
--> 254 _assert_prioritized_valid(grouped, prioritized)
255
256 merged_vars: dict[Hashable, Variable] = {}
c:\Users\aspit\anaconda3\envs\dataanalysis\lib\site-packages\xarray\core\merge.py in _assert_prioritized_valid(grouped, prioritized)
199 common_names_str = "", "".join(f""{k!r}"" for k in common_names)
200 index_names_str = "", "".join(f""{k!r}"" for k in index_coord_names)
--> 201 raise ValueError(
202 f""cannot set or update variable(s) {common_names_str}, which would corrupt ""
203 f""the following index built from coordinates {index_names_str}:\n""
ValueError: cannot set or update variable(s) 'one', which would corrupt the following index built from coordinates 'x', 'one', 'two':
```
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:15:42) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_United States', '1252')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2022.6.0
pandas: 1.3.4
numpy: 1.21.4
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.02.1
distributed: 2022.2.1
matplotlib: 3.4.3
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.7.1
cupy: None
pint: 0.18
sparse: None
flox: None
numpy_groupies: None
setuptools: 59.1.0
pip: 21.3.1
conda: None
pytest: 6.2.5
IPython: 7.29.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6959/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1361896826,I_kwDOAMm_X85RLOV6,6989,reset multi-index to single index (level): coordinate not renamed,4160723,closed,0,4160723,,0,2022-09-05T12:45:22Z,2022-09-27T10:35:39Z,2022-09-27T10:35:39Z,MEMBER,,,,"### What happened?
Resetting a multi-index to a single level (i.e., a single index) does not rename the remaining level coordinate to the dimension name.
### What did you expect to happen?
While it is certainly more consistent not to rename the level coordinate here (since an index can be assigned to a non-dimension coordinate now), it breaks with the old behavior. I think it's better not to introduce any breaking change. As discussed elsewhere, we might eventually want to deprecate `reset_index` in favor of `drop_indexes` (#6971).
### Minimal Complete Verifiable Example
```Python
import pandas as pd
import xarray as xr
midx = pd.MultiIndex.from_product([[""a"", ""b""], [1, 2]], names=(""foo"", ""bar""))
ds = xr.Dataset(coords={""x"": midx})
#
# Dimensions: (x: 4)
# Coordinates:
# * x (x) object MultiIndex
# * foo (x) object 'a' 'a' 'b' 'b'
# * bar (x) int64 1 2 1 2
# Data variables:
# *empty*
rds = ds.reset_index(""foo"")
# v2022.03.0
#
#
# Dimensions: (x: 4)
# Coordinates:
# * x (x) int64 1 2 1 2
# foo (x) object 'a' 'a' 'b' 'b'
# Data variables:
# *empty*
# v2022.06.0
#
#
# Dimensions: (x: 4)
# Coordinates:
# foo (x) object 'a' 'a' 'b' 'b'
# * bar (x) int64 1 2 1 2
# Dimensions without coordinates: x
# Data variables:
# *empty*
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6989/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
626591460,MDU6SXNzdWU2MjY1OTE0NjA=,4107,renaming Variable to a dimension name does not convert to IndexVariable,2448579,closed,0,4160723,,0,2020-05-28T15:11:49Z,2022-09-27T09:33:42Z,2022-09-27T09:33:42Z,MEMBER,,,,"
Seen in #4103
#### MCVE Code Sample
```python
from xarray.tests import assert_identical
coord_1 = xr.DataArray([1, 2], dims=[""coord_1""], attrs={""attrs"": True})
da = xr.DataArray([1, 0], [coord_1])
obj = da.reset_index(""coord_1"").rename({""coord_1_"": ""coord_1""})
assert_identical(da, obj)
```
#### Expected Output
#### Problem Description
```
AssertionErrorTraceback (most recent call last)
in
----> 1 assert_identical(da, obj)
~/work/python/xarray/xarray/tests/__init__.py in assert_identical(a, b)
160 xarray.testing.assert_identical(a, b)
161 xarray.testing._assert_internal_invariants(a)
--> 162 xarray.testing._assert_internal_invariants(b)
163
164
~/work/python/xarray/xarray/testing.py in _assert_internal_invariants(xarray_obj)
265 _assert_variable_invariants(xarray_obj)
266 elif isinstance(xarray_obj, DataArray):
--> 267 _assert_dataarray_invariants(xarray_obj)
268 elif isinstance(xarray_obj, Dataset):
269 _assert_dataset_invariants(xarray_obj)
~/work/python/xarray/xarray/testing.py in _assert_dataarray_invariants(da)
210 assert all(
211 isinstance(v, IndexVariable) for (k, v) in da._coords.items() if v.dims == (k,)
--> 212 ), {k: type(v) for k, v in da._coords.items()}
213 for k, v in da._coords.items():
214 _assert_variable_invariants(v, k)
AssertionError: {'coord_1': }
```
#### Versions
Output of xr.show_versions()
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4107/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1235725650,I_kwDOAMm_X85Jp61S,6607,Coordinate promotion workaround broken,20629530,closed,0,4160723,,4,2022-05-13T21:20:25Z,2022-09-27T09:33:41Z,2022-09-27T09:33:41Z,CONTRIBUTOR,,,,"### What happened?
Ok so this one is a bit weird. I'm not sure this is a bug, but code that worked before doesn't anymore, so it is some sort of regression.
I have a dataset with one dimension and one coordinate along it, but they have different names. I want to rename the dimension to the coordinate's name so that it becomes a proper dimension coordinate (I'm not sure what to call it). After renaming the dim to the coord's name, everything looks good in the repr, but the coord is still missing an `index` for that dimension (`crd.indexes` is empty, see the MCVE). There was a workaround through `reset_coords` for this, but it doesn't work anymore.
Instead, the last line of the MCVE downgrades the variable: the final `lon` doesn't have coords anymore.
### What did you expect to happen?
In the MCVE below, I show what the old ""workaround"" was. I expected `lon.indexes` to contain an index for `lon` at the end of the procedure.
### Minimal Complete Verifiable Example
```Python
import xarray as xr
# A dataset with a 1d variable along a dimension
ds = xr.Dataset({'lon': xr.DataArray([1, 2, 3], dims=('x',))})
# Promote to coord. This still is not a proper crd-dim (different name)
ds = ds.set_coords(['lon'])
# Rename dim:
ds = ds.rename(x='lon')
# Do we have a proper coord-dim now? No, not yet, because:
ds.indexes # is empty
# Workaround that was used up to the last release
lon = ds.lon.reset_coords(drop=True)
# Because of the missing indexes the next line fails on the master
lon - lon.diff('lon')
```
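For what it's worth, a minimal sketch of an alternative that does produce an indexed dimension coordinate, using `swap_dims` instead of `rename` (treat this as a sketch, not the original workaround):

```python
import xarray as xr

# same setup as the MCVE above
ds = xr.Dataset({'lon': xr.DataArray([1, 2, 3], dims=('x',))})
ds = ds.set_coords(['lon'])
# swap_dims both renames the dimension and builds the index in one step
ds2 = ds.swap_dims({'x': 'lon'})
assert 'lon' in ds2.indexes
```

This avoids the intermediate state where the dimension has been renamed but no index exists.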
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
My guess is that this line is causing `reset_coords` to drop the coordinate from itself: https://github.com/pydata/xarray/blob/c34ef8a60227720724e90aa11a6266c0026a812a/xarray/core/dataarray.py#L866
It would be nice if the renaming was sufficient for the indexes to appear.
My example is weird, I know. The real use case is a script where we receive a 2d coordinate whose lines are all identical, so we take the first line and promote it to a proper coord-dim. But the current code fails on master at the `lon - lon.diff('lon')` step that happens afterwards.
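To make that use case concrete, here is a hedged sketch (the 2d `lon2d` data is hypothetical, and the row-collapse plus `swap_dims` combination is my own suggestion, not the failing script):

```python
import numpy as np
import xarray as xr

# hypothetical 2d coordinate whose rows are all identical
lon2d = xr.DataArray(np.tile([1.0, 2.0, 3.0], (2, 1)), dims=('y', 'x'))
ds = xr.Dataset(coords={'lon': lon2d})
# keep only the first row, then swap the dimension so a proper index is built
first_row = ds.lon.values[0]
ds = ds.drop_vars('lon').assign_coords(lon=('x', first_row)).swap_dims({'x': 'lon'})
assert 'lon' in ds.indexes
```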
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:22:55)
[GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.13.19-2-MANJARO
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_CA.UTF-8
LOCALE: ('fr_CA', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2022.3.1.dev104+gc34ef8a6
pandas: 1.4.2
numpy: 1.22.2
scipy: 1.8.0
netCDF4: None
pydap: installed
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.02.1
distributed: 2022.2.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.3.0
cupy: None
pint: None
sparse: 0.13.0
setuptools: 59.8.0
pip: 22.0.3
conda: None
pytest: 7.0.1
IPython: 8.3.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6607/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1361626450,I_kwDOAMm_X85RKMVS,6987,Indexes.get_unique() TypeError with pandas indexes,4160723,closed,0,4160723,,0,2022-09-05T09:02:50Z,2022-09-23T07:30:39Z,2022-09-23T07:30:39Z,MEMBER,,,,"@benbovy I also just tested the `get_unique()` method that you mentioned and maybe noticed a related issue here, which I'm not sure is wanted / expected.
Taking the above dataset `ds`, calling this method results in an error:
```python
> ds.indexes.get_unique()
TypeError: unhashable type: 'MultiIndex'
```
However, for `xindexes` it works:
```python
> ds.xindexes.get_unique()
[]
```
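The error is consistent with `get_unique` trying to hash the underlying pandas objects (e.g. by building a `set` of them); pandas `Index`/`MultiIndex` objects are deliberately unhashable, as a minimal pandas-only sketch shows:

```python
import pandas as pd

midx = pd.MultiIndex.from_product([['a', 'b'], [1, 2]])
try:
    {midx}  # putting a MultiIndex in a set tries to hash it
    raised = False
except TypeError as err:
    raised = True
    message = str(err)
assert raised
```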
_Originally posted by @lukasbindreiter in https://github.com/pydata/xarray/issues/6752#issuecomment-1236717180_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6987/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1047599975,I_kwDOAMm_X84-cRtn,5953,Selecting with MultiIndex raises a TypeError,34740232,closed,0,4160723,,1,2021-11-08T15:39:10Z,2022-03-17T17:11:43Z,2022-03-17T17:11:43Z,NONE,,,,"
**What happened**: After updating xarray to v0.20.1, some of my multi-index-based selection code raises a `TypeError`.
**What you expected to happen**: Selection should work in the case I am considering.
**Minimal Complete Verifiable Example**:
```python
import xarray as xr
import pandas as pd
da = xr.DataArray(data=[0, 1, 2, 3], dims=(""x"",)).reindex(
{""x"": pd.MultiIndex.from_product(([""foo"", ""bar""], [0, 1]), names=(""str"", ""int""))}
)
print(da.sel(x=(""foo"", 1)))
```
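As a possible workaround, selecting by level name (a documented multi-index feature) may avoid the tuple code path; a minimal sketch, assuming the direct `MultiIndex`-in-coords construction is equivalent to the `reindex` call above:

```python
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product((['foo', 'bar'], [0, 1]), names=('str', 'int'))
da = xr.DataArray([0, 1, 2, 3], dims='x', coords={'x': midx})
# select on the individual levels instead of passing a tuple label
res = da.sel(str='foo', int=1)
assert res.item() == 1
```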
**Expected output** (what I get with xarray 0.19.0):
```
array(1)
Coordinates:
x object ('foo', 1)
```
**Actual output** (with xarray 0.20.1):
```
Traceback (most recent call last):
File ""playgrounds/xarray_mwe.py"", line 7, in
da.sel(x=(""foo"", 1))
File ""/Users/leroyv/miniforge3/envs/eradiate/lib/python3.7/site-packages/xarray/core/dataarray.py"", line 1337, in sel
**indexers_kwargs,
File ""/Users/leroyv/miniforge3/envs/eradiate/lib/python3.7/site-packages/xarray/core/dataset.py"", line 2505, in sel
self, indexers=indexers, method=method, tolerance=tolerance
File ""/Users/leroyv/miniforge3/envs/eradiate/lib/python3.7/site-packages/xarray/core/coordinates.py"", line 422, in remap_label_indexers
obj, v_indexers, method=method, tolerance=tolerance
File ""/Users/leroyv/miniforge3/envs/eradiate/lib/python3.7/site-packages/xarray/core/indexing.py"", line 120, in remap_label_indexers
idxr, new_idx = index.query(labels, method=method, tolerance=tolerance)
File ""/Users/leroyv/miniforge3/envs/eradiate/lib/python3.7/site-packages/xarray/core/indexes.py"", line 235, in query
label_value, method=method, tolerance=tolerance
TypeError: get_loc() got an unexpected keyword argument 'tolerance'
```
**Anything else we need to know?**: Nothing I'm aware of
**Environment**:
Output of xr.show_versions() (with xarray 0.19.0)
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 05:57:50)
[Clang 11.1.0 ]
python-bits: 64
OS: Darwin
OS-release: 20.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: (None, 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 0.19.0
pandas: 1.3.4
numpy: 1.21.4
scipy: 1.7.1
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.11.0
distributed: 2021.11.0
matplotlib: 3.4.3
cartopy: None
seaborn: 0.11.2
numbagg: None
pint: 0.18
setuptools: 58.5.3
pip: 21.3.1
conda: None
pytest: 6.2.5
IPython: 7.29.0
sphinx: 4.2.0
Output of xr.show_versions() (with xarray 0.20.1)
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 05:57:50)
[Clang 11.1.0 ]
python-bits: 64
OS: Darwin
OS-release: 20.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: (None, 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 0.20.1
pandas: 1.3.4
numpy: 1.21.4
scipy: 1.7.1
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.11.0
distributed: 2021.11.0
matplotlib: 3.4.3
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2021.11.0
cupy: None
pint: 0.18
sparse: None
setuptools: 58.5.3
pip: 21.3.1
conda: None
pytest: 6.2.5
IPython: 7.29.0
sphinx: 4.2.0
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5953/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
|