id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1390228572,I_kwDOAMm_X85S3TRc,7104,Duplicate values on unstack,114576287,closed,0,,,4,2022-09-29T04:16:26Z,2024-02-13T09:48:37Z,2024-02-13T09:48:37Z,NONE,,,,"### What happened?
I unstacked a dataset and got values I didn't expect. It turns out that, when unstacking, my dataset had multiple values for the same index. This is clearly a case of user error, but it silently passed.
### What did you expect to happen?
A warning or error would be raised to say, ""this isn't going to work"".
### Minimal Complete Verifiable Example
```Python
import datetime as dt
import xarray as xr
ds = xr.DataArray(
    [[1, 2, 3], [4, 5, 6]],
    dims=(""lat"", ""time""),
    coords={""lat"": [-60, 60], ""time"": [dt.datetime(2010, 1, d) for d in range(1, 4)]},
    name=""test"",
).to_dataset()
ds = (
    ds.assign_coords(
        {
            ""month"": ds[""time""].dt.month,
            ""year"": ds[""time""].dt.year,
        }
    )
    .set_index(time=[""month"", ""year""])
)
ds = ds.unstack(""time"")
# the output has only 2 values instead of the original 6, which isn't what I expected
ds[""test""].data
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
It's not clear to me where the error is. It might just be that this particular order of operations leads to a case that isn't otherwise caught. Looking at intermediate output, I thought the error was in unstack but maybe it's more complex than that...
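A check along the following lines could catch the problem before unstacking. This is only a sketch: `assert_unique_index` is a hypothetical helper, not an xarray API, and it assumes the index to be unstacked is available as a pandas object (for instance via `ds.indexes['time']`). To stay self-contained, it rebuilds the (month, year) index from the example above directly in pandas.

```python
import pandas as pd


def assert_unique_index(index: pd.Index) -> None:
    # Hypothetical helper (not an xarray API): raise if any index entry repeats,
    # because unstacking needs each entry to map to exactly one position.
    if not index.is_unique:
        duplicates = index[index.duplicated()].unique().tolist()
        raise ValueError(f'index has duplicate entries: {duplicates}')


# Same (month, year) labels as in the example: three days in January 2010
# all collapse to the single entry (1, 2010).
time_index = pd.MultiIndex.from_arrays(
    [[1, 1, 1], [2010, 2010, 2010]], names=['month', 'year']
)

try:
    assert_unique_index(time_index)
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```

Raising here is a design choice for the sketch; a warning might fit better if silently keeping one of the duplicates is sometimes intended.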
### Environment
INSTALLED VERSIONS
------------------
commit: e678a1d7884a3c24dba22d41b2eef5d7fe5258e7
python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14)
[Clang 12.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 21.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: ('en_AU', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 0.1.dev4312+ge678a1d.d20220928
pandas: 1.5.0
numpy: 1.22.4
scipy: 1.9.1
netCDF4: 1.6.1
pydap: installed
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.13.2
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: 3.2.2
rasterio: 1.3.1
cfgrib: 0.9.10.1
iris: 3.3.0
bottleneck: 1.3.5
dask: 2022.9.1
distributed: 2022.9.1
matplotlib: 3.6.0
cartopy: 0.21.0
seaborn: 0.12.0
numbagg: 0.2.1
fsspec: 2022.8.2
cupy: None
pint: 0.19.2
sparse: 0.13.0
flox: 0.5.9
numpy_groupies: 0.9.19
setuptools: 65.4.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: 8.5.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7104/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1389148779,I_kwDOAMm_X85SzLpr,7097,Broken state when using assign_coords with multiindex,114576287,closed,0,4160723,,2,2022-09-28T10:51:34Z,2022-09-29T00:27:38Z,2022-09-28T18:02:17Z,NONE,,,,"### What happened?
I was trying to assign coordinates on a dataset that had been created by using stack. After assigning the coordinates, the dataset was in a state where its length was coming out as less than zero, which caused all sorts of issues.
### What did you expect to happen?
I think the issue is with the updating of `_coord_names`, perhaps in https://github.com/pydata/xarray/blob/18454c218002e48e1643ce8e25654262e5f592ad/xarray/core/coordinates.py#L389.
I expected to just be able to assign the coords and then print the array to see the result.
### Minimal Complete Verifiable Example
```Python
import xarray as xr
ds = xr.DataArray(
    [[[1, 1], [0, 0]], [[2, 2], [1, 1]]],
    dims=(""lat"", ""year"", ""month""),
    coords={""lat"": [-60, 60], ""year"": [2010, 2020], ""month"": [3, 6]},
    name=""test"",
).to_dataset()
stacked = ds.stack(time=(""year"", ""month""))
stacked = stacked.assign_coords(
    {""time"": [y + m / 12 for y, m in stacked[""time""].values]}
)
# Both these fail with ValueError: __len__() should return >= 0
len(stacked)
print(stacked)
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [x] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
```Python
Traceback (most recent call last):
  File ""mre.py"", line 17, in <module>
    len(stacked)
  File "".../xarray-tests/xarray/core/dataset.py"", line 1364, in __len__
    return len(self.data_vars)
ValueError: __len__() should return >= 0
```
### Anything else we need to know?
Here's a test (I put it in `test_dataarray.py` but maybe there is a better spot)
```python
def test_assign_coords_drop_coord_names(self) -> None:
    ds = DataArray(
        [[[1, 1], [0, 0]], [[2, 2], [1, 1]]],
        dims=(""lat"", ""year"", ""month""),
        coords={""lat"": [-60, 60], ""year"": [2010, 2020], ""month"": [3, 6]},
        name=""test"",
    ).to_dataset()
    stacked = ds.stack(time=(""year"", ""month""))
    stacked = stacked.assign_coords(
        {""time"": [y + m / 12 for y, m in stacked[""time""].values]}
    )
    # this seems to be handled correctly
    assert set(stacked._variables.keys()) == {""test"", ""time"", ""lat""}
    # however, _coord_names doesn't seem to update as expected
    # the below fails
    assert set(stacked._coord_names) == {""time"", ""lat""}
    # The incorrect value of _coord_names means that everything below fails too.
    # The length of a dataset is calculated (via len(data_vars)) as
    # len(dataset._variables) - len(dataset._coord_names). In the situation
    # above, len(dataset._coord_names) is greater than len(dataset._variables),
    # so the length comes out negative, which fails because __len__ must
    # return a value greater than or equal to zero.
    # Both these fail with ValueError: __len__() should return >= 0
    len(stacked)
    print(stacked)
```
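To illustrate the arithmetic described in the test's comments, here is a minimal pure-Python sketch. `FakeDataset` is a hypothetical stand-in, not xarray's `Dataset`, and the stale `_coord_names` value (still holding `year` and `month`) is my assumption about what the broken state looks like.

```python
class FakeDataset:
    # Hypothetical stand-in that mimics only the length calculation.
    def __init__(self, variables, coord_names):
        self._variables = set(variables)
        self._coord_names = set(coord_names)

    def __len__(self):
        # Number of data variables = all variables minus coordinate names.
        return len(self._variables) - len(self._coord_names)


# Assumed broken state: 'year' and 'month' linger in _coord_names
# although they are no longer present in _variables.
broken = FakeDataset(
    variables={'test', 'time', 'lat'},
    coord_names={'time', 'lat', 'year', 'month'},
)

# 3 variables - 4 coordinate names = -1
raw_length = len(broken._variables) - len(broken._coord_names)

try:
    len(broken)
    message = None
except ValueError as err:
    message = str(err)

print(raw_length, message)
```

Python itself rejects a negative return value from `__len__`, which is why the error surfaces in `len(stacked)` and `print(stacked)` rather than in `assign_coords`.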
### Environment
INSTALLED VERSIONS
------------------
commit: e678a1d7884a3c24dba22d41b2eef5d7fe5258e7
python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14)
[Clang 12.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 21.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: ('en_AU', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 0.1.dev4312+ge678a1d.d20220928
pandas: 1.5.0
numpy: 1.22.4
scipy: 1.9.1
netCDF4: 1.6.1
pydap: installed
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.13.2
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: 3.2.2
rasterio: 1.3.1
cfgrib: 0.9.10.1
iris: 3.3.0
bottleneck: 1.3.5
dask: 2022.9.1
distributed: 2022.9.1
matplotlib: 3.6.0
cartopy: 0.21.0
seaborn: 0.12.0
numbagg: 0.2.1
fsspec: 2022.8.2
cupy: None
pint: 0.19.2
sparse: 0.13.0
flox: 0.5.9
numpy_groupies: 0.9.19
setuptools: 65.4.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: None
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7097/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue