id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 1390228572,I_kwDOAMm_X85S3TRc,7104,Duplicate values on unstack,114576287,closed,0,,,4,2022-09-29T04:16:26Z,2024-02-13T09:48:37Z,2024-02-13T09:48:37Z,NONE,,,,"### What happened? I unstacked a dataset and got values I didn't expect. It turns out that, when unstacking, my dataset had multiple values for the same index. This is clearly a case of user error, but it silently passed. ### What did you expect to happen? A warning or error would be raised to say, ""this isn't going to work"". ### Minimal Complete Verifiable Example ```Python import datetime as dt import xarray as xr ds = xr.DataArray( [[1, 2, 3], [4, 5, 6]], dims=(""lat"", ""time""), coords={""lat"": [-60, 60], ""time"": [dt.datetime(2010, 1, d) for d in range(1, 4)]}, name=""test"", ).to_dataset() ds = ( ds.assign_coords( { ""month"": ds[""time""].dt.month, ""year"": ds[""time""].dt.year, } ) .set_index(time=[""month"", ""year""]) ) ds = ds.unstack(""time"") # the output only has 2 values, which isn't what I expected ds[""test""].data ``` ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [X] Complete example — the example is self-contained, including all data and the text of any traceback. - [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. ### Relevant log output _No response_ ### Anything else we need to know? It's not clear to me where the error is. It might just be that this particular order of operations leads to a case that isn't otherwise caught. Looking at intermediate output, I thought the error was in unstack but maybe it's more complex than that... ### Environment
INSTALLED VERSIONS ------------------ commit: e678a1d7884a3c24dba22d41b2eef5d7fe5258e7 python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14) [Clang 12.0.1 ] python-bits: 64 OS: Darwin OS-release: 21.5.0 machine: arm64 processor: arm byteorder: little LC_ALL: None LANG: en_AU.UTF-8 LOCALE: ('en_AU', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.8.1 xarray: 0.1.dev4312+ge678a1d.d20220928 pandas: 1.5.0 numpy: 1.22.4 scipy: 1.9.1 netCDF4: 1.6.1 pydap: installed h5netcdf: 1.0.2 h5py: 3.7.0 Nio: None zarr: 2.13.2 cftime: 1.6.2 nc_time_axis: 1.4.1 PseudoNetCDF: 3.2.2 rasterio: 1.3.1 cfgrib: 0.9.10.1 iris: 3.3.0 bottleneck: 1.3.5 dask: 2022.9.1 distributed: 2022.9.1 matplotlib: 3.6.0 cartopy: 0.21.0 seaborn: 0.12.0 numbagg: 0.2.1 fsspec: 2022.8.2 cupy: None pint: 0.19.2 sparse: 0.13.0 flox: 0.5.9 numpy_groupies: 0.9.19 setuptools: 65.4.0 pip: 22.2.2 conda: None pytest: 7.1.3 IPython: 8.5.0 sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7104/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1389148779,I_kwDOAMm_X85SzLpr,7097,Broken state when using assign_coords with multiindex,114576287,closed,0,4160723,,2,2022-09-28T10:51:34Z,2022-09-29T00:27:38Z,2022-09-28T18:02:17Z,NONE,,,,"### What happened? I was trying to assign coordinates on a dataset that had been created by using stack. After assigning the coordinates, the dataset was in a state where its length was coming out as less than zero, which caused all sorts of issues. ### What did you expect to happen? I think the issue is with the updating of `_coord_names`, perhaps in https://github.com/pydata/xarray/blob/18454c218002e48e1643ce8e25654262e5f592ad/xarray/core/coordinates.py#L389. I expected to just be able to assign the coords and then print the array to see the result. ### Minimal Complete Verifiable Example ```Python import xarray as xr ds = xr.DataArray( [[[1, 1], [0, 0]], [[2, 2], [1, 1]]], dims=(""lat"", ""year"", ""month""), coords={""lat"": [-60, 60], ""year"": [2010, 2020], ""month"": [3, 6]}, name=""test"", ).to_dataset() stacked = ds.stack(time=(""year"", ""month"")) stacked = stacked.assign_coords( {""time"": [y + m / 12 for y, m in stacked[""time""].values]} ) # Both these fail with ValueError: __len__() should return >= 0 len(stacked) print(stacked) ``` ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [X] Complete example — the example is self-contained, including all data and the text of any traceback. - [x] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. ### Relevant log output ```Python Traceback (most recent call last): File ""mre.py"", line 17, in len(stacked) File "".../xarray-tests/xarray/core/dataset.py"", line 1364, in __len__ return len(self.data_vars) ValueError: __len__() should return >= 0 ``` ### Anything else we need to know? Here's a test (I put it in `test_dataarray.py` but maybe there is a better spot) ```python def test_assign_coords_drop_coord_names(self) -> None: ds = DataArray( [[[1, 1], [0, 0]], [[2, 2], [1, 1]]], dims=(""lat"", ""year"", ""month""), coords={""lat"": [-60, 60], ""year"": [2010, 2020], ""month"": [3, 6]}, name=""test"", ).to_dataset() stacked = ds.stack(time=(""year"", ""month"")) stacked = stacked.assign_coords( {""time"": [y + m / 12 for y, m in stacked[""time""].values]} ) # this seems to be handled correctly assert set(stacked._variables.keys()) == {""test"", ""time"", ""lat""} # however, _coord_names doesn't seem to update as expected # the below fails assert set(stacked._coord_names) == {""time"", ""lat""} # the incorrect value of _coord_names means that all the below fails too # The failure is because the length of a dataset is calculated as (via len(data_vars)) # len(dataset._variables) - len(dataset._coord_names). For the situation # above, where len(dataset._coord_names) is greater than len(dataset._variables), # you get a length less than zero which then fails because length must return # a value greater than zero # Both these fail with ValueError: __len__() should return >= 0 len(stacked) print(stacked) ``` ### Environment
INSTALLED VERSIONS ------------------ commit: e678a1d7884a3c24dba22d41b2eef5d7fe5258e7 python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14) [Clang 12.0.1 ] python-bits: 64 OS: Darwin OS-release: 21.5.0 machine: arm64 processor: arm byteorder: little LC_ALL: None LANG: en_AU.UTF-8 LOCALE: ('en_AU', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.8.1 xarray: 0.1.dev4312+ge678a1d.d20220928 pandas: 1.5.0 numpy: 1.22.4 scipy: 1.9.1 netCDF4: 1.6.1 pydap: installed h5netcdf: 1.0.2 h5py: 3.7.0 Nio: None zarr: 2.13.2 cftime: 1.6.2 nc_time_axis: 1.4.1 PseudoNetCDF: 3.2.2 rasterio: 1.3.1 cfgrib: 0.9.10.1 iris: 3.3.0 bottleneck: 1.3.5 dask: 2022.9.1 distributed: 2022.9.1 matplotlib: 3.6.0 cartopy: 0.21.0 seaborn: 0.12.0 numbagg: 0.2.1 fsspec: 2022.8.2 cupy: None pint: 0.19.2 sparse: 0.13.0 flox: 0.5.9 numpy_groupies: 0.9.19 setuptools: 65.4.0 pip: 22.2.2 conda: None pytest: 7.1.3 IPython: None sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7097/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue