home / github / issues

Menu
  • Search all tables
  • GraphQL API

issues: 1389148779

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1389148779 I_kwDOAMm_X85SzLpr 7097 Broken state when using assign_coords with multiindex 114576287 closed 0 4160723   2 2022-09-28T10:51:34Z 2022-09-29T00:27:38Z 2022-09-28T18:02:17Z NONE      

What happened?

I was trying to assign coordinates on a dataset that had been created by using stack. After assigning the coordinates, the dataset was in a state where its length was coming out as less than zero, which caused all sorts of issues.

What did you expect to happen?

I think the issue is with the updating of _coord_names, perhaps in https://github.com/pydata/xarray/blob/18454c218002e48e1643ce8e25654262e5f592ad/xarray/core/coordinates.py#L389.

I expected to just be able to assign the coords and then print the array to see the result.

Minimal Complete Verifiable Example

```Python import xarray as xr

ds = xr.DataArray( [[[1, 1], [0, 0]], [[2, 2], [1, 1]]], dims=("lat", "year", "month"), coords={"lat": [-60, 60], "year": [2010, 2020], "month": [3, 6]}, name="test", ).to_dataset()

stacked = ds.stack(time=("year", "month")) stacked = stacked.assign_coords( {"time": [y + m / 12 for y, m in stacked["time"].values]} )

Both these fail with ValueError: len() should return >= 0

len(stacked) print(stacked) ```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

Python Traceback (most recent call last): File "mre.py", line 17, in <module> len(stacked) File ".../xarray-tests/xarray/core/dataset.py", line 1364, in __len__ return len(self.data_vars) ValueError: __len__() should return >= 0

Anything else we need to know?

Here's a test (I put it in test_dataarray.py but maybe there is a better spot)

```python def test_assign_coords_drop_coord_names(self) -> None: ds = DataArray( [[[1, 1], [0, 0]], [[2, 2], [1, 1]]], dims=("lat", "year", "month"), coords={"lat": [-60, 60], "year": [2010, 2020], "month": [3, 6]}, name="test", ).to_dataset()

    stacked = ds.stack(time=("year", "month"))
    stacked = stacked.assign_coords(
        {"time": [y + m / 12 for y, m in stacked["time"].values]}
    )

    # this seems to be handled correctly
    assert set(stacked._variables.keys()) == {"test", "time", "lat"}
    # however, _coord_names doesn't seem to update as expected
    # the below fails
    assert set(stacked._coord_names) == {"time", "lat"}

    # the incorrect value of _coord_names means that all the below fails too
    # The failure is because the length of a dataset is calculated as (via len(data_vars))
    # len(dataset._variables) - len(dataset._coord_names). For the situation
    # above, where len(dataset._coord_names) is greater than len(dataset._variables),
    # you get a length less than zero which then fails because length must return
    # a value greater than zero

    # Both these fail with ValueError: __len__() should return >= 0
    len(stacked)
    print(stacked)

```

Environment

INSTALLED VERSIONS ------------------ commit: e678a1d7884a3c24dba22d41b2eef5d7fe5258e7 python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14) [Clang 12.0.1 ] python-bits: 64 OS: Darwin OS-release: 21.5.0 machine: arm64 processor: arm byteorder: little LC_ALL: None LANG: en_AU.UTF-8 LOCALE: ('en_AU', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.8.1 xarray: 0.1.dev4312+ge678a1d.d20220928 pandas: 1.5.0 numpy: 1.22.4 scipy: 1.9.1 netCDF4: 1.6.1 pydap: installed h5netcdf: 1.0.2 h5py: 3.7.0 Nio: None zarr: 2.13.2 cftime: 1.6.2 nc_time_axis: 1.4.1 PseudoNetCDF: 3.2.2 rasterio: 1.3.1 cfgrib: 0.9.10.1 iris: 3.3.0 bottleneck: 1.3.5 dask: 2022.9.1 distributed: 2022.9.1 matplotlib: 3.6.0 cartopy: 0.21.0 seaborn: 0.12.0 numbagg: 0.2.1 fsspec: 2022.8.2 cupy: None pint: 0.19.2 sparse: 0.13.0 flox: 0.5.9 numpy_groupies: 0.9.19 setuptools: 65.4.0 pip: 22.2.2 conda: None pytest: 7.1.3 IPython: None sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7097/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed 13221727 issue

Links from other tables

  • 3 rows from issues_id in issues_labels
  • 2 rows from issue in issue_comments
Powered by Datasette · Queries took 320.326ms · About: xarray-datasette