issues

2 rows where user = 114576287 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1390228572 I_kwDOAMm_X85S3TRc 7104 Duplicate values on unstack znichollscr 114576287 closed 0     4 2022-09-29T04:16:26Z 2024-02-13T09:48:37Z 2024-02-13T09:48:37Z NONE      

What happened?

I unstacked a dataset and got values I didn't expect. It turns out that, when unstacking, my dataset had multiple values for the same index. This is clearly a case of user error, but it silently passed.

What did you expect to happen?

A warning or error would be raised to say, "this isn't going to work".

Minimal Complete Verifiable Example

```Python
import datetime as dt
import xarray as xr

ds = xr.DataArray(
    [[1, 2, 3], [4, 5, 6]],
    dims=("lat", "time"),
    coords={"lat": [-60, 60], "time": [dt.datetime(2010, 1, d) for d in range(1, 4)]},
    name="test",
).to_dataset()

ds = (
    ds.assign_coords(
        {
            "month": ds["time"].dt.month,
            "year": ds["time"].dt.year,
        }
    )
    .set_index(time=["month", "year"])
)
ds = ds.unstack("time")

# the output only has 2 values, which isn't what I expected
ds["test"].data
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

It's not clear to me where the error is. It might just be that this particular order of operations leads to a case that isn't otherwise caught. Looking at intermediate output, I thought the error was in unstack but maybe it's more complex than that...
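The ambiguity can be seen without xarray at all. The sketch below is plain Python, not xarray's implementation: it rebuilds the (month, year) pairs that the example uses as index levels and counts how often each one occurs. The helper name `find_duplicate_keys` is made up for illustration.

```python
import datetime as dt
from collections import Counter

# Same three timestamps as in the example: 1, 2 and 3 January 2010.
times = [dt.datetime(2010, 1, d) for d in range(1, 4)]

# The (month, year) pairs that become the levels of the "time" multi-index.
keys = [(t.month, t.year) for t in times]


def find_duplicate_keys(keys):
    """Return the keys that occur more than once, with their counts.

    Hypothetical helper for illustration; not part of xarray.
    """
    return {key: n for key, n in Counter(keys).items() if n > 1}


duplicates = find_duplicate_keys(keys)
if duplicates:
    print(f"index is not unique, unstacking would be ambiguous: {duplicates}")
```

All three timestamps map to the same (month, year) pair, so there is only one slot for three values along the unstacked dimensions. A check of this kind before unstacking is the sort of guard the report asks for.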

Environment

INSTALLED VERSIONS
------------------
commit: e678a1d7884a3c24dba22d41b2eef5d7fe5258e7
python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14) [Clang 12.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 21.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: ('en_AU', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 0.1.dev4312+ge678a1d.d20220928
pandas: 1.5.0
numpy: 1.22.4
scipy: 1.9.1
netCDF4: 1.6.1
pydap: installed
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.13.2
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: 3.2.2
rasterio: 1.3.1
cfgrib: 0.9.10.1
iris: 3.3.0
bottleneck: 1.3.5
dask: 2022.9.1
distributed: 2022.9.1
matplotlib: 3.6.0
cartopy: 0.21.0
seaborn: 0.12.0
numbagg: 0.2.1
fsspec: 2022.8.2
cupy: None
pint: 0.19.2
sparse: 0.13.0
flox: 0.5.9
numpy_groupies: 0.9.19
setuptools: 65.4.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: 8.5.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7104/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1389148779 I_kwDOAMm_X85SzLpr 7097 Broken state when using assign_coords with multiindex znichollscr 114576287 closed 0 benbovy 4160723   2 2022-09-28T10:51:34Z 2022-09-29T00:27:38Z 2022-09-28T18:02:17Z NONE      

What happened?

I was trying to assign coordinates on a dataset that had been created by using stack. After assigning the coordinates, the dataset was in a state where its length was coming out as less than zero, which caused all sorts of issues.

What did you expect to happen?

I think the issue is with the updating of _coord_names, perhaps in https://github.com/pydata/xarray/blob/18454c218002e48e1643ce8e25654262e5f592ad/xarray/core/coordinates.py#L389.

I expected to just be able to assign the coords and then print the array to see the result.

Minimal Complete Verifiable Example

```Python
import xarray as xr

ds = xr.DataArray(
    [[[1, 1], [0, 0]], [[2, 2], [1, 1]]],
    dims=("lat", "year", "month"),
    coords={"lat": [-60, 60], "year": [2010, 2020], "month": [3, 6]},
    name="test",
).to_dataset()

stacked = ds.stack(time=("year", "month"))
stacked = stacked.assign_coords(
    {"time": [y + m / 12 for y, m in stacked["time"].values]}
)

# Both these fail with ValueError: __len__() should return >= 0
len(stacked)
print(stacked)
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

Traceback (most recent call last):
  File "mre.py", line 17, in <module>
    len(stacked)
  File ".../xarray-tests/xarray/core/dataset.py", line 1364, in __len__
    return len(self.data_vars)
ValueError: __len__() should return >= 0

Anything else we need to know?

Here's a test (I put it in test_dataarray.py but maybe there is a better spot)

```python
def test_assign_coords_drop_coord_names(self) -> None:
    ds = DataArray(
        [[[1, 1], [0, 0]], [[2, 2], [1, 1]]],
        dims=("lat", "year", "month"),
        coords={"lat": [-60, 60], "year": [2010, 2020], "month": [3, 6]},
        name="test",
    ).to_dataset()

    stacked = ds.stack(time=("year", "month"))
    stacked = stacked.assign_coords(
        {"time": [y + m / 12 for y, m in stacked["time"].values]}
    )

    # this seems to be handled correctly
    assert set(stacked._variables.keys()) == {"test", "time", "lat"}
    # however, _coord_names doesn't seem to update as expected
    # the below fails
    assert set(stacked._coord_names) == {"time", "lat"}

    # the incorrect value of _coord_names means that everything below fails too.
    # The length of a dataset is calculated (via len(data_vars)) as
    # len(dataset._variables) - len(dataset._coord_names). In the situation
    # above, where len(dataset._coord_names) is greater than len(dataset._variables),
    # the result is negative, which fails because __len__ must return
    # a value greater than or equal to zero

    # Both these fail with ValueError: __len__() should return >= 0
    len(stacked)
    print(stacked)

```
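The arithmetic described in the test's comments can be reproduced in isolation. The following is a minimal sketch with a hypothetical `ToyDataset` class, not xarray's real `Dataset`. It assumes the stale entries in `_coord_names` are `"year"` and `"month"`, which the report does not state explicitly.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset; not xarray's real class."""

    def __init__(self, variables, coord_names):
        self._variables = dict(variables)
        self._coord_names = set(coord_names)

    def __len__(self):
        # Number of data variables: all variables minus the coordinates.
        return len(self._variables) - len(self._coord_names)


# Consistent state: every coordinate name is also a variable.
ok = ToyDataset({"test": 0, "time": 0, "lat": 0}, {"time", "lat"})
print(len(ok))

# Stale state (assumed): "year" and "month" linger in the coordinate names
# although their variables are gone, so the difference is 3 - 4 = -1.
broken = ToyDataset(
    {"test": 0, "time": 0, "lat": 0}, {"time", "lat", "year", "month"}
)
try:
    len(broken)
except ValueError as err:
    message = str(err)
    print(message)
```

Python itself rejects a negative return value from `__len__`, which is why the error surfaces in `len(stacked)` and `print(stacked)` rather than at the point where the coordinates were assigned.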

Environment

INSTALLED VERSIONS
------------------
commit: e678a1d7884a3c24dba22d41b2eef5d7fe5258e7
python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14) [Clang 12.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 21.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: ('en_AU', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 0.1.dev4312+ge678a1d.d20220928
pandas: 1.5.0
numpy: 1.22.4
scipy: 1.9.1
netCDF4: 1.6.1
pydap: installed
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.13.2
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: 3.2.2
rasterio: 1.3.1
cfgrib: 0.9.10.1
iris: 3.3.0
bottleneck: 1.3.5
dask: 2022.9.1
distributed: 2022.9.1
matplotlib: 3.6.0
cartopy: 0.21.0
seaborn: 0.12.0
numbagg: 0.2.1
fsspec: 2022.8.2
cupy: None
pint: 0.19.2
sparse: 0.13.0
flox: 0.5.9
numpy_groupies: 0.9.19
setuptools: 65.4.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: None
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7097/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
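
As a rough illustration of how the filter at the top of this page maps onto the schema, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table is reduced to the columns the query touches (an assumption made for brevity), and the two rows are the ones listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced version of the schema: only the columns this query needs.
conn.execute(
    """
    CREATE TABLE [issues] (
       [id] INTEGER PRIMARY KEY,
       [number] INTEGER,
       [title] TEXT,
       [user] INTEGER,
       [state] TEXT,
       [updated_at] TEXT
    )
    """
)
conn.execute("CREATE INDEX [idx_issues_user] ON [issues] ([user])")

conn.executemany(
    "INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1389148779, 7097, "Broken state when using assign_coords with multiindex",
         114576287, "closed", "2022-09-29T00:27:38Z"),
        (1390228572, 7104, "Duplicate values on unstack",
         114576287, "closed", "2024-02-13T09:48:37Z"),
    ],
)

# ISO 8601 timestamps stored as TEXT sort correctly as plain strings.
rows = conn.execute(
    "SELECT [number], [title] FROM [issues] "
    "WHERE [user] = ? ORDER BY [updated_at] DESC",
    (114576287,),
).fetchall()

for number, title in rows:
    print(number, title)
```

The index on `[user]` is what lets a lookup by user id avoid a full table scan. With only two rows it makes no practical difference here.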