xarray issue #8499: 'drop_duplicates' behaves differently when using 1 vs many coordinates for an index

  • id: 2019789753 · node_id: I_kwDOAMm_X854Y4u5
  • opened by: jbweston (6654709) · author association: NONE
  • state: open · locked: no · comments: 4
  • created_at: 2023-12-01T00:36:42Z · updated_at: 2023-12-01T09:55:39Z · closed_at: (none)
  • assignee: (none) · milestone: (none)
  • repo: xarray (13221727) · type: issue

What happened?

I am trying to use drop_duplicates on a DataArray, based on the values of some of its coordinates, starting from a DataArray that has coordinates but no indexes.

To accomplish this, I call 'DataArray.set_xindex' with the appropriate coordinate names, and then call 'drop_duplicates' on the resulting DataArray, like so:

```python
from xarray import DataArray
import numpy as np

test_array = DataArray(
    np.random.rand(5),
    coords=dict(x=("sample", [1, 2, 1, 2, 1]), y=("sample", [-1] * 5)),
    dims="sample",
)

# output DataArray's 'sample' dimension has length 2, as expected
good = test_array.set_xindex(["x", "y"]).drop_duplicates("sample")
assert len(good) == 2
```

The above functions as expected; 'good' has had its duplicates dropped, and we are left with a DataArray of length 2.

However, the following does not function as I would expect:

```python
# All the 'y's are '-1', so we expect the same duplicates as before to be dropped,
# even if we don't include the 'y' values in the index.
bad = test_array.set_xindex("x").drop_duplicates("sample")

# But this assert fails! 'drop_duplicates' does not drop anything
assert not bad.equals(test_array)
```

What did you expect to happen?

I expected drop_duplicates to drop the duplicates when I was using only a single coordinate for the index.
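For reference, a possible workaround is sketched below. It is only a sketch, under the assumption that the duplicate mask can be computed directly from the coordinate values with pandas and applied with isel, sidestepping the index machinery entirely:

```python
import numpy as np
import pandas as pd
from xarray import DataArray

test_array = DataArray(
    np.random.rand(5),
    coords=dict(x=("sample", [1, 2, 1, 2, 1]), y=("sample", [-1] * 5)),
    dims="sample",
)

# Hypothetical workaround: keep only the first occurrence of each 'x' value by
# computing a duplicate mask with pandas and selecting those positions with isel.
keep = np.flatnonzero(~pd.Index(test_array["x"].values).duplicated(keep="first"))
deduped = test_array.isel(sample=keep)
assert len(deduped) == 2  # same result as the multi-coordinate index case
```

Because this never consults an index, it gives the same result whether or not 'y' is part of one.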

Minimal Complete Verifiable Example

```Python
from xarray import DataArray
import numpy as np

test_array = DataArray(
    range(5),
    coords=dict(x=("sample", [1, 2, 1, 2, 1]), y=("sample", [-1] * 5)),
    dims="sample",
)

# output DataArray's 'sample' dimension has length 2, as expected
good = test_array.set_xindex(["x", "y"]).drop_duplicates("sample")

# And indeed there are only 2 elements left after dropping duplicates.
assert len(good) == 2

# All the 'y's are '-1', so we expect the same duplicates as before to be dropped,
bad = test_array.drop_vars("y").set_xindex("x").drop_duplicates("sample")

# But this assert fails! 'drop_duplicates' does not drop anything
assert not bad.equals(test_array.drop_vars("y"))
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.5 | packaged by conda-forge | (main, Aug 27 2023, 03:34:09) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.133.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.1
xarray: 2023.11.0
pandas: 2.1.0
numpy: 1.24.4
scipy: 1.11.2
netCDF4: 1.6.3
pydap: None
h5netcdf: 1.2.0
h5py: 3.8.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
iris: None
bottleneck: None
dask: 2023.9.1
distributed: 2023.9.1
matplotlib: 3.7.2
cartopy: None
seaborn: 0.12.2
numbagg: None
fsspec: 2023.9.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.1.2
pip: 23.2.1
conda: 23.7.3
pytest: 7.4.2
mypy: None
IPython: 8.15.0
sphinx: None

Reactions (https://api.github.com/repos/pydata/xarray/issues/8499/reactions): 0 total; all reaction counts (+1, -1, laugh, hooray, confused, heart, rocket, eyes) are 0.
