issues

2 rows where comments = 2, state = "open" and user = 500246 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
741806260 MDU6SXNzdWU3NDE4MDYyNjA= 4579 Invisible differences between arrays using IntervalIndex gerritholl 500246 open 0     2 2020-11-12T17:54:55Z 2022-10-03T15:09:25Z   CONTRIBUTOR      

What happened:

I have two DataArrays, each with a coordinate constructed with pandas.interval_range. In one case I pass the result of interval_range directly; in the other I call .to_numpy() on it first. The two DataArrays have identical reprs but are not equal. This can lead to hard-to-find bugs because they also behave differently: the former supports label-based indexing with .sel on a scalar value, whereas the latter raises a KeyError.

What you expected to happen:

I expect two arrays that appear identical to behave identically. If they don't behave identically then there should be some way to tell the difference (apart from equals, which tells me they are different but not how).

Minimal Complete Verifiable Example:

```python
import xarray
import pandas

da1 = xarray.DataArray([0, 1, 2], dims=("x",), coords={"x": pandas.interval_range(0, 2, 3)})
da2 = xarray.DataArray([0, 1, 2], dims=("x",), coords={"x": pandas.interval_range(0, 2, 3).to_numpy()})

print(repr(da1) == repr(da2))
print(repr(da1.x) == repr(da2.x))
print(da1.x.dtype == da2.x.dtype)

# identical? No:

print(da1.equals(da2))
print(da1.x.equals(da2.x))

# in particular:

da1.sel(x=1) # works
da2.sel(x=1) # fails
```

Results in:

```
True
True
True
False
False
Traceback (most recent call last):
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "mwe105.py", line 19, in <module>
    da2.sel(x=1) # fails
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/dataarray.py", line 1143, in sel
    ds = self._to_temp_dataset().sel(
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/dataset.py", line 2105, in sel
    pos_indexers, new_indexes = remap_label_indexers(
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/coordinates.py", line 397, in remap_label_indexers
    pos_indexers, new_indexes = indexing.remap_label_indexers(
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py", line 275, in remap_label_indexers
    idxr, new_idx = convert_label_indexer(index, label, dim, method, tolerance)
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py", line 196, in convert_label_indexer
    indexer = index.get_loc(label_value, method=method, tolerance=tolerance)
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: 1
```

Additional context

I suppose this happens because, under the hood, xarray does something clever to support pandas-style indexing even though the coordinate variable looks like a numpy array with an object dtype, and because this cleverness is lost if the object has already been converted to a numpy array. But as far as I can see, there is no way to tell the difference once the objects have been created.
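
The suspected mechanism can be illustrated with pandas alone. This is a minimal sketch, not xarray's actual code path: it assumes that a coordinate converted with .to_numpy() ends up backed by a plain object-dtype pandas.Index, which the issue does not confirm. The names `intervals`, `object_index` and `lookup_failed` are made up for illustration.

```python
import pandas as pd

# Same interval_range call as in the example above: start=0, end=2, periods=3.
intervals = pd.interval_range(0, 2, 3)

# Assumption: this mimics a coordinate that was converted with .to_numpy()
# first, i.e. a plain Index that merely holds Interval objects.
object_index = pd.Index(intervals.to_numpy(), dtype=object)

print(type(intervals).__name__, type(object_index).__name__)

# An IntervalIndex resolves a scalar to the interval that contains it.
print(intervals.get_loc(1))

# An object-dtype Index only matches labels by equality, and the integer 1
# is not equal to any Interval object, so the lookup raises KeyError.
try:
    object_index.get_loc(1)
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)
```

If the assumption holds, comparing the index types (rather than the reprs or dtypes of the coordinate variables) may be one way to tell the two arrays apart.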

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.12.14-lp150.12.82-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.1
pandas: 1.1.4
numpy: 1.19.4
scipy: 1.5.3
netCDF4: 1.5.4
pydap: None
h5netcdf: 0.8.1
h5py: 3.1.0
Nio: None
zarr: 2.5.0
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.7
cfgrib: None
iris: None
bottleneck: None
dask: 2.30.0
distributed: 2.30.1
matplotlib: 3.3.2
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20201009
pip: 20.2.4
conda: installed
pytest: 6.1.2
IPython: 7.19.0
sphinx: 3.3.0
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4579/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
686461572 MDU6SXNzdWU2ODY0NjE1NzI= 4378 Plotting when Interval coordinate is timedelta-based gerritholl 500246 open 0     2 2020-08-26T16:36:27Z 2022-04-18T21:55:15Z   CONTRIBUTOR      

Is your feature request related to a problem? Please describe.

The xarray plotting interface supports coordinates containing pandas.Interval iff those intervals contain numbers. It fails when those intervals contain pandas.Timedelta:

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.arange(10),
    dims=("x",),
    coords={"x": [pd.Interval(i, i+1) for i in range(10)]})
da.plot() # works

da = xr.DataArray(
    np.arange(10),
    dims=("x",),
    coords={"x": [pd.Interval(
        d-pd.Timestamp("2000-01-01"),
        d-pd.Timestamp("2000-01-01")+pd.Timedelta("1H"))
        for d in pd.date_range("2000-01-01", "2000-01-02", 10)]})
da.plot() # fails
```

The latter fails with:

Traceback (most recent call last):
  File "mwe82.py", line 18, in <module>
    da.plot() # fails
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/plot/plot.py", line 446, in __call__
    return plot(self._da, **kwargs)
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/plot/plot.py", line 200, in plot
    return plotfunc(darray, **kwargs)
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/plot/plot.py", line 302, in line
    _ensure_plottable(xplt_val, yplt_val)
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/plot/utils.py", line 551, in _ensure_plottable
    raise TypeError(
TypeError: Plotting requires coordinates to be numeric or dates of type np.datetime64, datetime.datetime, cftime.datetime or pd.Interval.

This error message is somewhat confusing, because the coordinates are "dates of type (...) pd.Interval", but perhaps a timedelta is not considered a date.

Describe the solution you'd like

I would like to be able to use the xarray plotting interface for any pandas.Interval coordinate, including intervals of pandas.Timestamp and pandas.Timedelta.

Describe alternatives you've considered

I'll "manually" calculate the midpoints and use those as a timedelta coordinate instead.
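
A minimal sketch of that workaround, using pandas only. The variable names (`left`, `right`, `intervals`, `midpoints`) are hypothetical, and the intervals are built with IntervalIndex.from_arrays rather than the list comprehension from the example above, which is assumed to be equivalent for this purpose.

```python
import pandas as pd

# Left edges as in the failing example: ten offsets spread over one day.
left = pd.date_range("2000-01-01", "2000-01-02", periods=10) - pd.Timestamp("2000-01-01")
# Each interval is one hour long.
right = left + pd.Timedelta(hours=1)

intervals = pd.IntervalIndex.from_arrays(left, right)

# Midpoint of each interval, computed from its edges. The result is a
# TimedeltaIndex that could replace the interval coordinate.
midpoints = intervals.left + (intervals.right - intervals.left) / 2
print(midpoints[:3])
```

The resulting `midpoints` could then be passed as the "x" coordinate in place of the list of pd.Interval objects. Whether the plot of a timedelta coordinate is then correct is a separate question.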

Additional context

It seems that regular timedeltas aren't really supported either: they don't cause an error message, but they produce incorrect results. There's probably a related issue somewhere, but I can't find it now.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4378/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
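
The filter described at the top of this page (comments = 2, state = "open", user = 500246, sorted by updated_at descending) could be reproduced against this schema roughly as follows. This is a sketch using Python's built-in sqlite3 module with an in-memory database. The table is a trimmed-down, hypothetical version of the schema above, and the third row is made up to show a non-matching record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical subset of the issues table: only the columns needed for the
# filter; foreign keys and the remaining columns are omitted.
conn.execute(
    """
    CREATE TABLE [issues] (
       [id] INTEGER PRIMARY KEY,
       [number] INTEGER,
       [title] TEXT,
       [user] INTEGER,
       [state] TEXT,
       [comments] INTEGER,
       [updated_at] TEXT
    )
    """
)

rows = [
    (741806260, 4579, "Invisible differences between arrays using IntervalIndex",
     500246, "open", 2, "2022-10-03T15:09:25Z"),
    (686461572, 4378, "Plotting when Interval coordinate is timedelta-based",
     500246, "open", 2, "2022-04-18T21:55:15Z"),
    # Made-up row that the filter should exclude because it is closed.
    (1, 1, "Placeholder closed issue", 500246, "closed", 2, "2023-01-01T00:00:00Z"),
]
conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# ISO 8601 timestamps stored as TEXT sort chronologically as plain strings.
matching = conn.execute(
    """
    SELECT [number], [title]
    FROM [issues]
    WHERE [comments] = ? AND [state] = ? AND [user] = ?
    ORDER BY [updated_at] DESC
    """,
    (2, "open", 500246),
).fetchall()

for number, title in matching:
    print(number, title)
```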
Powered by Datasette · Queries took 2880.935ms · About: xarray-datasette