issues: 741806260

id: 741806260
node_id: MDU6SXNzdWU3NDE4MDYyNjA=
number: 4579
title: Invisible differences between arrays using IntervalIndex
user: 500246
state: open
locked: 0
comments: 2
created_at: 2020-11-12T17:54:55Z
updated_at: 2022-10-03T15:09:25Z
author_association: CONTRIBUTOR

What happened:

I have two DataArrays, each with a coordinate constructed with pandas.interval_range. In one case I pass the result of interval_range directly; in the other I call .to_numpy() on it first. The two DataArrays have identical reprs and dtypes but are not equal. This can lead to hard-to-find bugs, because their behaviour differs: the former supports label-based selection with .sel(x=1), whereas the latter raises a KeyError.

What you expected to happen:

I expect two arrays that appear identical to behave identically. If they don't behave identically then there should be some way to tell the difference (apart from equals, which tells me they are different but not how).

Minimal Complete Verifiable Example:

```python
import xarray
import pandas

da1 = xarray.DataArray([0, 1, 2], dims=("x",), coords={"x": pandas.interval_range(0, 2, 3)})
da2 = xarray.DataArray([0, 1, 2], dims=("x",), coords={"x": pandas.interval_range(0, 2, 3).to_numpy()})

print(repr(da1) == repr(da2))
print(repr(da1.x) == repr(da2.x))
print(da1.x.dtype == da2.x.dtype)

# identical? No:

print(da1.equals(da2))
print(da1.x.equals(da2.x))

# in particular:

da1.sel(x=1)  # works
da2.sel(x=1)  # fails
```

Results in:

```
True
True
True
False
False
Traceback (most recent call last):
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "mwe105.py", line 19, in <module>
    da2.sel(x=1) # fails
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/dataarray.py", line 1143, in sel
    ds = self._to_temp_dataset().sel(
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/dataset.py", line 2105, in sel
    pos_indexers, new_indexes = remap_label_indexers(
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/coordinates.py", line 397, in remap_label_indexers
    pos_indexers, new_indexes = indexing.remap_label_indexers(
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py", line 275, in remap_label_indexers
    idxr, new_idx = convert_label_indexer(index, label, dim, method, tolerance)
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py", line 196, in convert_label_indexer
    indexer = index.get_loc(label_value, method=method, tolerance=tolerance)
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: 1
```
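
The last frames of the traceback are in pandas' Index.get_loc, so the failing lookup can probably be sketched with pandas alone. The snippet below is only an illustration, not a description of xarray internals: it assumes the .to_numpy() path ends up wrapped in a generic object-dtype pandas.Index, and the names interval_index and object_index are made up for this example.

```python
import pandas as pd

# Three equal-width intervals covering (0, 2], as in the example above.
interval_index = pd.interval_range(start=0, end=2, periods=3)

# Assumption: converting to numpy first gives an object array of
# pandas.Interval values, which is then wrapped in a generic Index.
object_index = pd.Index(interval_index.to_numpy(), dtype=object)

# An IntervalIndex resolves a scalar to the interval that contains it.
position = interval_index.get_loc(1)

# A generic object Index only matches keys equal to a stored element,
# and the integer 1 is not equal to any Interval object.
try:
    object_index.get_loc(1)
    object_lookup_failed = False
except KeyError:
    object_lookup_failed = True

print(type(interval_index).__name__, position)
print(type(object_index).__name__, object_lookup_failed)
```

If this assumption holds, the two coordinates hold the same Interval objects and differ only in the index class that wraps them, which would explain why the reprs and dtypes match.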

Additional context

I suppose this happens because under the hood xarray does something clever to support pandas-style indexing even though the coordinate variable appears like a numpy array with an object dtype, and that this cleverness is lost if the object is already converted to a numpy array. But there is, as far as I can see, no way to tell the difference once the objects have been created.
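
One possible way to tell the two arrays apart after construction is to look at the pandas index that xarray keeps for the dimension, through the indexes property. This is a sketch under the assumption that the object-dtype coordinate is backed by a generic pandas.Index rather than a pandas.IntervalIndex; the helper describe_index is hypothetical and not part of xarray.

```python
import pandas as pd
import xarray as xr


def describe_index(da, dim):
    """Return the class name of the pandas index backing ``dim``.

    Hypothetical helper for illustration; not an xarray function.
    """
    return type(da.indexes[dim]).__name__


intervals = pd.interval_range(start=0, end=2, periods=3)

# Coordinate passed as an IntervalIndex directly.
da1 = xr.DataArray([0, 1, 2], dims=("x",), coords={"x": intervals})

# Coordinate converted to an object-dtype numpy array first.
da2 = xr.DataArray([0, 1, 2], dims=("x",), coords={"x": intervals.to_numpy()})

print(describe_index(da1, "x"))
print(describe_index(da2, "x"))
```

This does not change the repr, so the difference stays invisible in normal output; it only offers a manual check when selection behaves unexpectedly.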

Environment:

Output of xr.show_versions():

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.12.14-lp150.12.82-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.1
pandas: 1.1.4
numpy: 1.19.4
scipy: 1.5.3
netCDF4: 1.5.4
pydap: None
h5netcdf: 0.8.1
h5py: 3.1.0
Nio: None
zarr: 2.5.0
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.7
cfgrib: None
iris: None
bottleneck: None
dask: 2.30.0
distributed: 2.30.1
matplotlib: 3.3.2
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20201009
pip: 20.2.4
conda: installed
pytest: 6.1.2
IPython: 7.19.0
sphinx: 3.3.0

reactions: 0 (https://api.github.com/repos/pydata/xarray/issues/4579/reactions)
repo: 13221727
type: issue
