issues

9 rows where state = "open" and user = 500246 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
203999231 MDU6SXNzdWUyMDM5OTkyMzE= 1238 `set_index` converts string-dtype to object-dtype gerritholl 500246 open 0     10 2017-01-30T12:37:05Z 2023-03-13T14:09:21Z   CONTRIBUTOR      

'Dataset.set_index' apparently changes a <U1 dtype into an object-dtype, as illustrated below:

```
In [108]: ds = xarray.Dataset({"x": (("a", "b"), arange(25).reshape(5,5)+100), "y": ("b", arange(5)-100)}, {"a": arange(5), "b": arange(5)*2, "c": (("a",), list("ABCDE"))})

In [109]: print(ds)
<xarray.Dataset>
Dimensions:  (a: 5, b: 5)
Coordinates:
  * b        (b) int64 0 2 4 6 8
    c        (a) <U1 'A' 'B' 'C' 'D' 'E'
  * a        (a) int64 0 1 2 3 4
Data variables:
    x        (a, b) int64 100 101 102 103 104 105 106 107 108 109 110 111 ...
    y        (b) int64 -100 -99 -98 -97 -96

In [110]: print(ds.set_index(a='c'))
<xarray.Dataset>
Dimensions:  (a: 5, b: 5)
Coordinates:
  * b        (b) int64 0 2 4 6 8
  * a        (a) object 'A' 'B' 'C' 'D' 'E'
Data variables:
    x        (a, b) int64 100 101 102 103 104 105 106 107 108 109 110 111 ...
    y        (b) int64 -100 -99 -98 -97 -96
```
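One possible explanation, which the report itself does not confirm, is that the new index passes through a `pandas.Index`, and pandas has no fixed-width unicode dtype. The sketch below uses only numpy and pandas (no xarray) to show that round trip; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# The coordinate "c" from the example: a fixed-width unicode array.
labels = np.array(list("ABCDE"))
print(labels.dtype)  # <U1

# Wrapping the array in a pandas Index changes the dtype, because
# pandas has no fixed-width unicode dtype of its own.
index = pd.Index(labels)
print(index.dtype)

# Casting the index values back to str restores a fixed-width dtype.
restored = index.to_numpy().astype(str)
print(restored.dtype)
```

If this is indeed the cause, casting the coordinate back to `str` after `set_index` might serve as a workaround, though that has not been checked against xarray here.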

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1238/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
741806260 MDU6SXNzdWU3NDE4MDYyNjA= 4579 Invisible differences between arrays using IntervalIndex gerritholl 500246 open 0     2 2020-11-12T17:54:55Z 2022-10-03T15:09:25Z   CONTRIBUTOR      

What happened:

I have two DataArrays that each have a coordinate constructed with pandas.interval_range. In one case I pass the interval_range directly, in the other case I call .to_numpy() first. The two DataArrays look identical but aren't. This can lead to hard-to-find bugs, because behaviour is not identical: the former supports indexing whereas the latter doesn't.

What you expected to happen:

I expect two arrays that appear identical to behave identically. If they don't behave identically then there should be some way to tell the difference (apart from equals, which tells me they are different but not how).

Minimal Complete Verifiable Example:

```python
import xarray
import pandas

da1 = xarray.DataArray([0, 1, 2], dims=("x",), coords={"x": pandas.interval_range(0, 2, 3)})
da2 = xarray.DataArray([0, 1, 2], dims=("x",), coords={"x": pandas.interval_range(0, 2, 3).to_numpy()})

print(repr(da1) == repr(da2))
print(repr(da1.x) == repr(da2.x))
print(da1.x.dtype == da2.x.dtype)

# identical? No:

print(da1.equals(da2))
print(da1.x.equals(da2.x))

# in particular:

da1.sel(x=1)  # works
da2.sel(x=1)  # fails
```

Results in:

```
True
True
True
False
False
Traceback (most recent call last):
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "mwe105.py", line 19, in <module>
    da2.sel(x=1) # fails
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/dataarray.py", line 1143, in sel
    ds = self._to_temp_dataset().sel(
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/dataset.py", line 2105, in sel
    pos_indexers, new_indexes = remap_label_indexers(
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/coordinates.py", line 397, in remap_label_indexers
    pos_indexers, new_indexes = indexing.remap_label_indexers(
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py", line 275, in remap_label_indexers
    idxr, new_idx = convert_label_indexer(index, label, dim, method, tolerance)
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py", line 196, in convert_label_indexer
    indexer = index.get_loc(label_value, method=method, tolerance=tolerance)
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: 1
```

Additional context

I suppose this happens because under the hood xarray does something clever to support pandas-style indexing even though the coordinate variable appears like a numpy array with an object dtype, and that this cleverness is lost if the object is already converted to a numpy array. But there is, as far as I can see, no way to tell the difference once the objects have been created.
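The difference can be reproduced with pandas alone. The sketch below assumes that the `.to_numpy()` variant ends up as a plain object-dtype index, which is what the `PyObjectHashTable` frames in the traceback suggest; `as_objects` and `lookup_failed` are illustrative names.

```python
import pandas as pd

# Same coordinate values as in the example above.
intervals = pd.interval_range(0, 2, periods=3)

# Assumption: a plain object-dtype index mimics what happens to the
# coordinate built from .to_numpy().
as_objects = pd.Index(intervals.to_numpy(), dtype=object)

# Both hold the same Interval objects but are different index types.
print(type(intervals).__name__, type(as_objects).__name__)

# Interval-aware lookup: the point 1 falls inside the second interval.
print(intervals.get_loc(1))

# The object index only matches whole Interval objects, not points.
try:
    as_objects.get_loc(1)
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)
```

Under that assumption, inspecting the type of the underlying pandas index would be one way to tell the two coordinates apart.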

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.12.14-lp150.12.82-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.16.1
pandas: 1.1.4
numpy: 1.19.4
scipy: 1.5.3
netCDF4: 1.5.4
pydap: None
h5netcdf: 0.8.1
h5py: 3.1.0
Nio: None
zarr: 2.5.0
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.7
cfgrib: None
iris: None
bottleneck: None
dask: 2.30.0
distributed: 2.30.1
matplotlib: 3.3.2
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20201009
pip: 20.2.4
conda: installed
pytest: 6.1.2
IPython: 7.19.0
sphinx: 3.3.0
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4579/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
232623945 MDU6SXNzdWUyMzI2MjM5NDU= 1435 xarray.plot.imshow with datetime coordinates results in blank plot gerritholl 500246 open 0     6 2017-05-31T16:31:30Z 2022-05-03T01:56:37Z   CONTRIBUTOR      

```
In [72]: da = xarray.DataArray(arange(5*6).reshape(5,6), dims=("A", "B"), coords={"A": arange(5), "B": pd.date_range("2000-01-01", periods=6)})

In [73]: da.plot.imshow()
Out[73]: <matplotlib.image.AxesImage at 0x7f699cf1acf8>
```

The resulting plot has the correct axes and colorbar, but the contents of the plot itself are blank. Upon moving the cursor over the plot, there is an exception in Tkinter:

Exception in Tkinter callback
Traceback (most recent call last):
  File "/home/users/gholl/lib/python3.5/tkinter/__init__.py", line 1549, in __call__
    return self.func(*args)
  File "/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/matplotlib/backends/backend_tkagg.py", line 387, in motion_notify_event
    FigureCanvasBase.motion_notify_event(self, x, y, guiEvent=event)
  File "/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/matplotlib/backend_bases.py", line 1966, in motion_notify_event
    self.callbacks.process(s, event)
  File "/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/matplotlib/cbook.py", line 554, in process
    proxy(*args, **kwargs)
  File "/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/matplotlib/cbook.py", line 416, in __call__
    return mtd(*args, **kwargs)
  File "/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/matplotlib/backend_bases.py", line 2857, in mouse_move
    artists = [a for a in event.inaxes.mouseover_set
  File "/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/matplotlib/backend_bases.py", line 2858, in <listcomp>
    if a.contains(event) and a.get_visible()]
  File "/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/matplotlib/image.py", line 567, in contains
    inside = ((x >= xmin) and (x <= xmax) and
TypeError: invalid type promotion

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1435/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
686461572 MDU6SXNzdWU2ODY0NjE1NzI= 4378 Plotting when Interval coordinate is timedelta-based gerritholl 500246 open 0     2 2020-08-26T16:36:27Z 2022-04-18T21:55:15Z   CONTRIBUTOR      

Is your feature request related to a problem? Please describe.

The xarray plotting interface supports coordinates containing pandas.Interval iff those intervals contain numbers. It fails when those intervals contain pandas.Timedelta:

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.arange(10),
    dims=("x",),
    coords={"x": [pd.Interval(i, i+1) for i in range(10)]})
da.plot() # works

da = xr.DataArray(
    np.arange(10),
    dims=("x",),
    coords={"x": [pd.Interval(
        d-pd.Timestamp("2000-01-01"),
        d-pd.Timestamp("2000-01-01")+pd.Timedelta("1H"))
        for d in pd.date_range("2000-01-01", "2000-01-02", 10)]})
da.plot() # fails
```

The latter fails with:

Traceback (most recent call last):
  File "mwe82.py", line 18, in <module>
    da.plot() # fails
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/plot/plot.py", line 446, in __call__
    return plot(self._da, **kwargs)
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/plot/plot.py", line 200, in plot
    return plotfunc(darray, **kwargs)
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/plot/plot.py", line 302, in line
    _ensure_plottable(xplt_val, yplt_val)
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/plot/utils.py", line 551, in _ensure_plottable
    raise TypeError(
TypeError: Plotting requires coordinates to be numeric or dates of type np.datetime64, datetime.datetime, cftime.datetime or pd.Interval.

This error message is somewhat confusing, because the coordinates are "dates of type (...) pd.Interval", but perhaps a timedelta is not considered a date.

Describe the solution you'd like

I would like to be able to use the xarray plotting interface for any pandas.Interval coordinate, including those based on pandas.Timestamp and pandas.Timedelta.

Describe alternatives you've considered

I'll "manually" calculate the midpoints and use those as a timedelta coordinate instead.
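A minimal sketch of that midpoint workaround, using pandas only. The breaks are illustrative values rather than the ones from the failing example, and the resulting `midpoints` would still have to be assigned as the coordinate before plotting.

```python
import pandas as pd

# Hourly timedelta breaks: 0h, 1h, 2h, 3h (illustrative values).
breaks = pd.to_timedelta([0, 1, 2, 3], unit="h")

# Three timedelta-based intervals, analogous to the failing coordinate.
intervals = pd.IntervalIndex.from_breaks(breaks)

# Midpoint of each interval: 30, 90 and 150 minutes.
midpoints = intervals.mid
print(midpoints)
```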

Additional context

It seems that regular timedeltas aren't really supported either: they don't cause an error message, but they produce incorrect results. There's probably a related issue somewhere, but I can't find it now.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4378/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
203630267 MDU6SXNzdWUyMDM2MzAyNjc= 1234 `where` grows new dimensions for unrelated variables gerritholl 500246 open 0     5 2017-01-27T13:02:34Z 2022-04-18T16:04:16Z   CONTRIBUTOR      

In the example below, the dimensionality for data variable y grows from (b) to (b, a) after calling the dataset where method. This behaviour does not appear to be documented. Is it a bug?

```
In [46]: ds = xarray.Dataset({"x": (("a", "b"), arange(25).reshape(5,5)+100), "y": ("b", arange(5)-100)}, {"a": arange(5), "b": arange(5)*2, "c": (("a",), list("ABCDE"))})

In [47]: print(ds)
<xarray.Dataset>
Dimensions:  (a: 5, b: 5)
Coordinates:
  * b        (b) int64 0 2 4 6 8
    c        (a) <U1 'A' 'B' 'C' 'D' 'E'
  * a        (a) int64 0 1 2 3 4
Data variables:
    x        (a, b) int64 100 101 102 103 104 105 106 107 108 109 110 111 ...
    y        (b) int64 -100 -99 -98 -97 -96

In [69]: ds.where((ds.c>='A') & (ds.c<='C'))
Out[69]: 
<xarray.Dataset>
Dimensions:  (a: 5, b: 5)
Coordinates:
  * b        (b) int64 0 2 4 6 8
    c        (a) <U1 'A' 'B' 'C' 'D' 'E'
  * a        (a) int64 0 1 2 3 4
Data variables:
    x        (a, b) float64 100.0 101.0 102.0 103.0 104.0 105.0 106.0 107.0 ...
    y        (b, a) float64 -100.0 -100.0 -100.0 nan nan -99.0 -99.0 -99.0 ...

```
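The output is consistent with the condition, which has dimension `a`, being broadcast against every data variable. The numpy-only sketch below reproduces the values shown for `y` under that assumption; it illustrates broadcasting and is not xarray's actual implementation.

```python
import numpy as np

# y has dimension b; the mask has dimension a.
y = np.arange(5) - 100
# True where c is between 'A' and 'C' in the example above.
mask = np.array([True, True, True, False, False])

# Broadcasting a (b,) array against an (a,) mask yields a (b, a) result.
y_masked = np.where(mask[np.newaxis, :], y[:, np.newaxis], np.nan)
print(y_masked.shape)  # (5, 5)
print(y_masked[0])
```

The first row holds -100.0 three times followed by two NaNs, matching the start of `y` in `Out[69]`.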

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1234/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
751732952 MDU6SXNzdWU3NTE3MzI5NTI= 4612 Assigning nan to int-dtype array converts nan to int gerritholl 500246 open 0     1 2020-11-26T17:00:45Z 2021-01-02T03:55:30Z   CONTRIBUTOR      

(I am almost sure this already exists as an issue, but I can't find the original)

What happened:

When assigning nan to an integer-dtype array, the nan gets incorrectly converted to int.

What you expected to happen:

I expect to get a ValueError, like I get for pure numpy arrays.

Minimal Complete Verifiable Example:

import xarray
import numpy
da = xarray.DataArray(numpy.array([0,1], dtype="u8"))
da[0] = numpy.nan
print(da)

Gives:

<xarray.DataArray (dim_0: 2)>
array([9223372036854775808, 1], dtype=uint64)
Dimensions without coordinates: dim_0

Anything else we need to know?:

In numpy the equivalent code raises ValueError
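For comparison, a minimal numpy-only version of the same assignment; the `raised` flag is only there to make the outcome explicit.

```python
import numpy as np

values = np.array([0, 1], dtype="u8")

# Plain numpy refuses to store NaN in an integer array.
try:
    values[0] = np.nan
    raised = False
except ValueError as err:
    raised = True
    print(err)

# The array is left unchanged.
print(values)
```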

This is related to, but different from, #2945. In #2945, xarray behaves the same as numpy. In #4612, xarray behaves differently from numpy.

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.12.14-lp150.12.82-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.16.1
pandas: 1.1.4
numpy: 1.19.4
scipy: 1.5.3
netCDF4: 1.5.4
pydap: None
h5netcdf: 0.8.1
h5py: 3.1.0
Nio: None
zarr: 2.5.0
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.7
cfgrib: None
iris: None
bottleneck: None
dask: 2.30.0
distributed: 2.30.1
matplotlib: 3.3.2
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20201009
pip: 20.2.4
conda: installed
pytest: 6.1.2
IPython: 7.19.0
sphinx: 3.3.0
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4612/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
283345586 MDU6SXNzdWUyODMzNDU1ODY= 1792 Comparison with masked array yields object-array with nans for masked values gerritholl 500246 open 0     3 2017-12-19T19:37:13Z 2020-10-11T13:34:25Z   CONTRIBUTOR      

Code Sample, a copy-pastable example if possible

```
$ cat mwe.py
#!/usr/bin/env python3.6

import xarray
import numpy

da = xarray.DataArray(numpy.arange(5))
ma = numpy.ma.masked_array(numpy.arange(5), [True, False, False, False, True])
print(da>ma)
$ ./mwe.py
<xarray.DataArray (dim_0: 5)>
array([nan, False, False, False, nan], dtype=object)
Dimensions without coordinates: dim_0
```

Problem description

A comparison between a DataArray and a masked_array results in an array with dtype object instead of an array with dtype bool. This is problematic, because code should be able to assume that x > y returns something with a bool dtype.

Expected Output

I would expect the masked array to be dropped (which it is) and an array to be returned equivalent to the comparison da>ma.data

<xarray.DataArray (dim_0: 5)>
array([False, False, False, False, False], dtype=bool)
Dimensions without coordinates: dim_0
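The expected values can be checked with numpy alone by comparing against the data underlying the masked array. This sketch leaves xarray out entirely, and the variable names are illustrative.

```python
import numpy as np

values = np.arange(5)
masked = np.ma.masked_array(np.arange(5), [True, False, False, False, True])

# Compare against the underlying data, ignoring the mask.
result = values > masked.data
print(result)
print(result.dtype)  # bool
```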

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-696.6.3.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
xarray: 0.10.0+dev12.gf882a58
pandas: 0.21.0
numpy: 1.13.3
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: None
Nio: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.16.0
matplotlib: 2.1.0
cartopy: None
seaborn: 0.8.1
setuptools: 38.2.4
pip: 9.0.1
conda: 4.3.16
pytest: 3.1.2
IPython: 6.1.0
sphinx: 1.6.2
None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1792/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
199188476 MDU6SXNzdWUxOTkxODg0NzY= 1194 Use masked arrays while preserving int gerritholl 500246 open 0     9 2017-01-06T12:40:22Z 2020-03-29T20:37:29Z   CONTRIBUTOR      

A great beauty of numpy's masked arrays is that they work with any dtype, since they do not use nan. Unfortunately, when I try to put my data into an xarray.Dataset, it converts ints to float, as shown below:

```
In [137]: x = arange(30, dtype="i1").reshape(3, 10)

In [138]: xr.Dataset({"count": (["x", "y"], ma.masked_where(x%5>3, x))}, coords={"x": range(3), "y":
     ...: range(10)})
Out[138]:
<xarray.Dataset>
Dimensions:  (x: 3, y: 10)
Coordinates:
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9
  * x        (x) int64 0 1 2
Data variables:
    count    (x, y) float64 0.0 1.0 2.0 3.0 nan 5.0 6.0 7.0 8.0 nan 10.0 ...
```

This happens in the function _maybe_promote.

Such type “promotion” is unaffordable for me; the memory consumption of my multi-gigabyte arrays would explode by a factor of 4. Secondly, many of my integer-dtype fields are bit arrays, for which floating point representation is not desirable.

It would greatly benefit xarray if it could use masking while preserving the dtype of input data.
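The memory cost can be illustrated with numpy alone, reusing the array from the example above. The names `masked` and `as_float` are illustrative.

```python
import numpy as np

x = np.arange(30, dtype="i1").reshape(3, 10)
masked = np.ma.masked_where(x % 5 > 3, x)

# The masked array keeps the 1-byte integer dtype; the mask is a
# separate boolean array of the same shape.
print(masked.dtype, masked.data.nbytes, np.ma.getmaskarray(masked).nbytes)

# Replacing masked entries by NaN requires a float dtype; float64
# uses 8 bytes per element.
as_float = masked.astype("f8").filled(np.nan)
print(as_float.dtype, as_float.nbytes)
```

For this 1-byte example the float64 copy is 8 times larger than the integer data. The exact factor depends on the width of the original integer dtype.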

(See also: Stackoverflow question)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1194/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
410317757 MDU6SXNzdWU0MTAzMTc3NTc= 2772 Should xarray allow assigning a masked constant? gerritholl 500246 open 0     1 2019-02-14T14:10:20Z 2019-02-15T20:24:44Z   CONTRIBUTOR      

Currently, ds['a'] = ((), ma.masked) where ds is an xarray.Dataset gives ValueError: Could not convert tuple of form (dims, data[, attrs, encoding]): ((), masked) to Variable., whereas ds['a'] = (), ma.MaskedArray(0.0, True) works (it sets the indicated value to NaN). Should assigning ma.masked be equivalent to assigning ma.MaskedArray(0.0, True), or are there good reasons for the difference in behaviour?
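At the numpy level the two values are closely related, which the following sketch shows without involving xarray. It does not explain why xarray rejects one and accepts the other.

```python
import numpy as np
from numpy import ma

# ma.masked is a special 0-d masked constant, a subclass of MaskedArray.
print(type(ma.masked).__name__)
print(ma.masked.shape, ma.masked.dtype)

# An explicitly constructed masked 0-d value.
explicit = ma.MaskedArray(0.0, True)
print(explicit.shape, explicit.dtype)

# Filling the masked entry with NaN gives a plain 0-d float array.
filled = explicit.filled(np.nan)
print(filled)
```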

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2772/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
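The filter in the page summary (state = "open", user = 500246, sorted by updated_at descending) can be sketched against an in-memory SQLite database. The table below is a reduced, hypothetical subset of the schema above, and the third row is a made-up placeholder that exists only to show the filter excluding it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced, hypothetical subset of the columns in the schema above.
conn.execute(
    "CREATE TABLE issues ("
    "id INTEGER PRIMARY KEY, number INTEGER, user INTEGER, "
    "state TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO issues VALUES (?, ?, ?, ?, ?)",
    [
        # Two rows copied from the listing above.
        (203999231, 1238, 500246, "open", "2023-03-13T14:09:21Z"),
        (410317757, 2772, 500246, "open", "2019-02-15T20:24:44Z"),
        # Hypothetical row (not from the listing) that the filter excludes.
        (1, 1, 500246, "closed", "2024-01-01T00:00:00Z"),
    ],
)

# ISO 8601 timestamps stored as TEXT sort correctly as strings.
rows = conn.execute(
    "SELECT number, updated_at FROM issues "
    "WHERE state = ? AND user = ? ORDER BY updated_at DESC",
    ("open", 500246),
).fetchall()
print(rows)
```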
Powered by Datasette · Queries took 24.406ms · About: xarray-datasette