id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 203999231,MDU6SXNzdWUyMDM5OTkyMzE=,1238,`set_index` converts string-dtype to object-dtype,500246,open,0,,,10,2017-01-30T12:37:05Z,2023-03-13T14:09:21Z,,CONTRIBUTOR,,,,"'Dataset.set_index' apparently changes a ` Dimensions: (a: 5, b: 5) Coordinates: * b (b) int64 0 2 4 6 8 c (a) Dimensions: (a: 5, b: 5) Coordinates: * b (b) int64 0 2 4 6 8 * a (a) object 'A' 'B' 'C' 'D' 'E' Data variables: x (a, b) int64 100 101 102 103 104 105 106 107 108 109 110 111 ... y (b) int64 -100 -99 -98 -97 -96 ```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1238/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 741806260,MDU6SXNzdWU3NDE4MDYyNjA=,4579,Invisible differences between arrays using IntervalIndex,500246,open,0,,,2,2020-11-12T17:54:55Z,2022-10-03T15:09:25Z,,CONTRIBUTOR,,,," **What happened**: I have two `DataArray`s that each have a coordinate constructed with `pandas.interval_range`. In one case I pass the `interval_range` directly, in the other case I call `.to_numpy()` first. The two `DataArray`s look identical but aren't. This can lead to hard-to-find bugs, because behaviour is not identical: the former supports indexing whereas the latter doesn't. **What you expected to happen**: I expect two arrays that appear identical to behave identically. If they don't behave identically then there should be some way to tell the difference (apart from `equals`, which tells me they are different but not how). 
**Minimal Complete Verifiable Example**: ```python import xarray import pandas da1 = xarray.DataArray([0, 1, 2], dims=(""x"",), coords={""x"": pandas.interval_range(0, 2, 3)}) da2 = xarray.DataArray([0, 1, 2], dims=(""x"",), coords={""x"": pandas.interval_range(0, 2, 3).to_numpy()}) print(repr(da1) == repr(da2)) print(repr(da1.x) == repr(da2.x)) print(da1.x.dtype == da2.x.dtype) # identical? No: print(da1.equals(da2)) print(da1.x.equals(da2.x)) # in particular: da1.sel(x=1) # works da2.sel(x=1) # fails ``` Results in: ``` True True True False False Traceback (most recent call last): File ""/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/pandas/core/indexes/base.py"", line 2895, in get_loc return self._engine.get_loc(casted_key) File ""pandas/_libs/index.pyx"", line 70, in pandas._libs.index.IndexEngine.get_loc File ""pandas/_libs/index.pyx"", line 101, in pandas._libs.index.IndexEngine.get_loc File ""pandas/_libs/hashtable_class_helper.pxi"", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item File ""pandas/_libs/hashtable_class_helper.pxi"", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item KeyError: 1 The above exception was the direct cause of the following exception: Traceback (most recent call last): File ""mwe105.py"", line 19, in da2.sel(x=1) # fails File ""/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/dataarray.py"", line 1143, in sel ds = self._to_temp_dataset().sel( File ""/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/dataset.py"", line 2105, in sel pos_indexers, new_indexes = remap_label_indexers( File ""/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/coordinates.py"", line 397, in remap_label_indexers pos_indexers, new_indexes = indexing.remap_label_indexers( File ""/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py"", line 275, in remap_label_indexers idxr, new_idx = convert_label_indexer(index, label, 
dim, method, tolerance) File ""/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py"", line 196, in convert_label_indexer indexer = index.get_loc(label_value, method=method, tolerance=tolerance) File ""/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/pandas/core/indexes/base.py"", line 2897, in get_loc raise KeyError(key) from err KeyError: 1 ``` **Additional context** I suppose this happens because under the hood xarray does something clever to support pandas-style indexing even though the coordinate variable appears like a numpy array with an object dtype, and that this cleverness is lost if the object is already converted to a numpy array. But there is, as far as I can see, no way to tell the difference once the objects have been created. **Environment**:
Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 4.12.14-lp150.12.82-default machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: en_GB.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.1 pandas: 1.1.4 numpy: 1.19.4 scipy: 1.5.3 netCDF4: 1.5.4 pydap: None h5netcdf: 0.8.1 h5py: 3.1.0 Nio: None zarr: 2.5.0 cftime: 1.2.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.1.7 cfgrib: None iris: None bottleneck: None dask: 2.30.0 distributed: 2.30.1 matplotlib: 3.3.2 cartopy: 0.18.0 seaborn: None numbagg: None pint: None setuptools: 49.6.0.post20201009 pip: 20.2.4 conda: installed pytest: 6.1.2 IPython: 7.19.0 sphinx: 3.3.0
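One check that might expose the difference is to look at the pandas index behind each coordinate rather than at the repr. This is a minimal sketch, assuming that `DataArray.indexes` returns the underlying pandas index for dimension `x`; the variable names `idx1`, `idx2`, `is_interval1` and `is_interval2` are only illustrative:

```python
import pandas
import xarray

da1 = xarray.DataArray([0, 1, 2], dims=('x',),
                       coords={'x': pandas.interval_range(0, 2, 3)})
da2 = xarray.DataArray([0, 1, 2], dims=('x',),
                       coords={'x': pandas.interval_range(0, 2, 3).to_numpy()})

# Look at the pandas index objects; the reprs of da1 and da2 do not show them.
idx1 = da1.indexes['x']
idx2 = da2.indexes['x']
print(type(idx1).__name__, type(idx2).__name__)

# Only an IntervalIndex supports looking up a scalar that falls inside an interval.
is_interval1 = isinstance(idx1, pandas.IntervalIndex)
is_interval2 = isinstance(idx2, pandas.IntervalIndex)
```

If the two index types differ, that would explain why `sel(x=1)` works for one array and not for the other, even though the coordinate dtype is reported as `object` in both cases.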
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4579/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 232623945,MDU6SXNzdWUyMzI2MjM5NDU=,1435,xarray.plot.imshow with datetime coordinates results in blank plot,500246,open,0,,,6,2017-05-31T16:31:30Z,2022-05-03T01:56:37Z,,CONTRIBUTOR,,,,"``` In [72]: da = xarray.DataArray(arange(5*6).reshape(5,6), dims=(""A"", ""B""), coords={""A"": arange(5), ""B"": pd.date_range(""2000-01-01"", periods=6)}) In [73]: da.plot.imshow() Out[73]: ``` The resulting plot has the correct axes and colorbar, but the contents of the plot itself are blank. Upon moving the cursor over the plot, there is an exception in `Tkinter`: ``` Exception in Tkinter callback Traceback (most recent call last): File ""/home/users/gholl/lib/python3.5/tkinter/__init__.py"", line 1549, in __call__ return self.func(*args) File ""/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/matplotlib/backends/backend_tkagg.py"", line 387, in motion_notify_event FigureCanvasBase.motion_notify_event(self, x, y, guiEvent=event) File ""/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/matplotlib/backend_bases.py"", line 1966, in motion_notify_event self.callbacks.process(s, event) File ""/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/matplotlib/cbook.py"", line 554, in process proxy(*args, **kwargs) File ""/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/matplotlib/cbook.py"", line 416, in __call__ return mtd(*args, **kwargs) File ""/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/matplotlib/backend_bases.py"", line 2857, in mouse_move artists = [a for a in event.inaxes.mouseover_set File ""/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/matplotlib/backend_bases.py"", line 2858, in if a.contains(event) and a.get_visible()] File 
""/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/matplotlib/image.py"", line 567, in contains inside = ((x >= xmin) and (x <= xmax) and TypeError: invalid type promotion ```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1435/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 686461572,MDU6SXNzdWU2ODY0NjE1NzI=,4378,Plotting when Interval coordinate is timedelta-based,500246,open,0,,,2,2020-08-26T16:36:27Z,2022-04-18T21:55:15Z,,CONTRIBUTOR,,,," **Is your feature request related to a problem? Please describe.** The xarray plotting interface supports coordinates containing `pandas.Interval` iff those intervals contain numbers. It fails when those intervals contain `pandas.Timedelta`: ```python import numpy as np import pandas as pd import xarray as xr da = xr.DataArray( np.arange(10), dims=(""x"",), coords={""x"": [pd.Interval(i, i+1) for i in range(10)]}) da.plot() # works da = xr.DataArray( np.arange(10), dims=(""x"",), coords={""x"": [pd.Interval( d-pd.Timestamp(""2000-01-01""), d-pd.Timestamp(""2000-01-01"")+pd.Timedelta(""1H"")) for d in pd.date_range(""2000-01-01"", ""2000-01-02"", 10)]}) da.plot() # fails ``` The latter fails with: ``` Traceback (most recent call last): File ""mwe82.py"", line 18, in da.plot() # fails File ""/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/plot/plot.py"", line 446, in __call__ return plot(self._da, **kwargs) File ""/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/plot/plot.py"", line 200, in plot return plotfunc(darray, **kwargs) File ""/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/plot/plot.py"", line 302, in line _ensure_plottable(xplt_val, yplt_val) File ""/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/plot/utils.py"", line 551, in _ensure_plottable raise TypeError( TypeError: Plotting requires coordinates 
to be numeric or dates of type np.datetime64, datetime.datetime, cftime.datetime or pd.Interval. ``` This error message is somewhat confusing, because the coordinates _are_ ""dates of type (...) pd.Interval"", but perhaps a timedelta is not considered a date. **Describe the solution you'd like** I would like to be able to use the xarray plotting interface for any `pandas.Interval` coordinate, including intervals of `pandas.Timestamp` and `pandas.Timedelta`. **Describe alternatives you've considered** I'll ""manually"" calculate the midpoints and use those as a timedelta coordinate instead. **Additional context** It seems that regular timedeltas aren't really supported either; they don't cause an error message, but rather produce [incorrect results](https://stackoverflow.com/q/50717534). There's probably a related issue somewhere, but I can't find it now.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4378/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 203630267,MDU6SXNzdWUyMDM2MzAyNjc=,1234,`where` grows new dimensions for unrelated variables,500246,open,0,,,5,2017-01-27T13:02:34Z,2022-04-18T16:04:16Z,,CONTRIBUTOR,,,,"In the example below, the dimensionality of data variable `y` grows from `(b)` to `(b, a)` after calling the dataset `where` method. This behaviour does not appear to be documented. Is it a bug? 
``` In [46]: ds = xarray.Dataset({""x"": ((""a"", ""b""), arange(25).reshape(5,5)+100), ""y"": (""b"", arange(5)-100)}, {""a"": arange(5), ""b"": arange(5)*2, ""c"": ((""a"",), list(""ABCDE""))}) In [47]: print(ds) Dimensions: (a: 5, b: 5) Coordinates: * b (b) int64 0 2 4 6 8 c (a) ='A') & (ds.c<='C')) Out[69]: Dimensions: (a: 5, b: 5) Coordinates: * b (b) int64 0 2 4 6 8 c (a) array([9223372036854775808, 1], dtype=uint64) Dimensions without coordinates: dim_0 ``` **Anything else we need to know?**: In numpy the equivalent code raises `ValueError` This is related but different from #2945. In #2945, xarray behaves the same as numpy. In #4612, xarray behaves differently from numpy. **Environment**:
Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 4.12.14-lp150.12.82-default machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: en_GB.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.1 pandas: 1.1.4 numpy: 1.19.4 scipy: 1.5.3 netCDF4: 1.5.4 pydap: None h5netcdf: 0.8.1 h5py: 3.1.0 Nio: None zarr: 2.5.0 cftime: 1.2.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.1.7 cfgrib: None iris: None bottleneck: None dask: 2.30.0 distributed: 2.30.1 matplotlib: 3.3.2 cartopy: 0.18.0 seaborn: None numbagg: None pint: None setuptools: 49.6.0.post20201009 pip: 20.2.4 conda: installed pytest: 6.1.2 IPython: 7.19.0 sphinx: 3.3.0
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4612/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 283345586,MDU6SXNzdWUyODMzNDU1ODY=,1792,Comparison with masked array yields object-array with nans for masked values,500246,open,0,,,3,2017-12-19T19:37:13Z,2020-10-11T13:34:25Z,,CONTRIBUTOR,,,,"#### Code Sample, a copy-pastable example if possible ``` $ cat mwe.py #!/usr/bin/env python3.6 import xarray import numpy da = xarray.DataArray(numpy.arange(5)) ma = numpy.ma.masked_array(numpy.arange(5), [True, False, False, False, True]) print(da>ma) $ ./mwe.py array([nan, False, False, False, nan], dtype=object) Dimensions without coordinates: dim_0 ``` #### Problem description A comparison between a `DataArray` and a `masked_array` results in an array with dtype `object` instead of an array with dtype `bool`. This is problematic, because code should be able to assume that `x > y` returns something with a `bool` dtype. #### Expected Output I would expect the masked array to be dropped (which it is) and an array to be returned equivalent to the comparison `da>ma.data` ``` array([False, False, False, False, False], dtype=bool) Dimensions without coordinates: dim_0 ``` #### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: None python: 3.6.1.final.0 python-bits: 64 OS: Linux OS-release: 2.6.32-696.6.3.el6.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: en_GB.UTF-8 xarray: 0.10.0+dev12.gf882a58 pandas: 0.21.0 numpy: 1.13.3 scipy: 1.0.0 netCDF4: 1.3.1 h5netcdf: None Nio: None bottleneck: 1.2.1 cyordereddict: None dask: 0.16.0 matplotlib: 2.1.0 cartopy: None seaborn: 0.8.1 setuptools: 38.2.4 pip: 9.0.1 conda: 4.3.16 pytest: 3.1.2 IPython: 6.1.0 sphinx: 1.6.2 None
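A possible interim workaround is to strip the mask explicitly before comparing, so that the comparison only sees a plain `ndarray`. This is a sketch of that idea, not a fix for the underlying behaviour; the names `plain` and `result` are only illustrative:

```python
import numpy
import xarray

da = xarray.DataArray(numpy.arange(5))
ma = numpy.ma.masked_array(numpy.arange(5), [True, False, False, False, True])

# Drop the mask up front; numpy.ma.getdata returns the underlying ndarray.
plain = numpy.ma.getdata(ma)

# Comparing against a plain ndarray should give a boolean DataArray.
result = da > plain
print(result.dtype)
```

This discards the mask entirely, which matches the expected output above but would not be appropriate if the masked positions need to stay distinguishable.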
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1792/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 199188476,MDU6SXNzdWUxOTkxODg0NzY=,1194,Use masked arrays while preserving int,500246,open,0,,,9,2017-01-06T12:40:22Z,2020-03-29T20:37:29Z,,CONTRIBUTOR,,,,"A great beauty of numpys masked arrays is that it works with any dtype, since it does not use `nan`. Unfortunately, when I try to put my data into an `xarray.Dataset`, it converts ints to float, as shown below: ``` In [137]: x = arange(30, dtype=""i1"").reshape(3, 10) In [138]: xr.Dataset({""count"": ([""x"", ""y""], ma.masked_where(x%5>3, x))}, coords={""x"": range(3), ""y"": ...: range(10)}) Out[138]: Dimensions: (x: 3, y: 10) Coordinates: * y (y) int64 0 1 2 3 4 5 6 7 8 9 * x (x) int64 0 1 2 Data variables: count (x, y) float64 0.0 1.0 2.0 3.0 nan 5.0 6.0 7.0 8.0 nan 10.0 ... ``` This happens in the function [`_maybe_promote`](https://github.com/pydata/xarray/blob/master/xarray/core/common.py#L693). Such type “promotion” is unaffordable for me; the memory consumption of my multi-gigabyte arrays would explode by a factor 4. Secondly, many of my integer-dtype fields are bit arrays, for which floating point representation is not desirable. It would greatly benefit `xarray` if it could use masking while preserving the dtype of input data. 
(See also: [Stackoverflow question](http://stackoverflow.com/q/41505699/974555))","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1194/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 410317757,MDU6SXNzdWU0MTAzMTc3NTc=,2772,Should xarray allow assigning a masked constant?,500246,open,0,,,1,2019-02-14T14:10:20Z,2019-02-15T20:24:44Z,,CONTRIBUTOR,,,,"Currently, `ds['a'] = ((), ma.masked)` where `ds` is an `xarray.Dataset` gives `ValueError: Could not convert tuple of form (dims, data[, attrs, encoding]): ((), masked) to Variable.`, whereas `ds['a'] = (), ma.MaskedArray(0.0, True)` works (it sets the indicated value to NaN). Should assigning `ma.masked` be equivalent to assigning `ma.MaskedArray(0.0, True)`, or are there good reasons for the difference in behaviour?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2772/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue