id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1812301185,I_kwDOAMm_X85sBYWB,8005,Design for IntervalIndex,2448579,open,0,,,5,2023-07-19T16:30:50Z,2023-09-09T06:30:20Z,,MEMBER,,,,"### Is your feature request related to a problem?

We should add a wrapper for `pandas.IntervalIndex`. This would solve a long-standing problem around propagating ""bounds"" variables ([CF conventions](http://cfconventions.org/cf-conventions/cf-conventions.html#cell-boundaries), https://github.com/pydata/xarray/issues/1475).

### The CF design

CF ""encoding"" for intervals is to use bounds variables. There is an attribute `""bounds""` on the dimension coordinate that refers to a second variable (at least 2D). Example: `x` has an attribute `bounds` that refers to `x_bounds`.

```python
import numpy as np
import xarray as xr

left = np.arange(0.5, 3.6, 1)
right = np.arange(1.5, 4.6, 1)
bounds = np.stack([left, right])

ds = xr.Dataset(
    {""data"": (""x"", [1, 2, 3, 4])},
    coords={""x"": (""x"", [1, 2, 3, 4], {""bounds"": ""x_bounds""}), ""x_bounds"": ((""bnds"", ""x""), bounds)},
)
ds
```

A fundamental problem with our current data model is that we lose `x_bounds` when we extract `ds.data` because there is a dimension `bnds` that is not shared with `ds.data`. Very important metadata is now lost!

We would also like to use the ""bounds"" to enable interval-based indexing. `ds.sel(x=1.1)` should give you the value from the appropriate interval.

### Pandas IntervalIndex

All the indexing is easy to implement by wrapping [pandas.IntervalIndex](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.IntervalIndex.html), but there is one limitation.

`pd.IntervalIndex` saves two pieces of information for each interval (left bound, right bound). CF saves three: left bound, right bound (see `x_bounds`), and a ""central"" value (see `x`).
This should be OK to work around in our wrapper.

## Fundamental Question

To me, a core question is whether `x_bounds` needs to be preserved *after* creating an `IntervalIndex`.

1. If so, we need a better rule around coordinate variable propagation. In this case, the IntervalIndex would be associated with `x` and `x_bounds`. So the rule could be
    > ""propagate all variables necessary to propagate an index associated with any of the dimensions on the extracted variable.""

    So when extracting `ds.data` we propagate all variables necessary to propagate indexes associated with `ds.data.dims` (that is, `x`), which would mean ""propagate `x`, `x_bounds`, and the IntervalIndex"".
2. Alternatively, we could choose to drop `x_bounds` entirely. I interpret this approach as ""decoding"" the bounds variable to an interval index object. When saving to disk, we would encode the interval index in two variables. (See below)

### Describe the solution you'd like

I've prototyped (2) (approach 1 in [this notebook](https://github.com/dcherian/xindexes/blob/main/interval-array.ipynb)) following @benbovy's [suggestion](https://github.com/pydata/xarray/discussions/7041#discussioncomment-4936891)
```python
import numpy as np
import pandas as pd
import xarray as xr
from xarray import Variable
from xarray.indexes import PandasIndex


class XarrayIntervalIndex(PandasIndex):
    def __init__(self, index, dim, coord_dtype):
        assert isinstance(index, pd.IntervalIndex)

        # for PandasIndex
        self.index = index
        self.dim = dim
        self.coord_dtype = coord_dtype

    @classmethod
    def from_variables(cls, variables, options):
        assert len(variables) == 1
        (dim,) = tuple(variables)
        bounds = options[""bounds""]
        assert isinstance(bounds, (xr.DataArray, xr.Variable))

        # the bounds axis is the one dimension not shared with the coordinate
        (axis,) = bounds.get_axis_num(set(bounds.dims) - {dim})
        left, right = np.split(bounds.data, 2, axis=axis)
        index = pd.IntervalIndex.from_arrays(left.squeeze(), right.squeeze())
        coord_dtype = bounds.dtype

        return cls(index, dim, coord_dtype)

    def create_variables(self, variables):
        from xarray.core.indexing import PandasIndexingAdapter

        newvars = {self.dim: xr.Variable(self.dim, PandasIndexingAdapter(self.index))}
        return newvars

    def __repr__(self):
        string = f""Xarray{self.index!r}""
        return string

    def to_pandas_index(self):
        return self.index

    @property
    def mid(self):
        # midpoints of the intervals
        return PandasIndex(self.index.mid, self.dim, self.coord_dtype)

    @property
    def left(self):
        # left bounds of the intervals
        return PandasIndex(self.index.left, self.dim, self.coord_dtype)

    @property
    def right(self):
        # right bounds of the intervals
        return PandasIndex(self.index.right, self.dim, self.coord_dtype)
```
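
For reference, the interval lookup that such a wrapper would delegate to can be sketched with plain pandas, without any xarray machinery. This is only an illustration built from the bounds in the CF example above; the variable names `pos` and `mids` are made up for this sketch and are not part of the prototype.

```python
import numpy as np
import pandas as pd

# Same bounds as the CF example: four unit-width cells.
left = np.arange(0.5, 3.6, 1)
right = np.arange(1.5, 4.6, 1)

# pandas stores only (left, right) per interval; closed='right' is the default.
index = pd.IntervalIndex.from_arrays(left, right)

# Integer position of the interval that contains 1.1, here (0.5, 1.5].
pos = index.get_loc(1.1)

# Midpoints are derived from the bounds rather than stored separately.
mids = index.mid.to_numpy()
```

In this example the derived midpoints coincide with the CF coordinate values `[1, 2, 3, 4]`, but a CF ""central"" value need not be the midpoint in general, which is the limitation described earlier.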
```python
ds1 = (
    ds.drop_indexes(""x"")
    .set_xindex(""x"", XarrayIntervalIndex, bounds=ds.x_bounds)
    .drop_vars(""x_bounds"")
)
ds1
```

```python
ds1.sel(x=1.1)
```

### Describe alternatives you've considered

I've tried some approaches [in this notebook](https://github.com/dcherian/xindexes/blob/main/interval-array.ipynb)
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8005/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1812504689,I_kwDOAMm_X85sCKBx,8006,Fix documentation about datetime_unit of xarray.DataArray.differentiate,2448579,closed,0,,,0,2023-07-19T18:31:10Z,2023-09-01T09:37:15Z,2023-09-01T09:37:15Z,MEMBER,,,,"Should say that `Y` and `M` cannot be supported with `datetime64`

### Discussed in https://github.com/pydata/xarray/discussions/8000

Originally posted by **jesieleo** July 19, 2023

I have a piece of data that looks like this:

```
Dimensions:    (time: 612, LEV: 15, latitude: 20, longitude: 357)
Coordinates:
  * time       (time) datetime64[ns] 1960-01-15 1960-02-15 ... 2010-12-15
  * LEV        (LEV) float64 5.01 15.07 25.28 35.76 ... 149.0 171.4 197.8 229.5
  * latitude   (latitude) float64 -4.75 -4.25 -3.75 -3.25 ... 3.75 4.25 4.75
  * longitude  (longitude) float64 114.2 114.8 115.2 115.8 ... 291.2 291.8 292.2
Data variables:
    u          (time, LEV, latitude, longitude) float32 ...
Attributes: (12/30)
    cdm_data_type:             Grid
    Conventions:               COARDS, CF-1.6, ACDD-1.3
    creator_email:             chepurin@umd.edu
    creator_name:              APDRC
    creator_type:              institution
    creator_url:               https://www.atmos.umd.edu/~ocean/
    ...                        ...
    standard_name_vocabulary:  CF Standard Name Table v29
    summary:                   Simple Ocean Data Assimilation (SODA) soda po...
    time_coverage_end:         2010-12-15T00:00:00Z
    time_coverage_start:       1983-01-15T00:00:00Z
    title:                     SODA soda pop2.2.4 [TIME][LEV][LAT][LON]
    Westernmost_Easting:       118.25
```

When I try to use xarray.DataArray.differentiate, the call `data.u.differentiate('time',datetime_unit='M')` produces:

```
Traceback (most recent call last):
  File """", line 1, in
  File ""D:\Anaconda3\lib\site-packages\xarray\core\dataarray.py"", line 3609, in differentiate
    ds = self._to_temp_dataset().differentiate(coord, edge_order, datetime_unit)
  File ""D:\Anaconda3\lib\site-packages\xarray\core\dataset.py"", line 6372, in differentiate
    coord_var = coord_var._to_numeric(datetime_unit=datetime_unit)
  File ""D:\Anaconda3\lib\site-packages\xarray\core\variable.py"", line 2428, in _to_numeric
    numeric_array = duck_array_ops.datetime_to_numeric(
  File ""D:\Anaconda3\lib\site-packages\xarray\core\duck_array_ops.py"", line 466, in datetime_to_numeric
    array = array / np.timedelta64(1, datetime_unit)
TypeError: Cannot get a common metadata divisor for Numpy datatime metadata [ns] and [M] because they have incompatible nonlinear base time units.
```

Could you please tell me whether this is a bug?
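
The restriction stated at the top of this issue can be reproduced with NumPy alone, mirroring the division shown in the last frame of the traceback. The snippet below is a minimal sketch with made-up sample dates (the first three time steps from the repr plus an assumed 1960-03-15); the names `offsets`, `in_days` and `month_error` are illustrative only.

```python
import numpy as np

# Three monthly timestamps stored at nanosecond resolution.
times = np.array(['1960-01-15', '1960-02-15', '1960-03-15'], dtype='datetime64[ns]')
offsets = times - times[0]  # timedelta64[ns]

# A fixed-length unit such as days converts cleanly to floats.
in_days = offsets / np.timedelta64(1, 'D')

# Months (and years) have no fixed length, so NumPy refuses the conversion.
try:
    offsets / np.timedelta64(1, 'M')
    month_error = None
except TypeError as err:
    month_error = err
```

Under this reading, passing a fixed-length unit such as `datetime_unit='D'` to `differentiate` and rescaling afterwards is one possible workaround; whether that is acceptable depends on how much the varying month length matters for the analysis.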
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8006/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1812646094,PR_kwDOAMm_X85V7g7q,8007,Update copyright year in README,2448579,closed,0,,,0,2023-07-19T20:00:50Z,2023-07-20T21:13:27Z,2023-07-20T21:13:26Z,MEMBER,,0,pydata/xarray/pulls/8007,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8007/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull