id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1812301185,I_kwDOAMm_X85sBYWB,8005,Design for IntervalIndex,2448579,open,0,,,5,2023-07-19T16:30:50Z,2023-09-09T06:30:20Z,,MEMBER,,,,"### Is your feature request related to a problem?
We should add a wrapper for `pandas.IntervalIndex`. This would solve a long-standing problem around propagating ""bounds"" variables ([CF conventions](http://cfconventions.org/cf-conventions/cf-conventions.html#cell-boundaries), https://github.com/pydata/xarray/issues/1475).
### The CF design
CF ""encoding"" for intervals is to use bounds variables. The dimension coordinate carries a `""bounds""` attribute that names a second variable (at least 2D). Example: `x` has an attribute `bounds` that refers to `x_bounds`.
```python
import numpy as np
import xarray as xr
# Cell edges for four unit-width intervals centered on 1, 2, 3, 4
left = np.arange(0.5, 3.6, 1)
right = np.arange(1.5, 4.6, 1)
bounds = np.stack([left, right])  # shape (2, 4), dims (""bnds"", ""x"")
ds = xr.Dataset(
    {""data"": (""x"", [1, 2, 3, 4])},
    coords={
        ""x"": (""x"", [1, 2, 3, 4], {""bounds"": ""x_bounds""}),
        ""x_bounds"": ((""bnds"", ""x""), bounds),
    },
)
ds
```
A fundamental problem with our current data model is that we lose `x_bounds` when we extract `ds.data` because there is a dimension `bnds` that is not shared with `ds.data`. Very important metadata is now lost!
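The loss described above can be reproduced with a minimal sketch. It rebuilds the example dataset so that it runs on its own, and it uses `ds['data']` rather than attribute access; it illustrates current behaviour only and is not part of any proposed fix.
```python
import numpy as np
import xarray as xr

left = np.arange(0.5, 3.6, 1)
right = np.arange(1.5, 4.6, 1)
bounds = np.stack([left, right])
ds = xr.Dataset(
    {'data': ('x', [1, 2, 3, 4])},
    coords={
        'x': ('x', [1, 2, 3, 4], {'bounds': 'x_bounds'}),
        'x_bounds': (('bnds', 'x'), bounds),
    },
)

da = ds['data']
# x_bounds has the extra dimension bnds, so it is not carried along with da
kept = sorted(da.coords)
has_bounds = 'x_bounds' in da.coords
print(kept, has_bounds)
```
After extraction, the `bounds` attribute on `x` names a variable that is no longer attached to the array.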
We would also like to use the ""bounds"" to enable interval-based indexing. `ds.sel(x=1.1)` should give you the value from the interval that contains 1.1.
### Pandas IntervalIndex
All the indexing is easy to implement by wrapping [pandas.IntervalIndex](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.IntervalIndex.html), but there is one limitation. `pd.IntervalIndex` saves two pieces of information for each interval (left bound, right bound). CF saves three: left bound, right bound (see `x_bounds`) and a ""central"" value (see `x`). This should be OK to work around in our wrapper.
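As a rough illustration of what the wrapper would delegate to, here is a minimal pandas-only sketch built from the same `left` and `right` arrays. The choice `closed='left'` is an assumption made for the example; the text above does not say which edge should be inclusive.
```python
import numpy as np
import pandas as pd

left = np.arange(0.5, 3.6, 1)
right = np.arange(1.5, 4.6, 1)

# closed='left' is an assumption for this sketch
index = pd.IntervalIndex.from_arrays(left, right, closed='left')

# Integer position of the interval that contains 1.1
pos = index.get_loc(1.1)

# Midpoints are derived from the two bounds, not stored separately
centers = index.mid
print(pos, list(centers))
```
Because `mid` is computed from the bounds, a CF ""central"" value that is not the midpoint would have to be kept by the wrapper itself.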
## Fundamental Question
To me, a core question is whether `x_bounds` needs to be preserved *after* creating an `IntervalIndex`.
1. If so, we need a better rule around coordinate variable propagation. In this case, the IntervalIndex would be associated with `x` and `x_bounds`. So the rule could be:
> ""propagate all variables necessary to propagate an index associated with any of the dimensions on the extracted variable.""
So when extracting `ds.data`, we propagate all variables necessary to propagate the indexes associated with `ds.data.dims`, that is `x`, which means ""propagate `x`, `x_bounds`, and the IntervalIndex"".
2. Alternatively, we could choose to drop `x_bounds` entirely. I interpret this approach as ""decoding"" the bounds variable to an interval index object. When saving to disk, we would encode the interval index in two variables. (See below)
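Option 2 amounts to a decode/encode round trip. The sketch below shows that idea with plain NumPy and pandas; `decode_bounds` and `encode_intervals` are hypothetical helper names, not xarray API, and `closed='left'` is again an assumption.
```python
import numpy as np
import pandas as pd


def decode_bounds(bounds, closed='left'):
    # bounds has shape (2, n): row 0 holds left edges, row 1 holds right edges
    return pd.IntervalIndex.from_arrays(bounds[0], bounds[1], closed=closed)


def encode_intervals(index):
    # Split the index back into a central-value array and a (2, n) bounds array
    mid = index.mid.to_numpy()
    bounds = np.stack([index.left.to_numpy(), index.right.to_numpy()])
    return mid, bounds


bounds = np.stack([np.arange(0.5, 3.6, 1), np.arange(1.5, 4.6, 1)])
index = decode_bounds(bounds)
mid, roundtrip = encode_intervals(index)
print(mid, roundtrip.shape)
```
The round trip recovers the bounds exactly, but the central values come back as midpoints, so any other choice of central value in the original file would not survive without extra bookkeeping.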
### Describe the solution you'd like
I've prototyped (2) (approach 1 in [this notebook](https://github.com/dcherian/xindexes/blob/main/interval-array.ipynb)) following @benbovy's [suggestion](https://github.com/pydata/xarray/discussions/7041#discussioncomment-4936891).
```python
ds1.sel(x=1.1)
```
### Describe alternatives you've considered
I've tried some approaches [in this notebook](https://github.com/dcherian/xindexes/blob/main/interval-array.ipynb)
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8005/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1812504689,I_kwDOAMm_X85sCKBx,8006,Fix documentation about datetime_unit of xarray.DataArray.differentiate,2448579,closed,0,,,0,2023-07-19T18:31:10Z,2023-09-01T09:37:15Z,2023-09-01T09:37:15Z,MEMBER,,,,"Should say that `Y` and `M` cannot be supported with `datetime64`
### Discussed in https://github.com/pydata/xarray/discussions/8000