id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1954445639,I_kwDOAMm_X850fnlH,8350,optimize align for scalars at least,2448579,open,0,,,5,2023-10-20T14:48:25Z,2023-10-20T19:17:39Z,,MEMBER,,,,"### What happened?
Here's a simple rescaling calculation:
```python
import numpy as np
import xarray as xr
ds = xr.Dataset(
{""a"": ((""x"", ""y""), np.ones((300, 400))), ""b"": ((""x"", ""y""), np.ones((300, 400)))}
)
mean = ds.mean() # scalar
std = ds.std() # scalar
rescaled = (ds - mean) / std
```
The profile for the last line shows 30% (!!!) of the time spent in `align` (really `reindex_like`), even though there's nothing to reindex when only scalars are involved!
This is a small example inspired by an ML pipeline where this normalization runs many times in a tight loop.
cc @benbovy
### What did you expect to happen?
A fast path for when no reindexing needs to happen.
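One way to picture such a fast path is a cheap pre-check that runs before alignment. The sketch below is only an illustration, not xarray's implementation: `can_skip_align` is a hypothetical helper, and it assumes that operands without any dimensions can never require reindexing.

```python
import numpy as np
import xarray as xr


def can_skip_align(*objs):
    # Hypothetical helper (not part of xarray): alignment only matters when
    # at least two operands carry dimensions that could disagree.
    with_dims = [obj for obj in objs if len(obj.sizes) > 0]
    return len(with_dims) <= 1


ds = xr.Dataset(
    {'a': (('x', 'y'), np.ones((300, 400))), 'b': (('x', 'y'), np.ones((300, 400)))}
)
mean = ds.mean()  # every variable is 0-dimensional

print(can_skip_align(ds, mean))  # scalar operand: nothing to reindex
print(can_skip_align(ds, ds))    # two dimensioned operands: alignment may be needed
```

A check like this is O(number of operands), so it would be negligible next to the arithmetic itself.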
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8350/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1217566173,I_kwDOAMm_X85IkpXd,6528,cumsum drops index coordinates,2448579,open,0,,,5,2022-04-27T16:04:08Z,2023-09-22T07:55:56Z,,MEMBER,,,,"### What happened?
cumsum drops index coordinates. Seen in #6525, #3417
### What did you expect to happen?
Preserve index coordinates
### Minimal Complete Verifiable Example
```Python
import xarray as xr
ds = xr.Dataset(
{""foo"": ((""x"",), [7, 3, 1, 1, 1, 1, 1])},
coords={""x"": [0, 1, 2, 3, 4, 5, 6]},
)
ds.cumsum(""x"")
```
```
Dimensions: (x: 7)
Dimensions without coordinates: x
Data variables:
foo (x) int64 7 10 11 12 13 14 15
```
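Until index coordinates are preserved, one possible workaround is to reattach the coordinate from the input after the reduction. This is a minimal sketch using `assign_coords`; it assumes the dimension length is unchanged, which holds for `cumsum`.

```python
import xarray as xr

ds = xr.Dataset(
    {'foo': (('x',), [7, 3, 1, 1, 1, 1, 1])},
    coords={'x': [0, 1, 2, 3, 4, 5, 6]},
)

# Reattach the index coordinate that cumsum may have dropped.
result = ds.cumsum('x').assign_coords(x=ds['x'])
print(result)
```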
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
xarray main
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6528/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1812301185,I_kwDOAMm_X85sBYWB,8005,Design for IntervalIndex,2448579,open,0,,,5,2023-07-19T16:30:50Z,2023-09-09T06:30:20Z,,MEMBER,,,,"### Is your feature request related to a problem?
We should add a wrapper for `pandas.IntervalIndex`. This would solve a long-standing problem around propagating ""bounds"" variables ([CF conventions](http://cfconventions.org/cf-conventions/cf-conventions.html#cell-boundaries), https://github.com/pydata/xarray/issues/1475).
### The CF design
CF ""encoding"" for intervals is to use bounds variables. There is an attribute `""bounds""` on the dimension coordinate, that refers to a second variable (at least 2D). Example: `x` has an attribute `bounds` that refers to `x_bounds`.
```python
import numpy as np
import xarray as xr
left = np.arange(0.5, 3.6, 1)
right = np.arange(1.5, 4.6, 1)
bounds = np.stack([left, right])
ds = xr.Dataset(
{""data"": (""x"", [1, 2, 3, 4])},
coords={""x"": (""x"", [1, 2, 3, 4], {""bounds"": ""x_bounds""}), ""x_bounds"": ((""bnds"", ""x""), bounds)},
)
ds
```
A fundamental problem with our current data model is that we lose `x_bounds` when we extract `ds.data` because there is a dimension `bnds` that is not shared with `ds.data`. Very important metadata is now lost!
We would also like to use the ""bounds"" to enable interval based indexing. `ds.sel(x=1.1)` should give you the value from the appropriate interval.
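The loss of `x_bounds` can be checked directly. This small sketch rebuilds the dataset from the example and inspects the coordinates of the extracted variable; it uses `ds['data']` rather than attribute access, which is purely a stylistic choice here.

```python
import numpy as np
import xarray as xr

left = np.arange(0.5, 3.6, 1)
right = np.arange(1.5, 4.6, 1)
bounds = np.stack([left, right])

ds = xr.Dataset(
    {'data': ('x', [1, 2, 3, 4])},
    coords={
        'x': ('x', [1, 2, 3, 4], {'bounds': 'x_bounds'}),
        'x_bounds': (('bnds', 'x'), bounds),
    },
)

da = ds['data']
# x_bounds has the extra dimension bnds, so it does not follow the variable.
print('x_bounds' in ds.coords)
print('x_bounds' in da.coords)
# The attribute on x still names a variable that is no longer attached.
print(da['x'].attrs)
```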
### Pandas IntervalIndex
All the indexing is easy to implement by wrapping [pandas.IntervalIndex](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.IntervalIndex.html), but there is one limitation. `pd.IntervalIndex` saves two pieces of information for each interval (left bound, right bound). CF saves three: left bound, right bound (see `x_bounds`) and a ""central"" value (see `x`). This should be OK to work around in our wrapper.
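For reference, this is roughly what `pd.IntervalIndex` offers out of the box, using the bounds from the example above. The index stores only the left and right edges; `mid` is derived from them, so a CF central value that is not the midpoint would have to be kept separately (that part is not shown here).

```python
import numpy as np
import pandas as pd

left = np.arange(0.5, 3.6, 1)
right = np.arange(1.5, 4.6, 1)
index = pd.IntervalIndex.from_arrays(left, right)  # closed on the right by default

print(index.left.tolist(), index.right.tolist())
print(index.mid.tolist())  # computed from the edges, not stored

# Interval-based lookup: position of the interval that contains 1.1
pos = index.get_loc(1.1)
print(pos, index[pos])
```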
## Fundamental Question
To me, a core question is whether `x_bounds` needs to be preserved *after* creating an `IntervalIndex`.
1. If so, we need a better rule around coordinate variable propagation. In this case, the IntervalIndex would be associated with `x` and `x_bounds`. So the rule could be
> ""propagate all variables necessary to propagate an index associated with any of the dimensions on the extracted variable.""
So when extracting `ds.data` we propagate all variables necessary to propagate indexes associated with `ds.data.dims`, that is `x`, which would say ""propagate `x`, `x_bounds`, and the IntervalIndex"".
2. Alternatively, we could choose to drop `x_bounds` entirely. I interpret this approach as ""decoding"" the bounds variable to an interval index object. When saving to disk, we would encode the interval index in two variables. (See below)
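Option 2 implies a round trip between the index and CF-style variables. The following is only a sketch of what the encoding step might look like; `encode_interval_index` is a hypothetical helper, and using the midpoint as the central value is an assumption (the central value need not be the midpoint in general).

```python
import numpy as np
import pandas as pd
import xarray as xr


def encode_interval_index(index, dim, bounds_name):
    # Hypothetical helper: turn a pandas IntervalIndex into a CF-style
    # coordinate plus a 2D bounds variable.
    bounds = np.stack([index.left.to_numpy(), index.right.to_numpy()])
    coord = xr.Variable(dim, index.mid.to_numpy(), {'bounds': bounds_name})
    bounds_var = xr.Variable(('bnds', dim), bounds)
    return {dim: coord, bounds_name: bounds_var}


index = pd.IntervalIndex.from_breaks([0.5, 1.5, 2.5, 3.5, 4.5])
encoded = encode_interval_index(index, 'x', 'x_bounds')
ds = xr.Dataset({'data': ('x', [1, 2, 3, 4])}, coords=encoded)
print(ds)
```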
### Describe the solution you'd like
I've prototyped (2) (approach 1 in [this notebook](https://github.com/dcherian/xindexes/blob/main/interval-array.ipynb)) following @benbovy's [suggestion](https://github.com/pydata/xarray/discussions/7041#discussioncomment-4936891):
```python
import numpy as np
import pandas as pd
import xarray as xr
from xarray.indexes import PandasIndex
class XarrayIntervalIndex(PandasIndex):
    def __init__(self, index, dim, coord_dtype):
        assert isinstance(index, pd.IntervalIndex)
        # for PandasIndex
        self.index = index
        self.dim = dim
        self.coord_dtype = coord_dtype
    @classmethod
    def from_variables(cls, variables, options):
        assert len(variables) == 1
        (dim,) = tuple(variables)
        bounds = options[""bounds""]
        assert isinstance(bounds, (xr.DataArray, xr.Variable))
        # the bounds axis is whichever dimension is not the indexed one
        (axis,) = bounds.get_axis_num(set(bounds.dims) - {dim})
        left, right = np.split(bounds.data, 2, axis=axis)
        index = pd.IntervalIndex.from_arrays(left.squeeze(), right.squeeze())
        coord_dtype = bounds.dtype
        return cls(index, dim, coord_dtype)
    def create_variables(self, variables):
        from xarray.core.indexing import PandasIndexingAdapter
        newvars = {self.dim: xr.Variable(self.dim, PandasIndexingAdapter(self.index))}
        return newvars
    def __repr__(self):
        string = f""Xarray{self.index!r}""
        return string
    def to_pandas_index(self):
        return self.index
    @property
    def mid(self):
        return PandasIndex(self.index.mid, self.dim, self.coord_dtype)
    @property
    def left(self):
        return PandasIndex(self.index.left, self.dim, self.coord_dtype)
    @property
    def right(self):
        return PandasIndex(self.index.right, self.dim, self.coord_dtype)
```
```python
ds1 = (
ds.drop_indexes(""x"")
.set_xindex(""x"", XarrayIntervalIndex, bounds=ds.x_bounds)
.drop_vars(""x_bounds"")
)
ds1
```
```python
ds1.sel(x=1.1)
```
### Describe alternatives you've considered
I've tried some approaches [in this notebook](https://github.com/dcherian/xindexes/blob/main/interval-array.ipynb)
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8005/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue