html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/7031#issuecomment-1249218631,https://api.github.com/repos/pydata/xarray/issues/7031,1249218631,IC_kwDOAMm_X85KdZBH,4160723,2022-09-16T10:50:10Z,2022-09-16T10:50:10Z,MEMBER,"> In general, it seems like most (nearly all?) 1D indexing use cases can be handled by encapsulating a PandasIndex (see also https://github.com/dcherian/crsindex). So perhaps we should just recommend that and add a lot more comments to PandasIndex to make it easier to build on.
I've created a `MultiPandasIndex` helper class for that purpose: [notebook](https://notebooksharing.space/view/3d599addf8bd6b06a6acc241453da95e28c61dea4281ecd194fbe8464c9b296f#displayOptions=).
I've extracted the boilerplate from @dcherian's `CRSIndex` and implemented the remaining Index API. It raised a couple of issues, notably for `Index.isel`, which should probably return a `dict[Hashable, Index]` instead of `Index | None` (the latter is not flexible enough, e.g., when only some of the meta-index dimensions are dropped by the selection).
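To make the `isel` issue concrete, here is a toy sketch in plain Python. It is not the xarray `Index` API: `SubIndex` and `MetaIndex` are hypothetical names for a meta-index that wraps one sub-index per dimension. An integer indexer drops its dimension, so the meta-index can only hand back the sub-indexes that survive. That result fits a `dict` return type better than a single `Index | None`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union

Indexer = Union[int, slice]


@dataclass
class SubIndex:
    # hypothetical 1D index: just the coordinate labels along one dimension
    dim: str
    labels: list

    def isel(self, indexer: slice) -> SubIndex:
        return SubIndex(self.dim, self.labels[indexer])


@dataclass
class MetaIndex:
    # hypothetical meta-index grouping one SubIndex per dimension
    subindexes: dict[str, SubIndex]

    def isel(self, indexers: dict[str, Indexer]) -> dict[str, SubIndex]:
        result = {}
        for dim, sub in self.subindexes.items():
            indexer = indexers.get(dim, slice(None))
            if isinstance(indexer, int):
                # scalar selection drops this dimension: no sub-index survives
                continue
            result[dim] = sub.isel(indexer)
        return result


meta = MetaIndex({
    'x': SubIndex('x', [0, 1, 2]),
    'y': SubIndex('y', [10, 20, 30]),
})
kept = meta.isel({'x': 1, 'y': slice(0, 2)})
print(kept)  # only the 'y' sub-index is left
```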
> The other thing I will think about is whether anything special needs to happen for 2D+ periodicity. I suspect that for integers you could just use independent 1D indexes along each dim but for slicing across the ""dateline"" it might get messy in higher dimensions...
Yeah, I guess it will work well with independent `PeriodicBoundaryIndex` instances (possibly grouped in a `MultiPandasIndex`) for gridded data.
Multi-dimensional coordinates with periodic boundaries would probably be best handled by more specific indexes, e.g., [xoak's s2point index](https://xoak.readthedocs.io/en/latest/_api_generated/xoak.index.s2_adapters.S2PointIndexAdapter.html), which supports periodicity for lat/lon data (I think).
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1372035441
https://github.com/pydata/xarray/issues/7031#issuecomment-1247731386,https://api.github.com/repos/pydata/xarray/issues/7031,1247731386,IC_kwDOAMm_X85KXt66,4160723,2022-09-15T08:03:50Z,2022-09-15T08:03:50Z,MEMBER,"Great @TomNicholas!
To avoid copying the body of `PandasIndex.sel`, couldn't you ""just"" do something like this?
```python
from typing import Any

from xarray.core.indexes import PandasIndex
from xarray.core.indexing import IndexSelResult


class PeriodicBoundaryIndex(PandasIndex):
    """"""
    An index representing any 1D periodic numberline.
    Implementation subclasses a normal xarray PandasIndex object but intercepts indexer queries.
    """"""

    period: float

    def __init__(self, *args, period=360, **kwargs):
        super().__init__(*args, **kwargs)
        self.period = period

    @classmethod
    def from_variables(cls, variables, options):
        obj = super().from_variables(variables, options={})
        obj.period = options.get(""period"", obj.period)
        return obj

    def _wrap_periodically(self, label_value):
        # shift labels outside [min, max] back into the range, modulo the period
        return self.index.min() + (label_value - self.index.max()) % self.period

    def sel(
        self, labels: dict[Any, Any], method=None, tolerance=None
    ) -> IndexSelResult:
        """"""Remaps labels outside of the indexes' range back to integer indices inside the range.""""""
        assert len(labels) == 1
        coord_name, label = next(iter(labels.items()))

        if isinstance(label, slice):
            wrapped_label = slice(
                self._wrap_periodically(label.start),
                self._wrap_periodically(label.stop),
            )
            # PandasIndex.sel raises if `method` is combined with a slice,
            # so it is not forwarded here
            return super().sel({coord_name: wrapped_label})

        wrapped_label = self._wrap_periodically(label)
        # forward method/tolerance so that e.g. method=""nearest"" still applies
        return super().sel(
            {coord_name: wrapped_label}, method=method, tolerance=tolerance
        )
```
Note: I also added `period` as an option, which is supported in #6971 but not yet well documented. Another way to pass options is via coordinate attributes, like in this [FunctionalIndex example](https://notebooksharing.space/view/00e4f9cf885fd4624de2eb3a26b779765fb3fbd57e7cf75acd176752064fa613#displayOptions=).
It should work in most cases I think:
```python
import numpy as np
import xarray as xr

lon_coord = xr.DataArray(data=np.linspace(-180, 180, 19), dims=""lon"")
da = xr.DataArray(data=np.random.randn(19), dims=""lon"", coords={""lon"": lon_coord})
# note the period set here
world = da.drop_indexes(""lon"").set_xindex(""lon"", index_cls=PeriodicBoundaryIndex, period=360)
```
```python
world.sel(lon=200, method=""nearest"")
# <xarray.DataArray ()>
# array(-0.86583185)
# Coordinates:
# lon float64 -160.0
world.sel(lon=[200, 200], method=""nearest"")
# <xarray.DataArray (lon: 2)>
# array([-0.86583185, -0.86583185])
# Coordinates:
# * lon (lon) float64 -160.0 -160.0
world.sel(lon=slice(180, 200), method=""nearest"")
# <xarray.DataArray (lon: 2)>
# array([-1.59829997, -0.86583185])
# Coordinates:
# * lon (lon) float64 -180.0 -160.0
```
There are likely more things to do for slices, as you point out. I also don't think it's possible to pass two slices to `isel`. I'm not sure how this could be handled, but the easiest option is probably to raise for cases like `world.sel(lon=slice(170, 190))`.
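Below is a minimal sketch of the wrapping arithmetic used by `_wrap_periodically` above, plus one possible way to detect and refuse a slice that crosses the boundary. `wrap` and `wrap_slice` are hypothetical helper names. The sketch assumes the coordinate spans exactly one period (here -180 to 180 with period 360).

```python
def wrap(value: float, vmin: float, vmax: float, period: float) -> float:
    # same formula as PeriodicBoundaryIndex._wrap_periodically
    return vmin + (value - vmax) % period


def wrap_slice(start: float, stop: float, vmin: float, vmax: float, period: float) -> slice:
    # wrap both ends, then refuse slices that cross the periodic boundary,
    # since a single slice cannot express two disjoint ranges
    wstart = wrap(start, vmin, vmax, period)
    wstop = wrap(stop, vmin, vmax, period)
    if wstart > wstop:
        raise ValueError(
            f'slice({start}, {stop}) crosses the periodic boundary '
            f'(wrapped to {wstart}..{wstop})'
        )
    return slice(wstart, wstop)


print(wrap(200, -180, 180, 360))             # -160
print(wrap_slice(180, 200, -180, 180, 360))  # slice(-180, -160, None)
# wrap_slice(170, 190, -180, 180, 360) raises: 170 wraps to 170, 190 to -170
```

With this formula an upper bound equal to the maximum (180) wraps to the minimum (-180). As a result, `wrap_slice(0, 180, -180, 180, 360)` is rejected too, and a real index would need to special-case the boundary.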
If we really need more flexibility in `sel` without copying the whole body of `PandasIndex.sel`, we could indeed refactor `PandasIndex` to allow more customization in subclasses. We must be careful, though, as it may then be harder to change `PandasIndex` without breaking third-party code.
Or, as you suggest, we could define some `_pre_process` / `_post_process` hooks. It's not obvious where to call those hooks, though. Before or after converting from/to Variable or DataArray? Before or after checking for slices? On arrays or on scalars? The ideal place may differ from one index to another.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1372035441
https://github.com/pydata/xarray/issues/7031#issuecomment-1246944137,https://api.github.com/repos/pydata/xarray/issues/7031,1246944137,IC_kwDOAMm_X85KUtuJ,4160723,2022-09-14T15:30:59Z,2022-09-14T16:31:29Z,MEMBER,"> My understanding from reading the docs was that every `Dataset.meth` calls the corresponding `Index.meth`.
Yes, that's indeed what I wrote in #6975, and I realize now that this is confusing, especially for `isel`.
> So `Dataset.sel` calls `Index.sel`, but can also sometimes call `Dataset.isel`. But `Dataset.isel` does not call `Index.isel`, nor `Index.sel`.
So we can describe the implementation of `Dataset.sel()` as a two-step procedure:
1. remap the input dictionary `{coord_name: label_values}` to a dictionary `{dimension_name: int_positions}`.
   - This is done by dispatching the input dictionary, calling `Index.sel()` for each of the relevant indexes found in `Dataset.xindexes`, and then merging all the returned results into a single output dictionary.
2. pass the dictionary `{dimension_name: int_positions}` to `Dataset.isel()`.
   - `Dataset.isel()` will dispatch this input dictionary and call `Variable.isel()` for each variable in `Dataset.variables` and `Index.isel()` for each unique index in `Dataset.xindexes`.
This omits a few implementation details (special cases for multi-index), but that's basically how it works.
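The two-step procedure can be sketched with plain Python and NumPy. This is a toy model, not xarray's actual code. `ToyIndex`, `toy_sel` and `toy_isel` are hypothetical names, each index maps one coordinate onto one dimension, and only scalar and list labels are handled.

```python
import numpy as np


class ToyIndex:
    # hypothetical 1D label index along a single dimension
    def __init__(self, dim, labels):
        self.dim = dim
        self.labels = list(labels)

    def sel(self, label):
        # step 1 (per index): map label(s) to integer position(s) along self.dim
        if isinstance(label, list):
            return {self.dim: [self.labels.index(v) for v in label]}
        return {self.dim: self.labels.index(label)}

    def isel(self, indexer):
        # a scalar position drops the dimension, so the index is dropped too
        if isinstance(indexer, int):
            return None
        return ToyIndex(self.dim, np.asarray(self.labels)[indexer].tolist())


def toy_isel(variables, indexes, dim_positions):
    # step 2: slice every variable and every affected index by position
    new_vars = {}
    for name, (dims, data) in variables.items():
        key = tuple(dim_positions.get(d, slice(None)) for d in dims)
        kept_dims = tuple(d for d in dims if not isinstance(dim_positions.get(d), int))
        new_vars[name] = (kept_dims, data[key])
    new_indexes = {}
    for name, index in indexes.items():
        if index.dim not in dim_positions:
            new_indexes[name] = index
            continue
        new_index = index.isel(dim_positions[index.dim])
        if new_index is not None:
            new_indexes[name] = new_index
    return new_vars, new_indexes


def toy_sel(variables, indexes, labels):
    # step 1: dispatch labels to the relevant indexes and merge their results
    dim_positions = {}
    for coord_name, label in labels.items():
        dim_positions.update(indexes[coord_name].sel(label))
    # step 2: integer-based selection on the whole dataset
    return toy_isel(variables, indexes, dim_positions)


variables = {
    'lon': (('lon',), np.array([0.0, 10.0, 20.0])),
    'temp': (('lon',), np.array([1.5, 2.5, 3.5])),
}
indexes = {'lon': ToyIndex('lon', [0.0, 10.0, 20.0])}

new_vars, new_indexes = toy_sel(variables, indexes, {'lon': [20.0, 0.0]})
print(new_vars['temp'][1])        # [3.5 1.5]
print(new_indexes['lon'].labels)  # [20.0, 0.0]
```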
I think it would help if such a high-level description (""how label-based selection works in Xarray"") were added somewhere in the ""Xarray internals"" documentation, along with other ""how it works"" sections for, e.g., alignment.
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1372035441
https://github.com/pydata/xarray/issues/7031#issuecomment-1246398771,https://api.github.com/repos/pydata/xarray/issues/7031,1246398771,IC_kwDOAMm_X85KSokz,4160723,2022-09-14T08:08:26Z,2022-09-14T08:08:26Z,MEMBER,"tl;dr: Xarray `Index` currently supports implementing periodic indexing for label-based indexing but not for location-based (integer) indexing.
There's a big difference now between `isel` and `sel`:
- `Dataset.isel()` accepts dimension names only
- `Dataset.sel()` accepts coordinate names (actually, it falls back to `isel` when given dimension names that have no coordinate, and I'm wondering whether we should deprecate that)
`Index.isel()` is convenient when the underlying index structure can itself be sliced (like `pandas.Index` objects), so that users don't need to call `ds.isel(...).set_xindex(...)` every time to explicitly rebuild an index after slicing the Dataset. For a kd-tree structure that may not be possible, i.e., `KDTreeIndex.isel()` would likely return `None`, causing the index to be dropped in the result, so there would be no way around calling `ds.isel(...).set_xindex(...)`.
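Here is a small pandas-only illustration of that difference. It is not xarray code: `rebuild_lookup` is a hypothetical stand-in for a structure that cannot be sliced, such as a kd-tree.

```python
import numpy as np
import pandas as pd

coords = np.array([0.0, 10.0, 20.0, 30.0])
index = pd.Index(coords)

# positional slicing returns another pandas.Index, ready for label lookups
sub = index[1:3]
print(sub.get_loc(20.0))  # 1 (positions are relative to the sliced index)


def rebuild_lookup(values):
    # hypothetical stand-in for a structure that cannot be sliced (e.g. a
    # kd-tree): the only option is to build it again from the selected values
    return {float(v): i for i, v in enumerate(values)}


lookup = rebuild_lookup(coords[1:3])
print(lookup[20.0])  # 1
```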
Most coordinate and data variables are still sliced via `Variable.isel()`, which doesn't involve any index. That's why you get an `IndexError` in your example. (side note: the ""index"" / ""indexing"" terminology used everywhere, for both label and integer selection, is quite confusing but I'm not sure how this could be improved).
If we want to support periodic indexing with `isel`, we would have to implement that in Xarray itself. Alternatively, it *might* be possible to add some API in `Index` so that, in the case of a periodic index, it would compute `indxr % length` from `indxr`, which Xarray would then pass to `Variable.isel()`. I'm not sure the latter is a good idea, though. Indexes may work with arbitrary coordinates and dimensions, which would make things too complex (handling conflicts, etc.). Also, I don't know whether there are other potential use cases besides periodic indexing.
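The arithmetic such an `Index` hook would contribute is small. The following is a NumPy-only sketch that assumes a hypothetical `wrap_positions` helper; no such API exists in `Index` today. Out-of-range integer positions are wrapped with `% length` before they reach the variable, so plain positional indexing no longer raises `IndexError`.

```python
import numpy as np


def wrap_positions(indxr, length):
    # hypothetical periodic-index hook: map any integer position onto [0, length)
    return np.asarray(indxr) % length


data = np.array([10, 20, 30, 40, 50])
indxr = np.array([3, 4, 5, 6, -7])

try:
    data[indxr]  # plain positional indexing fails: 5, 6 and -7 are out of bounds
except IndexError as err:
    print('IndexError:', err)

print(data[wrap_positions(indxr, data.size)])  # [40 50 10 20 40]
```

NumPy's `%` on integers returns a result with the sign of the divisor, so negative positions like -7 also land in `[0, length)`.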
@TomNicholas your experiment makes it clear that the documentation on this part (#6975) should be improved. Thanks!
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1372035441