home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

14 rows where issue = 1372035441 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 5

  • TomNicholas 7
  • benbovy 4
  • dcherian 1
  • keewis 1
  • headtr1ck 1

author_association 2

  • MEMBER 13
  • COLLABORATOR 1

issue 1

  • Periodic Boundary Index · 14 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1249218631 https://github.com/pydata/xarray/issues/7031#issuecomment-1249218631 https://api.github.com/repos/pydata/xarray/issues/7031 IC_kwDOAMm_X85KdZBH benbovy 4160723 2022-09-16T10:50:10Z 2022-09-16T10:50:10Z MEMBER

In general, it seems like most (nearly all?) 1D indexing use cases can be handled by encapsulating a PandasIndex (see also https://github.com/dcherian/crsindex). So perhaps we should just recommend that and add a lot more comments to PandasIndex to make it easier to build on.

I've created a MultiPandasIndex helper class for that purpose: notebook.

I've extracted the boilerplate from @dcherian's CRSIndex and I've implemented the remaining Index API. It raised a couple of issues, notably for Index.isel which should probably return a dict[Hashable, Index] instead of Index | None (the latter is not flexible enough, i.e., when the dimensions of the meta-index are partially dropped in the selection).

The other thing I will think about is whether anything special needs to happen for 2D+ periodicity. I suspect that for integers you could just use independent 1D indexes along each dim but for slicing across the "dateline" it might get messy in higher dimensions...

Yeah I guess it will work well with independent PeriodicBoundaryIndex instances (possibly grouped in a MultiPandasIndex) for gridded data.

For multi-dimension coordinates with periodic boundaries this would probably be best handled by more specific indexes, e.g., xoak's s2point index that supports periodicity for lat/lon data (I think).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Periodic Boundary Index 1372035441
1248876262 https://github.com/pydata/xarray/issues/7031#issuecomment-1248876262 https://api.github.com/repos/pydata/xarray/issues/7031 IC_kwDOAMm_X85KcFbm TomNicholas 35968931 2022-09-16T03:24:59Z 2022-09-16T03:31:50Z MEMBER

I think this version does something sensible for all slice cases

```python from xarray.core.indexes import ( PandasIndex, IndexSelResult, _query_slice ) from xarray.core.indexing import _expand_slice

class PeriodicBoundaryIndex(PandasIndex): """ An index representing any 1D periodic numberline.

Implementation subclasses a normal xarray PandasIndex object but intercepts indexer queries.
"""
period: float
_min: float
_max: float

__slots__ = ("index", "dim", "coord_dtype", "period", "_max", "_min")

def __init__(self, *args, period=360, **kwargs):
    super().__init__(*args, **kwargs)
    self.period = period
    self._min = self.index.min()
    self._max = self.index.max()

@classmethod
def from_variables(self, variables, options):
    obj = super().from_variables(variables, options={})
    obj.period = options.get("period", obj.period)
    return obj

def _wrap_periodically(self, label_value: float) -> float:
    return self._min + (label_value - self._max) % self.period

def _split_slice_across_boundary(self, label: slice) -> np.ndarray:
    """
    Splits a slice into two slices, one either side of the boundary,
    finds the corresponding indices, concatenates them, and returns them,
    ready to be passed to .isel().
    """
    first_slice = slice(label.start, self._max, label.step)
    second_slice = slice(self._min, label.stop, label.step)

    first_as_index_slice = _query_slice(self.index, first_slice)
    second_as_index_slice = _query_slice(self.index, second_slice)

    first_as_indices = _expand_slice(first_as_index_slice, self.index.size)
    second_as_indices = _expand_slice(second_as_index_slice, self.index.size)

    wrapped_indices = np.concatenate([first_as_indices, second_as_indices])
    return wrapped_indices

def sel(
    self, labels: dict[Any, Any], method=None, tolerance=None
) -> IndexSelResult:
    """Remaps labels outside of the indexes' range back to integer indices inside the range."""

    assert len(labels) == 1
    coord_name, label = next(iter(labels.items()))

    if isinstance(label, slice):
        start, stop, step = label.start, label.stop, label.step
        if stop < start:
            return super().sel({coord_name: []})

        assert self._min < self._max

        wrapped_start = self._wrap_periodically(label.start)
        wrapped_stop = self._wrap_periodically(label.stop)
        wrapped_label = slice(wrapped_start, wrapped_stop, step)

        if wrapped_start < wrapped_stop:
            # simple case of slice not crossing boundary
            return super().sel({coord_name: wrapped_label})
        else:  # wrapped_stop < wrapped_start:
            # nasty case of slice crossing boundary
            wrapped_indices = self._split_slice_across_boundary(wrapped_label)
            return IndexSelResult({self.dim: wrapped_indices})

    else:
        # just a scalar / array of scalars
        wrapped_label = self._wrap_periodically(label)
        return super().sel({coord_name: wrapped_label}, method=method, tolerance=tolerance)

def __repr__(self) -> str:
    return f"PeriodicBoundaryIndex(period={self.period})"

```

```python lon_coord = xr.DataArray(data=np.linspace(-180, 180, 19), dims="lon") da = xr.DataArray(data=np.sin(180*lon_coord), dims="lon", coords={"lon": lon_coord})

world = da.drop_indexes("lon").set_xindex("lon", index_cls=PeriodicBoundaryIndex, period=360) ```

```python world.sel(lon=slice(60, 120), method="nearest")

<xarray.DataArray (lon: 4)>

array([-0.71424378, -0.87270922, -0.9701637 , -0.99979417])

Coordinates:

* lon (lon) float64 60.0 80.0 100.0 120.0

```

```python world.sel(lon=slice(160, 210), method="nearest")

<xarray.DataArray (lon: 4)>

array([-0.85218366, -0.68526211, 0.68526211, 0.85218366])

Coordinates:

* lon (lon) float64 160.0 180.0 -180.0 -160.0

```

```python world.sel(lon=slice(-210, -160), method="nearest")

<xarray.DataArray (lon: 4)>

array([-0.85218366, -0.68526211, 0.68526211, 0.85218366])

Coordinates:

* lon (lon) float64 160.0 180.0 -180.0 -160.0

```

Unsure as to whether this next one counts as an "intuitive" result or not

```python world.sel(lon=slice(-210, 210), method="nearest")

<xarray.DataArray (lon: 4)>

array([-0.85218366, -0.68526211, 0.68526211, 0.85218366])

Coordinates:

* lon (lon) float64 160.0 180.0 -180.0 -160.0

```

```python world.sel(lon=slice(120, 60), method="nearest")

<xarray.DataArray (lon: 0)>

array([], dtype=float64)

Coordinates:

* lon (lon) float64

```

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Periodic Boundary Index 1372035441
1248518310 https://github.com/pydata/xarray/issues/7031#issuecomment-1248518310 https://api.github.com/repos/pydata/xarray/issues/7031 IC_kwDOAMm_X85KauCm TomNicholas 35968931 2022-09-15T19:24:26Z 2022-09-15T19:51:46Z MEMBER

Okay I think this design could work for slicing across boundaries:

```python from xarray.core.indexes import PandasIndex, IndexSelResult, _query_slice from xarray.core.indexing import _expand_slice

class PeriodicBoundaryIndex(PandasIndex): """ An index representing any 1D periodic numberline.

Implementation subclasses a normal xarray PandasIndex object but intercepts indexer queries.
"""
period: float
_min: float
_max: float

__slots__ = ("index", "dim", "coord_dtype", "period", "_max", "_min")

def __init__(self, *args, period=360, **kwargs):
    super().__init__(*args, **kwargs)
    self.period = period
    self._min = self.index.min()
    self._max = self.index.max()

@classmethod
def from_variables(self, variables, options):
    obj = super().from_variables(variables, options={})
    obj.period = options.get("period", obj.period)
    return obj

def _wrap_periodically(self, label_value: float) -> float:
    """Remaps an individual point label back to another inside the range."""
    return self._min + (label_value - self._max) % self.period

def _split_slice_across_boundary(self, label: slice) -> np.ndarray:
    """
    Splits a slice into two slices, one either side of the boundary,
    finds the corresponding indices, concatenates them, 
    and returns them ready to be passed to .isel().
    """
    first_slice = slice(label.start, self._max, label.step)
    second_slice = slice(self._min, self._wrap_periodically(label.stop), label.step)

    first_as_index_slice = _query_slice(self.index, first_slice)
    second_as_index_slice = _query_slice(self.index, second_slice)

    first_as_indices = _expand_slice(first_as_index_slice, self.index.size)
    second_as_indices = _expand_slice(second_as_index_slice, self.index.size)

    wrapped_indices = np.concatenate([first_as_indices, second_as_indices])
    return wrapped_indices

def sel(
    self, labels: dict[Any, Any], method=None, tolerance=None
) -> IndexSelResult:
    """Remaps labels outside of the indexes' range back to integer indices inside the range."""

    assert len(labels) == 1
    coord_name, label = next(iter(labels.items()))

    if isinstance(label, slice):
        # TODO enumerate all the possible cases
        if self._min < label.start < self._max and self._min < label.stop < self._max:
            # simple case of slice not crossing boundary
            wrapped_label = slice(
                self._wrap_periodically(label.start),
                self._wrap_periodically(label.stop),
            )
            return super().sel({coord_name: wrapped_label})
        elif self._min < label.start < self._max and label.start < self._max < label.stop:
            # nasty case of slice crossing boundary
            wrapped_indices = self._split_slice_across_boundary(label)
            return IndexSelResult({self.dim: wrapped_indices})
        else:
            # TODO there are many other cases to handle...
            raise NotImplementedError()
    else:
        # just a scalar
        wrapped_label = self._wrap_periodically(label)
        return super().sel({coord_name: wrapped_label}, method=method, tolerance=tolerance)

def __repr__(self) -> str:
    return f"PeriodicBoundaryIndex(period={self.period})"

```

```python world.sel(lon=slice(60, 120), method="nearest")

<xarray.DataArray (lon: 4)>

array([-0.71424378, -0.87270922, -0.9701637 , -0.99979417])

Coordinates:

* lon (lon) float64 60.0 80.0 100.0 120.0

```

This works even for slices that cross the dateline

```python world.sel(lon=slice(160, 210), method="nearest")

<xarray.DataArray (lon: 4)>

array([-0.85218366, -0.68526211, 0.68526211, 0.85218366])

Coordinates:

* lon (lon) float64 160.0 180.0 -180.0 -160.0

```

This isn't general yet, there are lots of edge cases this would fail on, but I think it shows that as long as each case is captured we always could use this approach to remap back to index values that do lie within the range? What do people think?

EDIT:

One could split it into two calls to isel and concatenate the result. Not sure if that's possible with the given interface.

I believe what I've done here is the closest thing to that that is possible with the given interface.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Periodic Boundary Index 1372035441
1248354099 https://github.com/pydata/xarray/issues/7031#issuecomment-1248354099 https://api.github.com/repos/pydata/xarray/issues/7031 IC_kwDOAMm_X85KaF8z headtr1ck 43316012 2022-09-15T16:43:29Z 2022-09-15T16:43:29Z COLLABORATOR

Not sure how this could be handled, but probably the easiest is to raise for cases like world.sel(lon=slice(170, 190)).

One could split it into two calls to isel and concatenate the result. Not sure if that's possible with the given interface.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Periodic Boundary Index 1372035441
1248259720 https://github.com/pydata/xarray/issues/7031#issuecomment-1248259720 https://api.github.com/repos/pydata/xarray/issues/7031 IC_kwDOAMm_X85KZu6I dcherian 2448579 2022-09-15T15:26:14Z 2022-09-15T15:26:14Z MEMBER

In general, it seems like most (nearly all?) 1D indexing use cases can be handled by encapsulating a PandasIndex (see also https://github.com/dcherian/crsindex). So perhaps we should just recommend that and add a lot more comments to PandasIndex to make it easier to build on.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Periodic Boundary Index 1372035441
1248224096 https://github.com/pydata/xarray/issues/7031#issuecomment-1248224096 https://api.github.com/repos/pydata/xarray/issues/7031 IC_kwDOAMm_X85KZmNg TomNicholas 35968931 2022-09-15T14:59:09Z 2022-09-15T14:59:09Z MEMBER

Nice @benbovy ! That seems useable already - I'll open a PR to work on it more. It's also much neater :smile:

Note: I also added period as an option

Great, I was hoping there was some functionality like that.

Not sure how this could be handled, but probably the easiest is to raise for cases like world.sel(lon=slice(170, 190)).

That would be a shame IMO, so I'll have more of think about how to handle slicing across the dateline.

We must be careful, though, as it may be harder to make changes without possibly breaking 3rd-party stuff.

It's not obvious where to call those hooks, though.

All good points.

The other thing I will think about is whether anything special needs to happen for 2D+ periodicity. I suspect that for integers you could just use independent 1D indexes along each dim but for slicing across the "dateline" it might get messy in higher dimensions...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Periodic Boundary Index 1372035441
1247731386 https://github.com/pydata/xarray/issues/7031#issuecomment-1247731386 https://api.github.com/repos/pydata/xarray/issues/7031 IC_kwDOAMm_X85KXt66 benbovy 4160723 2022-09-15T08:03:50Z 2022-09-15T08:03:50Z MEMBER

Great @TomNicholas!

To avoid copying the body of PandasIndex.sel, couldn't you "just" do something like this?

```python class PeriodicBoundaryIndex(PandasIndex): """ An index representing any 1D periodic numberline.

Implementation subclasses a normal xarray PandasIndex object but intercepts indexer queries.
"""
period: float

def __init__(self, *args, period=360, **kwargs):
    super().__init__(*args, **kwargs)
    self.period = period

@classmethod
def from_variables(self, variables, options):
    obj = super().from_variables(variables, options={})
    obj.period = options.get("period", obj.period)
    return obj

def _wrap_periodically(self, label_value):
    return self.index.min() + (label_value - self.index.max()) % self.period

def sel(
    self, labels: dict[Any, Any], method=None, tolerance=None
) -> IndexSelResult:
    """Remaps labels outside of the indexes' range back to integer indices inside the range."""

    assert len(labels) == 1
    coord_name, label = next(iter(labels.items()))

    if isinstance(label, slice):
        wrapped_label = slice(
            self._wrap_periodically(label.start),
            self._wrap_periodically(label.stop),
        )
    else:
        wrapped_label = self._wrap_periodically(label)

    return super().sel({coord_name: wrapped_label})

```

Note: I also added period as an option, which is supported in #6971 but not yet well documented. Another way to pass options is via coordinate attributes, like in this FunctionalIndex example.

It should work in most cases I think:

```python lon_coord = xr.DataArray(data=np.linspace(-180, 180, 19), dims="lon") da = xr.DataArray(data=np.random.randn(19), dims="lon", coords={"lon": lon_coord})

note the period set here

world = da.drop_indexes("lon").set_xindex("lon", index_cls=PeriodicBoundaryIndex, period=360) ```

```python world.sel(lon=200, method="nearest")

<xarray.DataArray ()>

array(-0.86583185)

Coordinates:

lon float64 -160.0

world.sel(lon=[200, 200], method="nearest")

<xarray.DataArray (lon: 2)>

array([-0.86583185, -0.86583185])

Coordinates:

* lon (lon) float64 -160.0 -160.0

world.sel(lon=slice(180, 200), method="nearest")

<xarray.DataArray (lon: 2)>

array([-1.59829997, -0.86583185])

Coordinates:

* lon (lon) float64 -180.0 -160.0

```

There's likely more things to do for slices as you point out. I don't think either that it's possible to pass two slices to isel. Not sure how this could be handled, but probably the easiest is to raise for cases like world.sel(lon=slice(170, 190)).

If we really need more flexibility in sel without copying the whole body of PandasIndex.sel, we could indeed refactor PandasIndex to allow more customization in subclasses. We must be careful, though, as it may be harder to make changes without possibly breaking 3rd-party stuff.

Or like you suggest we could define some _pre_process / _post_process hooks. It's not obvious where to call those hooks, though. Before or after converting from/to Variable or DataArray? Before or after checking for slices? array or scalar? The ideal place may change from one index to another.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Periodic Boundary Index 1372035441
1247351003 https://github.com/pydata/xarray/issues/7031#issuecomment-1247351003 https://api.github.com/repos/pydata/xarray/issues/7031 IC_kwDOAMm_X85KWRDb TomNicholas 35968931 2022-09-14T22:10:01Z 2022-09-14T22:15:26Z MEMBER

I had another go and now I have this (the .sel method is just copied from PandasIndex.sel with minor changes):

```python from xarray.core.indexes import ( PandasIndex, is_scalar, as_scalar, get_indexer_nd, IndexSelResult, _query_slice, is_dict_like, normalize_label, )

class PeriodicBoundaryIndex(PandasIndex): """ An index representing any 1D periodic numberline.

Implementation subclasses a normal xarray PandasIndex object but intercepts indexer queries.
"""
period: float

def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    self.period = 360  # TODO work out where this input should be passed in instead of hard-coding

def _wrap_periodically(self, label_value):
    return self.index.min() + (label_value - self.index.max()) % self.period

def sel(
    self, labels: dict[Any, Any], method=None, tolerance=None
) -> IndexSelResult:
    """Remaps labels outside of the indexes' range back to integer indices inside the range."""

    print("sel called")

    from xarray import DataArray
    from xarray import Variable

    if method is not None and not isinstance(method, str):
        raise TypeError("``method`` must be a string")

    assert len(labels) == 1
    coord_name, label = next(iter(labels.items()))

    if isinstance(label, slice):
        print(label)
        indexer = _query_slice(self.index, label, coord_name, method, tolerance)
        print(indexer)
    elif is_dict_like(label):
        raise ValueError(
            "cannot use a dict-like object for selection on "
            "a dimension that does not have a MultiIndex"
        )
    else:
        label_array = normalize_label(label, dtype=self.coord_dtype)
        if label_array.ndim == 0:
            label_array = self._wrap_periodically(label_array)
            label_value = as_scalar(label_array)
            if isinstance(self.index, pd.CategoricalIndex):
                if method is not None:
                    raise ValueError(
                        "'method' is not supported when indexing using a CategoricalIndex."
                    )
                if tolerance is not None:
                    raise ValueError(
                        "'tolerance' is not supported when indexing using a CategoricalIndex."
                    )
                indexer = self.index.get_loc(label_value)
            else:
                if method is not None:
                    print(label_array)
                    indexer = get_indexer_nd(
                        self.index, label_array, method, tolerance
                    )
                    if np.any(indexer < 0):
                        raise KeyError(
                            f"not all values found in index {coord_name!r}"
                        )
                else:
                    try:
                        print(label_value)
                        indexer = self.index.get_loc(label_value)
                    except KeyError as e:
                        raise KeyError(
                            f"not all values found in index {coord_name!r}. "
                            "Try setting the `method` keyword argument (example: method='nearest')."
                        ) from e

        elif label_array.dtype.kind == "b":
            indexer = label_array
        else:
            indexer = get_indexer_nd(self.index, label_array, method, tolerance)
            if np.any(indexer < 0):
                raise KeyError(f"not all values found in index {coord_name!r}")

        # attach dimension names and/or coordinates to positional indexer
        if isinstance(label, Variable):
            indexer = Variable(label.dims, indexer)
        elif isinstance(label, DataArray):
            indexer = DataArray(indexer, coords=label._coords, dims=label.dims)

    return IndexSelResult({self.dim: indexer})

def isel(
    self, indexers: Mapping[Any, Union[int, slice, np.ndarray, Variable]]
) -> Union["PeriodicBoundaryIndex", None]:

    print("isel called")
    return super().isel(indexers=indexers)

```

This works for integer indexing with sel!

python lon_coord = xr.DataArray(data=np.linspace(-180, 180, 19), dims="lon") da = xr.DataArray(data=np.random.randn(19), dims="lon", coords={"lon": lon_coord}) <xarray.DataArray (lon: 19)> array([-0.67423202, 0.14173693, -0.51427002, 1.25764101, 0.23863066, 0.05703135, -0.65350384, -0.74356356, 0.98524252, -0.94975665, 0.63314842, -0.7144752 , 0.47282375, 0.31555171, -0.13179154, -1.10255267, 0.88180541, 1.28461459, 1.61273741]) Coordinates: * lon (lon) float64 -180.0 -160.0 -140.0 -120.0 ... 140.0 160.0 180.0

```python world = da.drop_indexes("lon").set_xindex("lon", index_cls=PeriodicBoundaryIndex)

world.sel(lon=200, method="nearest") <xarray.DataArray ()> array(0.14173693) Coordinates: lon float64 -160.0 ``` Yay! :champagne:

Q: Best way to do this for slicing?

I want this to work python world.sel(lon=slice(170, 190)) Internally that means PeriodicBoundaryIndex.sel has to return an indexer that points to values at both the start and end of the array data. I'm not sure what the best way of doing this is. Originally I imagined returning two slices but I don't think that's a valid argument to Dataset.isel().

So I guess I have to turn my slice into a list of specific integer positions and pass that to .isel()? How do I do that? Is that going to be inefficient somehow?

I guess I also want to reorder the result before returning it, otherwise the two sides of the dateline won't be stitched together in the right order...

Q: Where should I pass in period?

If I want the period of the PeriodicBoundaryIndex to be a general parameter, independent of the data in the array, the attributes, or the values of the index labels, where would be the most sensible place to pass this in? .set_indexes only accepts a class, not an instance, and I can't use .from_variables as it can't be deduced from the variables in general, so where can I feed it in?

Q: How to avoid just copying all of PandasIndex.sel's implementation?

I find myself copying the entire implementation of PandasIndex.sel just to insert 1 or two lines in predictable places.

Also pointing me to looking at the implementation of PandasIndex is going to lead to me using lots of private functions from xarray.indexes, because I have to import them in order to copy-paste code from PandasIndex.

I wonder if these problems could be ameliorated by providing public entry methods in the Index superclass? I'm thinking about how in anytree (which I used to make the first prototype of datatree) there are these methods that do nothing by default but are intended to be overridden to insert functionality at key steps. The pattern is basically this:

```python

library code

class NodeMixin: """Inherit from this to create your own TreeNode class with parent and children"""

def _pre_detach_children(self, children):
    """Method call before detaching `children`."""
    pass

def _post_detach_children(self, children):
    """Method call after detaching `children`."""
    pass

def _pre_attach_children(self, children):
    """Method call before attaching `children`."""
    pass

def _post_attach_children(self, children):
    """Method call after attaching `children`."""
    pass

user code

class MyHappyTreeNode(NodeMixin): def _pre_attach_children(self, children): """Celebrates the gift of children""" print("A child is born!") ```

What if we put similar methods on the PandasIndex superclass? Like

```python class PandasIndex: def _post_process_label_value(self, label_value: float) -> float: """Method call after determining scalar label value.""" return label_value

def sel(
    self, labels: dict[Any, Any], method=None, tolerance=None
) -> IndexSelResult:
    # rest of the function as before
    ...

    if isinstance(label, slice):
        indexer = _query_slice(self.index, label, coord_name, method, tolerance)
    elif is_dict_like(label):
        raise ValueError(
            "cannot use a dict-like object for selection on "
            "a dimension that does not have a MultiIndex"
        )
    else:
        label_array = normalize_label(label, dtype=self.coord_dtype)
        if label_array.ndim == 0:
            label_value = as_scalar(label_array)
            label_value = self._post_process_label_value(label_value)  # new bit
            ...
    # rest of the function as before

```

Then in my case I would not have had to copy so much of the implementation, I could have simply done

```python class PeriodicBoundaryIndex(PandasIndex): period: float

def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    self.period = 360  # TODO work out where this input should be passed in instead of hard-coding

def _wrap_periodically(self, label_value):
    return self.index.min() + (label_value - self.index.max()) % self.period

def _post_process_label_value(self, label_value):
    return self._wrap_periodically(label_value)

```

Maybe this is a bad idea / doesn't make sense but I thought I would suggest it anyway.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Periodic Boundary Index 1372035441
1246913319 https://github.com/pydata/xarray/issues/7031#issuecomment-1246913319 https://api.github.com/repos/pydata/xarray/issues/7031 IC_kwDOAMm_X85KUmMn TomNicholas 35968931 2022-09-14T15:09:16Z 2022-09-14T20:35:57Z MEMBER

Thank you for the clear explanation @benbovy! @keewis you were right, as ever, but now I understand why!)

There's a big difference now between isel and sel:

So Dataset.sel calls Index.sel, but can also sometimes call Dataset.isel. But Dataset.isel does not call Index.isel, nor Index.sel. Yes that is definitely confusing! My understanding from reading the docs was that every Dataset.meth calls the corresponding Index.meth.

I'm wondering if we shouldn't deprecate that?

The situation would be a bit clearer if we did deprecate that, but I imagine that would come with a significant backwards compatibility cost.

I think if possible each method on both Dataset and Index should have a note in the docstring about which methods it calls or is called by, as well as major gotchas pointed out in the custom indexes docs version via warnings/notes.

If we want to support periodic indexing with isel, we would have to implement that in Xarray itself.

This would be cool, but doesn't really seem worth it, at least for an initial experiment.

Presumably I can still make a periodic index that works when doing label-based indexing? If my Dataset had a non-dimension coordinate "lat" backed by a periodic index that was a function of a dimension "i", then it would be a bit clearer why indexing with lat would be periodic but indexing with i would not be.

On Wed, 14 Sep 2022, 04:08 Benoit Bovy, @.***> wrote:

tl;dr: Xarray Index currently supports implementing periodic indexing for label-based indexing but not for location-based (integer) indexing.

There's a big difference now between isel and sel:

  • Dataset.isel() accepts dimension names only
  • Dataset.sel() accepts coordinate names (actually, it falls back to isel when giving dimension names with no coordinate, and I'm wondering if we shouldn't deprecate that?)

Index.isel() is convenient when the underlying index structure can be itself sliced (like pandas.Index objects), so that users don't need to do ds.isel(...).set_xindex(...) every time to explicitly rebuild an index after slicing the Dataset. For a kd-tree structure that may not be possible, i.e., KDTreeIndex.isel() would likely return None causing the index to be dropped in the result, so there would be no way around doing ds.isel(...).set_xindex(...).

Most coordinate and data variables are still sliced via Variable.isel(), which doesn't involve any index. That's why you get an IndexError in your example. (side note: the "index" / "indexing" terminology used everywhere, for both label and integer selection, is quite confusing but I'm not sure how this could be improved).

If we want to support periodic indexing with isel, we would have to implement that in Xarray itself. Alternatively, it might be possible to add some API in Index so that in the case of a periodic index it would return indxr % length from indxr, which Xarray will then pass to Variable.isel(). I'm not sure the latter is a good idea, though. Indexes may work with arbitrary coordinates and dimensions, which would make things too complex (handling conflicts, etc.). Also, I don't know if there's other potential use cases besides periodic indexing?

@TomNicholas https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_TomNicholas&d=DwMCaQ&c=009klHSCxuh5AI1vNQzSO0KGjl4nbi2Q0M1QLJX9BeE&r=qdISi9HqjazmE0DcySuXts3OlnplnLfKjH4hpzAV0xo&m=6Eurln2plZyGChJ5a5erUIWLdRQYncs9JCJEfdBqaTLUzpZQwj4Om1ykGqgvhJCV&s=qhcbnfbmpOcf9UZVZE7cvJgT6_0ywxc0ZQSKupKZHO8&e= your experiment makes it clear that the documentation on this part (#6975 https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_pydata_xarray_pull_6975&d=DwMCaQ&c=009klHSCxuh5AI1vNQzSO0KGjl4nbi2Q0M1QLJX9BeE&r=qdISi9HqjazmE0DcySuXts3OlnplnLfKjH4hpzAV0xo&m=6Eurln2plZyGChJ5a5erUIWLdRQYncs9JCJEfdBqaTLUzpZQwj4Om1ykGqgvhJCV&s=fqKipQbRyoJT6rZ_FvmY1pMv9dZTMo3l8q_ilmk7wbQ&e=) should be improved. Thanks!

— Reply to this email directly, view it on GitHub https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_pydata_xarray_issues_7031-23issuecomment-2D1246398771&d=DwMCaQ&c=009klHSCxuh5AI1vNQzSO0KGjl4nbi2Q0M1QLJX9BeE&r=qdISi9HqjazmE0DcySuXts3OlnplnLfKjH4hpzAV0xo&m=6Eurln2plZyGChJ5a5erUIWLdRQYncs9JCJEfdBqaTLUzpZQwj4Om1ykGqgvhJCV&s=vjffKcfZjIy1AQDf7hPZOGK3iAtaYRnWwUBfJudWTKk&e=, or unsubscribe https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AISNPI4MNH3YETGDVW6UKW3V6GBYNANCNFSM6AAAAAAQL2WYZI&d=DwMCaQ&c=009klHSCxuh5AI1vNQzSO0KGjl4nbi2Q0M1QLJX9BeE&r=qdISi9HqjazmE0DcySuXts3OlnplnLfKjH4hpzAV0xo&m=6Eurln2plZyGChJ5a5erUIWLdRQYncs9JCJEfdBqaTLUzpZQwj4Om1ykGqgvhJCV&s=IDHydhJJ2sbP2i6aeTeNqjGF0fkXx9UGj3GV-4zVQTM&e= . You are receiving this because you were mentioned.Message ID: @.***>

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Periodic Boundary Index 1372035441
1247058071 https://github.com/pydata/xarray/issues/7031#issuecomment-1247058071 https://api.github.com/repos/pydata/xarray/issues/7031 IC_kwDOAMm_X85KVJiX TomNicholas 35968931 2022-09-14T17:02:46Z 2022-09-14T17:02:46Z MEMBER

That's very helpful, and should definitely go in the docs somewhere!

On Wed, 14 Sep 2022, 11:31 Benoit Bovy, @.***> wrote:

My understanding from reading the docs was that every Dataset.meth calls the corresponding Index.meth.

Yes that's indeed what I've written in #6975 https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_pydata_xarray_pull_6975&d=DwMCaQ&c=009klHSCxuh5AI1vNQzSO0KGjl4nbi2Q0M1QLJX9BeE&r=qdISi9HqjazmE0DcySuXts3OlnplnLfKjH4hpzAV0xo&m=p6AEVnna74Yl7ERqNIUh2N0qdwITIr0EYT0apJ7Hbgk0KdQNUOudgu8tMxoLiNLW&s=aId1i1PHZl_Co87NrwWTHiu6metAHOBNNVBaBQt1LAo&e= and I realize now that this is confusing, especially for isel.

So Dataset.sel calls Index.sel, but can also sometimes call Dataset.isel. But Dataset.isel does not call Index.isel, nor Index.sel.

So we can describe the implementation of Dataset.sel() has a two-step procedure:

1.

remap the input dictionary {coord_name: label_values} to a dictionary {dimension_name: int_positions}. - This is done via calls to Index.sel() for each unique index found in Dataset.xindexes 2.

pass the the dictionary {dimension_name: int_positions} to Dataset.isel(). - Dataset.isel() will itself call Variable.isel() for each variable in Dataset.variables and Index.isel() for each unique index in Dataset.xindexes

This omits a few implementation details (special cases for multi-index), but that's basically how it works.

I think it would help if such "how label-based selection works in Xarray" section was added in the "Xarray internals" documentation, along with other "how it works" section for, e.g., alignment.

— Reply to this email directly, view it on GitHub https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_pydata_xarray_issues_7031-23issuecomment-2D1246944137&d=DwMCaQ&c=009klHSCxuh5AI1vNQzSO0KGjl4nbi2Q0M1QLJX9BeE&r=qdISi9HqjazmE0DcySuXts3OlnplnLfKjH4hpzAV0xo&m=p6AEVnna74Yl7ERqNIUh2N0qdwITIr0EYT0apJ7Hbgk0KdQNUOudgu8tMxoLiNLW&s=Ks6PoxVcaVr7k89uOVteNHQ4LJ6gUn1ONjAA_UVC3PU&e=, or unsubscribe https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AISNPIYBB6EDBNOL3ORUFKLV6HVT3ANCNFSM6AAAAAAQL2WYZI&d=DwMCaQ&c=009klHSCxuh5AI1vNQzSO0KGjl4nbi2Q0M1QLJX9BeE&r=qdISi9HqjazmE0DcySuXts3OlnplnLfKjH4hpzAV0xo&m=p6AEVnna74Yl7ERqNIUh2N0qdwITIr0EYT0apJ7Hbgk0KdQNUOudgu8tMxoLiNLW&s=ojUYzNS_we7kqIG-mT0SDi_iY4nS4fri483cGbjgHwk&e= . You are receiving this because you were mentioned.Message ID: @.***>

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Periodic Boundary Index 1372035441
1246944137 https://github.com/pydata/xarray/issues/7031#issuecomment-1246944137 https://api.github.com/repos/pydata/xarray/issues/7031 IC_kwDOAMm_X85KUtuJ benbovy 4160723 2022-09-14T15:30:59Z 2022-09-14T16:31:29Z MEMBER

My understanding from reading the docs was that every Dataset.meth calls the corresponding Index.meth.

Yes that's indeed what I've written in #6975 and I realize now that this is confusing, especially for isel.

So Dataset.sel calls Index.sel, but can also sometimes call Dataset.isel. But Dataset.isel does not call Index.isel, nor Index.sel.

So we can describe the implementation of Dataset.sel() as a two-step procedure:

  1. remap the input dictionary {coord_name: label_values} to a dictionary {dimension_name: int_positions}.

    • This is done via dispatching the input dictionary and calling Index.sel() for each of the relevant indexes found in Dataset.xindexes, and then merging all the returned results into a single output dictionary.
  2. pass the the dictionary {dimension_name: int_positions} to Dataset.isel().

    • Dataset.isel() will dispatch this input dictionary and call Variable.isel() for each variable in Dataset.variables and Index.isel() for each unique index in Dataset.xindexes.

This omits a few implementation details (special cases for multi-index), but that's basically how it works.

I think it would help if such "how label-based selection works in Xarray" high-level description was added somewhere in the "Xarray internals" documentation, along with other "how it works" sections for, e.g., alignment.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Periodic Boundary Index 1372035441
1246398771 https://github.com/pydata/xarray/issues/7031#issuecomment-1246398771 https://api.github.com/repos/pydata/xarray/issues/7031 IC_kwDOAMm_X85KSokz benbovy 4160723 2022-09-14T08:08:26Z 2022-09-14T08:08:26Z MEMBER

tl;dr: Xarray Index currently supports implementing periodic indexing for label-based indexing but not for location-based (integer) indexing.

There's a big difference now between isel and sel:

  • Dataset.isel() accepts dimension names only
  • Dataset.sel() accepts coordinate names (actually, it falls back to isel when giving dimension names with no coordinate, and I'm wondering if we shouldn't deprecate that?)

Index.isel() is convenient when the underlying index structure can be itself sliced (like pandas.Index objects), so that users don't need to do ds.isel(...).set_xindex(...) every time to explicitly rebuild an index after slicing the Dataset. For a kd-tree structure that may not be possible, i.e., KDTreeIndex.isel() would likely return None causing the index to be dropped in the result, so there would be no way around doing ds.isel(...).set_xindex(...).

Most coordinate and data variables are still sliced via Variable.isel(), which doesn't involve any index. That's why you get an IndexError in your example. (side note: the "index" / "indexing" terminology used everywhere, for both label and integer selection, is quite confusing but I'm not sure how this could be improved).

If we want to support periodic indexing with isel, we would have to implement that in Xarray itself. Alternatively, it might be possible to add some API in Index so that in the case of a periodic index it would return indxr % length from indxr, which Xarray will then pass to Variable.isel(). I'm not sure the latter is a good idea, though. Indexes may work with arbitrary coordinates and dimensions, which would make things too complex (handling conflicts, etc.). Also, I don't know if there's other potential use cases besides periodic indexing?

@TomNicholas your experiment makes it clear that the documentation on this part (#6975) should be improved. Thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Periodic Boundary Index 1372035441
1246023925 https://github.com/pydata/xarray/issues/7031#issuecomment-1246023925 https://api.github.com/repos/pydata/xarray/issues/7031 IC_kwDOAMm_X85KRND1 TomNicholas 35968931 2022-09-13T22:48:48Z 2022-09-14T03:03:34Z MEMBER

I was thinking I want it to work periodically even in integer index space too - surely I need isel for that? I might be misunderstanding something fundamental here though.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Periodic Boundary Index 1372035441
1246008530 https://github.com/pydata/xarray/issues/7031#issuecomment-1246008530 https://api.github.com/repos/pydata/xarray/issues/7031 IC_kwDOAMm_X85KRJTS keewis 14808389 2022-09-13T22:23:01Z 2022-09-13T22:26:16Z MEMBER

shouldn't you implement (and call) sel instead of isel if you're working in coordinate space?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Periodic Boundary Index 1372035441

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 14.652ms · About: xarray-datasette