github: issue_comments: 4 rows where issue = 1372035441 and user = 4160723 sorted by updated

4 rows where issue = 1372035441 and user = 4160723 sorted by updated_at descending

Search:

descending

id	html_url	issue_url	node_id	user	created_at	updated_at ▲	author_association	body	reactions	issue
1249218631	https://github.com/pydata/xarray/issues/7031#issuecomment-1249218631	https://api.github.com/repos/pydata/xarray/issues/7031	IC_kwDOAMm_X85KdZBH	benbovy 4160723	2022-09-16T10:50:10Z	2022-09-16T10:50:10Z	MEMBER	In general, it seems like most (nearly all?) 1D indexing use cases can be handled by encapsulating a PandasIndex (see also https://github.com/dcherian/crsindex). So perhaps we should just recommend that and add a lot more comments to PandasIndex to make it easier to build on. I've created a `MultiPandasIndex` helper class for that purpose: notebook. I've extracted the boilerplate from @dcherian's `CRSIndex` and I've implemented the remaining Index API. It raised a couple of issues, notably for `Index.isel` which should probably return a `dict[Hashable, Index]` instead of `Index \| None` (the latter is not flexible enough, i.e., when the dimensions of the meta-index are partially dropped in the selection). The other thing I will think about is whether anything special needs to happen for 2D+ periodicity. I suspect that for integers you could just use independent 1D indexes along each dim but for slicing across the "dateline" it might get messy in higher dimensions... Yeah I guess it will work well with independent `PeriodicBoundaryIndex` instances (possibly grouped in a `MultiPandasIndex`) for gridded data. For multi-dimension coordinates with periodic boundaries this would probably be best handled by more specific indexes, e.g., xoak's s2point index that supports periodicity for lat/lon data (I think).	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Periodic Boundary Index 1372035441
1247731386	https://github.com/pydata/xarray/issues/7031#issuecomment-1247731386	https://api.github.com/repos/pydata/xarray/issues/7031	IC_kwDOAMm_X85KXt66	benbovy 4160723	2022-09-15T08:03:50Z	2022-09-15T08:03:50Z	MEMBER	Great @TomNicholas! To avoid copying the body of `PandasIndex.sel`, couldn't you "just" do something like this? ```python class PeriodicBoundaryIndex(PandasIndex): """ An index representing any 1D periodic numberline. Implementation subclasses a normal xarray PandasIndex object but intercepts indexer queries. """ period: float def __init__(self, args, period=360, kwargs): super().__init__(args, *kwargs) self.period = period @classmethod def from_variables(self, variables, options): obj = super().from_variables(variables, options={}) obj.period = options.get("period", obj.period) return obj def _wrap_periodically(self, label_value): return self.index.min() + (label_value - self.index.max()) % self.period def sel( self, labels: dict[Any, Any], method=None, tolerance=None ) -> IndexSelResult: """Remaps labels outside of the indexes' range back to integer indices inside the range.""" assert len(labels) == 1 coord_name, label = next(iter(labels.items())) if isinstance(label, slice): wrapped_label = slice( self._wrap_periodically(label.start), self._wrap_periodically(label.stop), ) else: wrapped_label = self._wrap_periodically(label) return super().sel({coord_name: wrapped_label}) ``` Note: I also added `period` as an option, which is supported in #6971 but not yet well documented. Another way to pass options is via coordinate attributes, like in this FunctionalIndex example. It should work in most cases I think: ```python lon_coord = xr.DataArray(data=np.linspace(-180, 180, 19), dims="lon") da = xr.DataArray(data=np.random.randn(19), dims="lon", coords={"lon": lon_coord}) note the period set here world = da.drop_indexes("lon").set_xindex("lon", index_cls=PeriodicBoundaryIndex, period=360) ``` ```python world.sel(lon=200, method="nearest") <xarray.DataArray ()> array(-0.86583185) Coordinates: lon float64 -160.0 world.sel(lon=[200, 200], method="nearest") <xarray.DataArray (lon: 2)> array([-0.86583185, -0.86583185]) Coordinates: lon (lon) float64 -160.0 -160.0 world.sel(lon=slice(180, 200), method="nearest") <xarray.DataArray (lon: 2)> array([-1.59829997, -0.86583185]) Coordinates: * lon (lon) float64 -180.0 -160.0 ``` There's likely more things to do for slices as you point out. I don't think either that it's possible to pass two slices to `isel`. Not sure how this could be handled, but probably the easiest is to raise for cases like `world.sel(lon=slice(170, 190))`. If we really need more flexibility in `sel` without copying the whole body of `PandasIndex.sel`, we could indeed refactor `PandasIndex` to allow more customization in subclasses. We must be careful, though, as it may be harder to make changes without possibly breaking 3rd-party stuff. Or like you suggest we could define some `_pre_process` / `_post_process` hooks. It's not obvious where to call those hooks, though. Before or after converting from/to Variable or DataArray? Before or after checking for slices? array or scalar? The ideal place may change from one index to another.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Periodic Boundary Index 1372035441
1246944137	https://github.com/pydata/xarray/issues/7031#issuecomment-1246944137	https://api.github.com/repos/pydata/xarray/issues/7031	IC_kwDOAMm_X85KUtuJ	benbovy 4160723	2022-09-14T15:30:59Z	2022-09-14T16:31:29Z	MEMBER	My understanding from reading the docs was that every `Dataset.meth` calls the corresponding `Index.meth`. Yes that's indeed what I've written in #6975 and I realize now that this is confusing, especially for `isel`. So `Dataset.sel` calls `Index.sel`, but can also sometimes call `Dataset.isel`. But `Dataset.isel` does not call `Index.isel`, nor `Index.sel`. So we can describe the implementation of `Dataset.sel()` as a two-step procedure: remap the input dictionary `{coord_name: label_values}` to a dictionary `{dimension_name: int_positions}`. This is done via dispatching the input dictionary and calling `Index.sel()` for each of the relevant indexes found in `Dataset.xindexes`, and then merging all the returned results into a single output dictionary. pass the the dictionary `{dimension_name: int_positions}` to `Dataset.isel()`. `Dataset.isel()` will dispatch this input dictionary and call `Variable.isel()` for each variable in `Dataset.variables` and `Index.isel()` for each unique index in `Dataset.xindexes`. This omits a few implementation details (special cases for multi-index), but that's basically how it works. I think it would help if such "how label-based selection works in Xarray" high-level description was added somewhere in the "Xarray internals" documentation, along with other "how it works" sections for, e.g., alignment.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Periodic Boundary Index 1372035441
1246398771	https://github.com/pydata/xarray/issues/7031#issuecomment-1246398771	https://api.github.com/repos/pydata/xarray/issues/7031	IC_kwDOAMm_X85KSokz	benbovy 4160723	2022-09-14T08:08:26Z	2022-09-14T08:08:26Z	MEMBER	tl;dr: Xarray `Index` currently supports implementing periodic indexing for label-based indexing but not for location-based (integer) indexing. There's a big difference now between `isel` and `sel`: `Dataset.isel()` accepts dimension names only `Dataset.sel()` accepts coordinate names (actually, it falls back to `isel` when giving dimension names with no coordinate, and I'm wondering if we shouldn't deprecate that?) `Index.isel()` is convenient when the underlying index structure can be itself sliced (like `pandas.Index` objects), so that users don't need to do `ds.isel(...).set_xindex(...)` every time to explicitly rebuild an index after slicing the Dataset. For a kd-tree structure that may not be possible, i.e., `KDTreeIndex.isel()` would likely return `None` causing the index to be dropped in the result, so there would be no way around doing `ds.isel(...).set_xindex(...)`. Most coordinate and data variables are still sliced via `Variable.isel()`, which doesn't involve any index. That's why you get an `IndexError` in your example. (side note: the "index" / "indexing" terminology used everywhere, for both label and integer selection, is quite confusing but I'm not sure how this could be improved). If we want to support periodic indexing with `isel`, we would have to implement that in Xarray itself. Alternatively, it might be possible to add some API in `Index` so that in the case of a periodic index it would return `indxr % length` from `indxr`, which Xarray will then pass to `Variable.isel()`. I'm not sure the latter is a good idea, though. Indexes may work with arbitrary coordinates and dimensions, which would make things too complex (handling conflicts, etc.). Also, I don't know if there's other potential use cases besides periodic indexing? @TomNicholas your experiment makes it clear that the documentation on this part (#6975) should be improved. Thanks!	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Periodic Boundary Index 1372035441

Advanced export

JSON shape: default, array, newline-delimited, object

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);

issue_comments

4 rows where issue = 1372035441 and user = 4160723 sorted by updated_at descending

note the period set here

<xarray.DataArray ()>

array(-0.86583185)

Coordinates:

lon float64 -160.0

<xarray.DataArray (lon: 2)>

array([-0.86583185, -0.86583185])

Coordinates:

* lon (lon) float64 -160.0 -160.0

<xarray.DataArray (lon: 2)>

array([-1.59829997, -0.86583185])

Coordinates:

* lon (lon) float64 -180.0 -160.0

Advanced export