issues: 893415955

id: 893415955
node_id: MDExOlB1bGxSZXF1ZXN0NjQ1OTMzODI3
number: 5322
title: Internal refactor of label-based data selection
user: 4160723
state: closed
locked: 0
assignee:
milestone:
comments: 1
created_at: 2021-05-17T14:52:49Z
updated_at: 2022-03-29T07:10:07Z
closed_at: 2021-06-08T09:35:54Z
author_association: MEMBER
active_lock_reason:
draft: 0
pull_request: pydata/xarray/pulls/5322
body:

Xarray label-based data selection now relies on a newly added xarray.Index.query(self, labels: Dict[Hashable, Any]) -> Tuple[Any, Optional[Index]] method where:

  • labels is always a dictionary with coordinate name(s) as key(s) and the corresponding selection label(s) as values
  • When calling .sel with some coordinate(s)/label(s) pairs, those are first grouped by index so that only the relevant pairs are passed to an Index.query
  • the returned tuple contains the positional indexers and (optionally) a new index object
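The grouping step described above can be sketched roughly as follows. This is only an illustration, not the code added in this PR: the function name group_labels_by_index, the coord_to_index mapping and the string index keys are all made up for the example.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable


def group_labels_by_index(
    coord_to_index: Dict[Hashable, Hashable],
    labels: Dict[Hashable, Any],
) -> Dict[Hashable, Dict[Hashable, Any]]:
    """Group {coord_name: label} pairs by the index that owns each coordinate.

    Hypothetical helper: `coord_to_index` maps a coordinate name to some
    hashable key identifying its index (an assumption for this sketch).
    """
    grouped: Dict[Hashable, Dict[Hashable, Any]] = defaultdict(dict)
    for coord_name, label in labels.items():
        if coord_name not in coord_to_index:
            raise KeyError(f"no index found for coordinate {coord_name!r}")
        grouped[coord_to_index[coord_name]][coord_name] = label
    return dict(grouped)


# Two levels of one multi-index plus one simple index.
coord_to_index = {"x": "x_index", "one": "multi_index", "two": "multi_index"}
grouped = group_labels_by_index(coord_to_index, {"x": 10, "one": "a", "two": 1})
```

Each value of grouped is then the labels dictionary that would be passed to the query method of the matching index.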

For a simple pd.Index, labels always corresponds to a 1-item dictionary like {'coord_name': label_values}. The dictionary is not very useful in that case, but the format is useful for pd.MultiIndex and will likely be useful for other, custom indexes.
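To make the contract concrete, here is a minimal pure-Python sketch of a single-coordinate index with a query method. ToyIndex is a hypothetical stand-in, not xarray's PandasIndex, and it ignores slices, method and tolerance.

```python
from typing import Any, Dict, Hashable, List, Optional, Tuple


class ToyIndex:
    """Hypothetical stand-in for an index wrapping one coordinate."""

    def __init__(self, coord_name: Hashable, values: List[Any]) -> None:
        self.coord_name = coord_name
        self.values = list(values)

    def query(self, labels: Dict[Hashable, Any]) -> Tuple[Any, Optional["ToyIndex"]]:
        # A single-coordinate index expects exactly one {coord_name: label} pair.
        if list(labels) != [self.coord_name]:
            raise KeyError(
                f"expected labels for {self.coord_name!r}, got {list(labels)!r}"
            )
        label = labels[self.coord_name]
        if isinstance(label, list):
            # Several labels: one integer position per label.
            return [self.values.index(item) for item in label], None
        # Scalar label: a single integer position, and no new index.
        return self.values.index(label), None


index = ToyIndex("x", ["a", "b", "c"])
scalar_indexer, new_index = index.query({"x": "b"})
list_indexer, _ = index.query({"x": ["c", "a"]})
```

Here the second item of the returned tuple is always None, since selecting on a simple index does not need to produce a new index object in this sketch.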

By moving the label->positional indexer conversion logic into PandasIndex.query(), I've tried to separate pd.Index vs pd.MultiIndex concerns: I added a new PandasMultiIndex wrapper class (it will probably be useful for other things as well) and refactored the complex logic that was implemented in convert_label_indexer. Hopefully it is a bit clearer now.
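As a rough illustration of why a multi-index benefits from a dictionary of labels and from the optional new index, here is a toy sketch. ToyMultiIndex is hypothetical and much simpler than PandasMultiIndex. I'm assuming here that selecting on a subset of the levels returns a new index holding only the remaining levels.

```python
from typing import Any, Dict, Hashable, List, Optional, Sequence, Tuple


class ToyMultiIndex:
    """Hypothetical multi-level index: one tuple of level values per position."""

    def __init__(
        self, level_names: Sequence[Hashable], entries: Sequence[Tuple[Any, ...]]
    ) -> None:
        self.level_names = list(level_names)
        self.entries = [tuple(entry) for entry in entries]

    def query(
        self, labels: Dict[Hashable, Any]
    ) -> Tuple[List[int], Optional["ToyMultiIndex"]]:
        unknown = [name for name in labels if name not in self.level_names]
        if unknown:
            raise KeyError(f"unknown level(s): {unknown!r}")
        level_pos = {name: i for i, name in enumerate(self.level_names)}
        # Keep the positions whose entry matches every requested level label.
        positions = [
            pos
            for pos, entry in enumerate(self.entries)
            if all(entry[level_pos[name]] == label for name, label in labels.items())
        ]
        if not positions:
            raise KeyError(f"no entry matches {labels!r}")
        remaining = [name for name in self.level_names if name not in labels]
        if not remaining:
            # All levels were selected: nothing is left to index.
            return positions, None
        kept = [level_pos[name] for name in remaining]
        new_entries = [tuple(self.entries[p][k] for k in kept) for p in positions]
        return positions, ToyMultiIndex(remaining, new_entries)


midx = ToyMultiIndex(["one", "two"], [("a", 1), ("a", 2), ("b", 1), ("b", 2)])
partial_indexer, reduced_index = midx.query({"one": "a"})
full_indexer, no_index = midx.query({"one": "b", "two": 2})
```

Keeping this logic in its own class, separate from the single-coordinate case, is the kind of separation of concerns the refactor is aiming for.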

Working towards a more flexible/generic system, we still need to figure out how to:

  • pass extra query arguments to an index (like method and tolerance for pd.Index), but in a more generic way
  • handle several positional indexers over multiple dimensions possibly returned by a custom "meta-index" (e.g., staggered grid index)
  • handle the case of positional indexers returned from querying >1 indexes along the same dimension (e.g., multiple coordinates along x with a simple pd.Index)
  • give indexes access to information such as the names or shapes of their corresponding coordinate(s): pandas indexes don't need it to perform label-based selection, but other indexes probably will (we actually need it for advanced point-wise selection using tree-based indexes in xoak).

This could be done in follow-up PRs.

Side note: I initially tried to have xindexes also return items for multi-index levels (not only index dimensions), but it's probably wiser to save this for later (when we tackle the multi-index virtual coordinate refactoring), as there are many places in Xarray where this is clearly not expected.

Happy to hear your thoughts @pydata/xarray.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5322/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: pull
