issues: 1874412700

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1874412700 PR_kwDOAMm_X85ZLe24 8124 More flexible index variables 4160723 open 0     0 2023-08-30T21:45:12Z 2023-08-31T16:02:20Z   MEMBER   1 pydata/xarray/pulls/8124
  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

The goal of this PR is to provide a more general solution for indexed coordinate variables, i.e., to support arbitrary dimensions and/or duck arrays for those variables while preventing them from being updated in a way that would invalidate their index.

This would solve problems like the one mentioned here: https://github.com/pydata/xarray/issues/1650#issuecomment-1697237429

@shoyer I've tried to implement what you have suggested in https://github.com/pydata/xarray/pull/4979#discussion_r589798510. It would be nice indeed if eventually we could get rid of IndexVariable. It won't be easy to deprecate it until we finish the index refactor (i.e., all methods listed in #6293), though. Also, I didn't find an easy way to refactor that class as it has been designed too closely around a 1-d variable backed by a pandas.Index.

So the approach implemented in this PR is to keep using IndexVariable for PandasIndex until we can deprecate / remove it later, and for the other cases use Variable with data wrapped in a custom IndexedCoordinateArray object.

The latter solution (wrapper) doesn't always work nicely, though. For example, several methods of Variable expect that self._data directly returns a duck array (e.g., a dask array or a chunked duck array). A wrapped duck array will result in unexpected behavior there. We could probably add some checks / indirection or extend the wrapper API... But I wonder if there wouldn't be a more elegant approach?
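To make the wrapper trade-off concrete, here is a minimal sketch assuming only NumPy. `ReadOnlyIndexedArray` is a hypothetical stand-in written for illustration; it is not the `IndexedCoordinateArray` class from this PR and does not reflect its actual API. It forwards reads to the wrapped array, rejects in-place writes, and shows why code that inspects the type of the underlying data directly no longer sees the duck array.

```python
import numpy as np


class ReadOnlyIndexedArray:
    """Hypothetical wrapper (illustration only, not the PR's class)."""

    def __init__(self, array):
        self._array = array

    @property
    def shape(self):
        return self._array.shape

    @property
    def dtype(self):
        return self._array.dtype

    @property
    def ndim(self):
        return self._array.ndim

    def __getitem__(self, key):
        # Reads are forwarded to the wrapped array unchanged.
        return self._array[key]

    def __setitem__(self, key, value):
        # In-place updates could invalidate the associated index.
        raise TypeError("cannot modify the data of an indexed coordinate in-place")

    def __array__(self, dtype=None, copy=None):
        # Allows np.asarray(wrapper) to recover a NumPy array.
        arr = np.asarray(self._array, dtype=dtype)
        return arr.copy() if copy else arr


wrapped = ReadOnlyIndexedArray(np.arange(6).reshape(2, 3))

# Reading works through the wrapper...
total = np.asarray(wrapped).sum()

# ...but writing is rejected.
try:
    wrapped[0, 0] = 10
    write_blocked = False
except TypeError:
    write_blocked = True

# Code that checks the concrete array type no longer sees the duck array.
looks_like_ndarray = isinstance(wrapped, np.ndarray)
```

The last line illustrates the issue raised above: any method that assumes the stored data is itself the duck array would need either an unwrapping step or a wider wrapper API.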

More generally, which operations should we allow / forbid / skip for an indexed coordinate variable?

  • Set array items in-place? Do not allow.
  • Replace data? Do not allow.
  • (Re)Chunk?
  • Load lazy data?
  • ... ?

(Note: we could add Index.chunk() and Index.load() methods to let an Xarray index implement custom logic for the latter two cases, e.g., converting a DaskIndex to a PandasIndex during load; see #8128.)
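As a rough illustration of the hooks suggested in the note, the sketch below uses plain Python classes. All names (`SketchIndex`, `LazyIndex`, `EagerIndex`) are hypothetical, and the `load()` / `chunk()` signatures are assumptions; nothing here is existing Xarray API. The default implementations return the index unchanged, and a lazy index overrides `load()` to return an eager one.

```python
class SketchIndex:
    """Hypothetical base class standing in for an Xarray index."""

    def load(self):
        # Default: nothing to load, keep the same index.
        return self

    def chunk(self, chunks):
        # Default: chunking does not affect the index.
        return self


class EagerIndex(SketchIndex):
    """Index holding its values in memory (hypothetical)."""

    def __init__(self, values):
        self.values = list(values)


class LazyIndex(SketchIndex):
    """Index whose values are computed on demand (hypothetical)."""

    def __init__(self, make_values):
        self._make_values = make_values

    def load(self):
        # Custom logic: loading converts the lazy index to an eager one.
        return EagerIndex(self._make_values())


lazy = LazyIndex(lambda: range(4))
loaded = lazy.load()
```

With hooks of this kind, loading or rechunking a coordinate could delegate to its index instead of touching the variable data directly.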

cc @andersy005 (some changes made here may conflict with what you are refactoring in #8075).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8124/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    13221727 pull
