issues: 1839199929

  • id: 1839199929
  • node_id: PR_kwDOAMm_X85XUl4W
  • number: 8051
  • title: Allow setting (or skipping) new indexes in open_dataset
  • user: 4160723
  • state: open
  • locked: 0
  • comments: 9
  • created_at: 2023-08-07T10:53:46Z
  • updated_at: 2024-02-03T19:12:48Z
  • author_association: MEMBER
  • draft: 0
  • pull_request: pydata/xarray/pulls/8051
  • assignee, milestone, closed_at, active_lock_reason: (empty)

  • [x] Closes #6633
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

This PR introduces a new boolean parameter, set_indexes (default True), to xr.open_dataset(). Passing set_indexes=False skips the creation of default (pandas) indexes when opening a dataset.

Currently works with the Zarr backend:

```python
import numpy as np
import xarray as xr

# example dataset (real dataset may be much larger)
arr = np.random.random(size=1_000_000)
xr.Dataset({"x": arr}).to_zarr("dataset.zarr")

xr.open_dataset("dataset.zarr", set_indexes=False, engine="zarr")
# <xarray.Dataset>
# Dimensions:  (x: 1000000)
# Coordinates:
#     x        (x) float64 ...
# Data variables:
#     *empty*

xr.open_zarr("dataset.zarr", set_indexes=False)
# <xarray.Dataset>
# Dimensions:  (x: 1000000)
# Coordinates:
#     x        (x) float64 ...
# Data variables:
#     *empty*
```
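
For readers who want to see what an index-free coordinate looks like with released API, here is a rough sketch that uses Dataset.drop_indexes() and Dataset.set_xindex() instead of the proposed set_indexes parameter. It only approximates the end state: unlike set_indexes=False, the default index is still built first and then dropped, so it does not save the cost of index creation. The tiny in-memory dataset is an assumption made for illustration.

```python
import numpy as np
import xarray as xr

# Small in-memory dataset (illustrative only); "x" is a dimension
# coordinate, so xarray builds a default pandas index for it.
ds = xr.Dataset(coords={"x": np.arange(5.0)})
has_default_index = "x" in ds.xindexes

# Dropping the index keeps the coordinate variable but removes its index,
# which is roughly the state that set_indexes=False is meant to produce.
no_index = ds.drop_indexes("x")

# Build a default index for "x" later, once label-based selection is needed.
restored = no_index.set_xindex("x")
selected = restored.sel(x=2.0)
```

Once the index is restored, label-based selection such as .sel(x=2.0) works as usual.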

I'll add it to the other Xarray backends as well, but I'd like to get your thoughts about the API first.

  1. Do we want to add yet another keyword parameter to xr.open_dataset()? There are already many...
  2. Do we want to add this parameter to the BackendEntrypoint.open_dataset() API?
       • I'm afraid we must do it if we want this parameter in xr.open_dataset().
       • This would also make it possible to skip the creation of custom indexes (if any) in custom IO backends.
       • Con: if we require set_indexes in the signature in addition to the drop_variables parameter, this is a breaking change for all existing 3rd-party backends. Or should we group set_indexes with the other xarray decoder kwargs? This would feel a bit odd to me, as setting indexes is different from decoding data.
  3. Or should we leave this up to the backends?
       • Pros: no breaking change, more flexible (3rd-party backends may want to offer more control, like choosing between custom indexes and default pandas indexes, or skipping the creation of indexes by default).
       • Cons: less discoverable, consistency is not enforced across 3rd-party backends (although for such an advanced case this is probably OK), not available by default in every backend.

Currently 1 and 2 are implemented in this PR, although as I write this comment I think that I would prefer 3. I guess this depends on whether we prefer open_*** vs. xr.open_dataset(engine="***") and unless I missed something there is still no real consensus about that? (e.g., #7496).
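
To make the trade-off between requiring set_indexes in every backend signature and leaving it up to each backend more concrete, here is a small standalone sketch. It does not use xarray and is not how xarray dispatches to backends: LegacyBackend, IndexAwareBackend and open_with_backend are hypothetical names, and plain dicts stand in for datasets. It shows one way a caller could forward set_indexes only to backends that opt in, so that existing backends keep working unchanged.

```python
import inspect


class LegacyBackend:
    """Hypothetical 3rd-party backend written before set_indexes existed."""

    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        return {"source": filename_or_obj, "indexes": "default"}


class IndexAwareBackend:
    """Hypothetical backend that opts in to a set_indexes keyword."""

    def open_dataset(self, filename_or_obj, *, drop_variables=None, set_indexes=True):
        state = "default" if set_indexes else "skipped"
        return {"source": filename_or_obj, "indexes": state}


def open_with_backend(backend, filename_or_obj, **kwargs):
    """Forward kwargs only if the backend's open_dataset() accepts them."""
    params = inspect.signature(backend.open_dataset).parameters
    takes_var_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    unsupported = [] if takes_var_kwargs else [k for k in kwargs if k not in params]
    if unsupported:
        # Fail with a clear message instead of silently ignoring the option.
        raise TypeError(
            f"{type(backend).__name__} does not support: {', '.join(unsupported)}"
        )
    return backend.open_dataset(filename_or_obj, **kwargs)


skipped = open_with_backend(IndexAwareBackend(), "dataset.zarr", set_indexes=False)
legacy = open_with_backend(LegacyBackend(), "dataset.zarr")
```

In this sketch a legacy backend is untouched as long as the caller does not pass set_indexes, which mirrors the "no breaking change" argument, while the explicit TypeError illustrates the "not available by default in every backend" drawback.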

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8051/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

  • repo: 13221727
  • type: pull