pull_requests: 1465015830

id: 1465015830
node_id: PR_kwDOAMm_X85XUl4W
number: 8051
state: open
locked: 0
title: Allow setting (or skipping) new indexes in open_dataset
user: 4160723
body:

<!-- Feel free to remove check-list items that aren't relevant to your change -->

- [x] Closes #6633
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`

This PR introduces a new boolean parameter `set_indexes=True` to `xr.open_dataset()`, which may be used to skip the creation of default (pandas) indexes when opening a dataset.

Currently works with the Zarr backend:

```python
import numpy as np
import xarray as xr

# example dataset (real dataset may be much larger)
arr = np.random.random(size=1_000_000)
xr.Dataset({"x": arr}).to_zarr("dataset.zarr")

xr.open_dataset("dataset.zarr", set_indexes=False, engine="zarr")
# <xarray.Dataset>
# Dimensions:  (x: 1000000)
# Coordinates:
#     x        (x) float64 ...
# Data variables:
#     *empty*

xr.open_zarr("dataset.zarr", set_indexes=False)
# <xarray.Dataset>
# Dimensions:  (x: 1000000)
# Coordinates:
#     x        (x) float64 ...
# Data variables:
#     *empty*
```

I'll add it to the other Xarray backends as well, but I'd like to get your thoughts about the API first.

1. Do we want to add yet another keyword parameter to `xr.open_dataset()`? There are already many...
2. Do we want to add this parameter to the `BackendEntrypoint.open_dataset()` API?
   - I'm afraid we must do it if we want this parameter in `xr.open_dataset()`
   - this would also make it possible to skip the creation of custom indexes (if any) in custom IO backends
   - con: if we require `set_indexes` in the signature in addition to the `drop_variables` parameter, this is a breaking change for all existing 3rd-party backends. Or should we group `set_indexes` with the other xarray decoder kwargs? This would feel a bit odd to me as setting indexes is different from decoding data.
3. Or should we leave this up to the backends?
   - pros: no breaking change, more flexible (3rd party backends may want to offer more control like choosing between custom indexes and default pandas indexes or skipping the creation of indexes by default)
   - cons: less discoverable, consistency is not enforced across 3rd party backends (although for such an advanced case this is probably OK), not available by default in every backend.

Currently 1 and 2 are implemented in this PR, although as I write this comment I think that I would prefer 3. I guess this depends on whether we prefer `open_***` vs. `xr.open_dataset(engine="***")` and unless I missed something there is still no real consensus about that? (e.g., #7496).

created_at: 2023-08-07T10:53:46Z
updated_at: 2024-02-03T19:12:48Z
closed_at:
merged_at:
merge_commit_sha: 0b37c66130416f202c3b8ee2302ee9ea517bdadd
assignee:
milestone:
draft: 0
head: eae983bb6b7ee916e5c8956b6af42c2207ad48d1
base: c9ba2be2690564594a89eb93fb5d5c4ae7a9253c
author_association: MEMBER
auto_merge:
repo: 13221727
url: https://github.com/pydata/xarray/pull/8051
merged_by:
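
The breaking-change concern raised under option 2 of the PR body may be easier to see in a small sketch. The code below does not use xarray at all: `LegacyBackend`, `IndexAwareBackend`, `strict_open` and `tolerant_open` are hypothetical stand-ins invented for illustration, and the returned dicts merely stand in for datasets. `strict_open` mimics a front end that always forwards `set_indexes`, which fails for a backend written before the parameter existed. `tolerant_open` shows one possible way to soften that, by inspecting the backend's signature first. This is an assumption for discussion, not something the PR proposes or implements.

```python
import inspect


class LegacyBackend:
    """Hypothetical stand-in for an existing third-party backend."""

    def open_dataset(self, filename, *, drop_variables=None):
        return {"filename": filename, "indexes": "default"}


class IndexAwareBackend:
    """Hypothetical stand-in for a backend that accepts `set_indexes`."""

    def open_dataset(self, filename, *, drop_variables=None, set_indexes=True):
        indexes = "default" if set_indexes else "none"
        return {"filename": filename, "indexes": indexes}


def strict_open(backend, filename, set_indexes=True):
    # Always forward `set_indexes`: raises TypeError for backends
    # whose `open_dataset` signature does not declare it.
    return backend.open_dataset(filename, set_indexes=set_indexes)


def tolerant_open(backend, filename, set_indexes=True):
    # Forward `set_indexes` only when the backend's signature accepts it.
    params = inspect.signature(backend.open_dataset).parameters
    if "set_indexes" in params:
        return backend.open_dataset(filename, set_indexes=set_indexes)
    if not set_indexes:
        # The caller asked for something this backend cannot honor.
        raise ValueError(
            f"{type(backend).__name__} does not support set_indexes=False"
        )
    return backend.open_dataset(filename)


if __name__ == "__main__":
    print(tolerant_open(IndexAwareBackend(), "dataset.zarr", set_indexes=False))
    print(tolerant_open(LegacyBackend(), "dataset.zarr"))
    try:
        strict_open(LegacyBackend(), "dataset.zarr", set_indexes=False)
    except TypeError as err:
        print("strict forwarding failed:", err)
```

In this sketch the legacy backend keeps working as long as callers leave `set_indexes` at its default, while an explicit `set_indexes=False` fails loudly instead of being silently ignored. Whether that trade-off is acceptable is the kind of question options 2 and 3 leave open.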
