pull_requests: 1465015830

id	node_id	number	state	locked	title	user	body	created_at	updated_at	closed_at	merged_at	merge_commit_sha	assignee	milestone	draft	head	base	author_association	auto_merge	repo	url	merged_by
1465015830	PR_kwDOAMm_X85XUl4W	8051	open	0	Allow setting (or skipping) new indexes in open_dataset	4160723	<!-- Feel free to remove check-list items aren't relevant to your change --> - [x] Closes #6633 - [ ] Tests added - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [ ] New functions/methods are listed in `api.rst` This PR introduces a new boolean parameter `set_indexes=True` to `xr.open_dataset()`, which may be used to skip the creation of default (pandas) indexes when opening a dataset. It currently works with the Zarr backend: ```python import numpy as np import xarray as xr # example dataset (real dataset may be much larger) arr = np.random.random(size=1_000_000) xr.Dataset({"x": arr}).to_zarr("dataset.zarr") xr.open_dataset("dataset.zarr", set_indexes=False, engine="zarr") # <xarray.Dataset> # Dimensions: (x: 1000000) # Coordinates: # x (x) float64 ... # Data variables: # empty xr.open_zarr("dataset.zarr", set_indexes=False) # <xarray.Dataset> # Dimensions: (x: 1000000) # Coordinates: # x (x) float64 ... # Data variables: # empty ``` I'll add it to the other Xarray backends as well, but I'd like to get your thoughts about the API first. 1. Do we want to add yet another keyword parameter to `xr.open_dataset()`? There are already many... 2. Do we want to add this parameter to the `BackendEntrypoint.open_dataset()` API? - I'm afraid we must do it if we want this parameter in `xr.open_dataset()` - this would also make it possible to skip the creation of custom indexes (if any) in custom IO backends - con: if we require `set_indexes` in the signature in addition to the `drop_variables` parameter, this is a breaking change for all existing 3rd-party backends. Or should we group `set_indexes` with the other xarray decoder kwargs? This would feel a bit odd to me as setting indexes is different from decoding data. 3. Or should we leave this up to the backends? 
- pros: no breaking change, more flexible (3rd-party backends may want to offer more control, like choosing between custom indexes and default pandas indexes or skipping the creation of indexes by default) - cons: less discoverable, consistency is not enforced across 3rd-party backends (although for such an advanced case this is probably OK), not available by default in every backend. Currently 1 and 2 are implemented in this PR, although as I write this comment I think that I would prefer 3. I guess this depends on whether we prefer `open_*` vs. `xr.open_dataset(engine="*")`, and unless I missed something there is still no real consensus about that (e.g., #7496).	2023-08-07T10:53:46Z	2024-02-03T19:12:48Z			0b37c66130416f202c3b8ee2302ee9ea517bdadd			0	eae983bb6b7ee916e5c8956b6af42c2207ad48d1	c9ba2be2690564594a89eb93fb5d5c4ae7a9253c	MEMBER		13221727	https://github.com/pydata/xarray/pull/8051
