issues: 1839199929


id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at	closed_at	author_association	active_lock_reason	draft	pull_request	body	reactions	performed_via_github_app	state_reason	repo	type
1839199929	PR_kwDOAMm_X85XUl4W	8051	Allow setting (or skipping) new indexes in open_dataset	4160723	open	0			9	2023-08-07T10:53:46Z	2024-02-03T19:12:48Z		MEMBER		0	pydata/xarray/pulls/8051	- [x] Closes #6633
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`

This PR introduces a new boolean parameter `set_indexes=True` to `xr.open_dataset()`, which may be used to skip the creation of default (pandas) indexes when opening a dataset.

Currently works with the Zarr backend:

```python
import numpy as np
import xarray as xr

# example dataset (real dataset may be much larger)
arr = np.random.random(size=1_000_000)
xr.Dataset({"x": arr}).to_zarr("dataset.zarr")

xr.open_dataset("dataset.zarr", set_indexes=False, engine="zarr")
# <xarray.Dataset>
# Dimensions:  (x: 1000000)
# Coordinates:
#     x        (x) float64 ...
# Data variables:
#     *empty*

xr.open_zarr("dataset.zarr", set_indexes=False)
# <xarray.Dataset>
# Dimensions:  (x: 1000000)
# Coordinates:
#     x        (x) float64 ...
# Data variables:
#     *empty*
```

I'll add it to the other Xarray backends as well, but I'd like to get your thoughts about the API first.

1. Do we want to add yet another keyword parameter to `xr.open_dataset()`? There are already many...
2. Do we want to add this parameter to the `BackendEntrypoint.open_dataset()` API?
   - I'm afraid we must do it if we want this parameter in `xr.open_dataset()`
   - this would also make it possible to skip the creation of custom indexes (if any) in custom IO backends
   - con: if we require `set_indexes` in the signature in addition to the `drop_variables` parameter, this is a breaking change for all existing 3rd-party backends. Or should we group `set_indexes` with the other xarray decoder kwargs? This would feel a bit odd to me as setting indexes is different from decoding data.
3. Or should we leave this up to the backends?
   - pros: no breaking change, more flexible (3rd-party backends may want to offer more control, like choosing between custom indexes and default pandas indexes or skipping the creation of indexes by default)
   - cons: less discoverable, consistency is not enforced across 3rd-party backends (although for such an advanced case this is probably OK), not available by default in every backend.

Currently 1 and 2 are implemented in this PR, although as I write this comment I think that I would prefer 3. I guess this depends on whether we prefer `open_*` vs. `xr.open_dataset(engine="*")`, and unless I missed something there is still no real consensus about that (e.g., #7496).	{ "url": "https://api.github.com/repos/pydata/xarray/issues/8051/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }			13221727	pull
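
The trade-off between requiring `set_indexes` in every backend signature and leaving it to each backend can be sketched in plain Python. This is only an illustration under stated assumptions: `LegacyBackend`, `IndexAwareBackend`, `open_with_required_param`, and `open_with_backend_kwargs` are hypothetical stand-ins, not xarray classes or functions, and the returned dictionaries merely mimic "a dataset with or without indexes".

```python
# Hypothetical stand-ins only: nothing here is xarray's real implementation.


class LegacyBackend:
    """A third-party backend written before `set_indexes` existed."""

    def open_dataset(self, filename, *, drop_variables=None):
        return {"source": filename, "indexes": ["x"]}


class IndexAwareBackend:
    """A backend that opts in to its own `set_indexes` keyword."""

    def open_dataset(self, filename, *, drop_variables=None, set_indexes=True):
        return {"source": filename, "indexes": ["x"] if set_indexes else []}


def open_with_required_param(backend, filename, set_indexes=True):
    """Frontend that always passes `set_indexes` to the backend."""
    return backend.open_dataset(
        filename, drop_variables=None, set_indexes=set_indexes
    )


def open_with_backend_kwargs(backend, filename, **backend_kwargs):
    """Frontend that forwards only the keywords the caller supplied."""
    return backend.open_dataset(filename, drop_variables=None, **backend_kwargs)


# Always passing the keyword breaks the legacy backend, even for the default.
try:
    open_with_required_param(LegacyBackend(), "data.zarr")
    legacy_breaks = False
except TypeError:
    legacy_breaks = True

# Forwarding caller-supplied keywords leaves the legacy backend untouched
# and lets newer backends opt in to skipping index creation.
legacy_result = open_with_backend_kwargs(LegacyBackend(), "data.zarr")
no_index_result = open_with_backend_kwargs(
    IndexAwareBackend(), "data.zarr", set_indexes=False
)
```

In this sketch the legacy backend keeps working only when the keyword is forwarded on request, which mirrors the "no breaking change" advantage, while discoverability suffers because a caller has to know which backends accept the keyword.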
