issues: 902009258

id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at	closed_at	author_association	active_lock_reason	draft	pull_request	body	reactions	performed_via_github_app	state_reason	repo	type
902009258	MDU6SXNzdWU5MDIwMDkyNTg=	5376	Multi-scale datasets and custom indexes	4160723	open	0			6	2021-05-26T08:38:00Z	2021-06-02T08:07:38Z		MEMBER				I've been wondering if:

- multi-scale datasets are generic enough to implement some related functionality in Xarray, e.g., as new `Dataset` and/or `DataArray` method(s)
- we could leverage custom indexes for that (see the design notes)

I'm thinking of an API that would look like this:

```python
# lazily load a big n-d image (full resolution) as an xarray.Dataset
xyz_dataset = ...

# set a new index for the x/y/z coordinates
# (`reduction` and `pre_compute_scales` are optional and passed
# as arguments to `ImagePyramidIndex`)
xyz_dataset.set_index(
    ('x', 'y', 'z'),
    ImagePyramidIndex,
    reduction=np.mean,
    pre_compute_scales=(2, 2),
)

# get a slice (ImagePyramidIndex will be used to dynamically scale the data
# or load the right pre-computed dataset)
xyz_slice = xyz_dataset.sel_and_rescale(x=slice(...), y=slice(...), z=slice(...))
```

where `ImagePyramidIndex` is not a "common" index, i.e., it cannot be used directly with Xarray's `.sel()` nor for data alignment. Using an index here might still make sense for such data extraction and resampling operations IMHO. We could extend the `xarray.Index` API to handle multi-scale datasets, so that `ImagePyramidIndex` could either do the scaling dynamically (maybe using a cache) or just lazily load pre-computed data, e.g., from an NGFF / OME-Zarr dataset... Both the implementation and functionality can be pretty flexible. Custom options may be passed through the Xarray API either when creating the index or when extracting a data slice.

A hierarchical structure of `xarray.Dataset` objects is already discussed in #4118 for multi-scale datasets, but I'm wondering if using indexes could be an alternative approach (it could also be complementary, i.e., `ImagePyramidIndex` could rely on such a hierarchical structure under the hood).
I'd see some advantages of the index approach, although this is the perspective from a naive user who is not working with multi-scale datasets:

- it is flexible: the scaling may be done dynamically without having to store the results in a hierarchical collection with some predefined discrete levels
- we don't need to expose anything other than a simple `xarray.Dataset` + a "black-box" index in which we abstract away all the implementation details. The API example shown above seems more intuitive to me than having to deal directly with Dataset groups.
- Xarray will provide a plugin system for 3rd party indexes, allowing for more `ImagePyramidIndex` variants. Xarray already provides an extension mechanism (accessors) for methods like `sel_and_rescale` in the example above...

That said, I'd also see the benefits of exposing Dataset groups more transparently to users (in case those are loaded from a store that supports it).

cc @thewtex @joshmoore @d-v-b	{ "url": "https://api.github.com/repos/pydata/xarray/issues/5376/reactions", "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 1 }			13221727	issue
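
The issue body above proposes an index that either loads pre-computed scales or rescales on demand (possibly with a cache). The following is a minimal pure-Python sketch of that dispatch logic only, restricted to 1-D data. `PyramidSketch`, its `level` method, and the `factor` argument are hypothetical names introduced for illustration; none of this is Xarray API, and it is not how `ImagePyramidIndex` would necessarily be implemented.

```python
from functools import lru_cache
from statistics import fmean


class PyramidSketch:
    """Hypothetical 1-D stand-in for the proposed ImagePyramidIndex."""

    def __init__(self, data, reduction=fmean, pre_compute_scales=()):
        self._data = tuple(data)
        self._reduction = reduction
        # Levels computed eagerly, mimicking pre-computed pyramid levels.
        self._precomputed = {s: self._downscale(s) for s in pre_compute_scales}
        # Small cache for levels that are computed on demand.
        self._dynamic = lru_cache(maxsize=8)(self._downscale)

    def _downscale(self, factor):
        # Reduce consecutive blocks of `factor` samples; a trailing
        # partial block is dropped.
        n = len(self._data) - len(self._data) % factor
        return tuple(
            self._reduction(self._data[i:i + factor]) for i in range(0, n, factor)
        )

    def level(self, factor):
        # Prefer full resolution, then pre-computed levels, then the cache.
        if factor == 1:
            return self._data
        if factor in self._precomputed:
            return self._precomputed[factor]
        return self._dynamic(factor)

    def sel_and_rescale(self, start, stop, factor=1):
        """Select full-resolution positions [start, stop) at the given scale."""
        scaled = self.level(factor)
        # Floor the start and ceil the stop when mapping to the coarser grid.
        return scaled[start // factor:-(-stop // factor)]


pyramid = PyramidSketch(range(8), pre_compute_scales=(2,))
half = pyramid.sel_and_rescale(2, 6, factor=2)     # served from a pre-computed level
quarter = pyramid.sel_and_rescale(0, 8, factor=4)  # computed on demand, then cached
```

The caller only ever sees `sel_and_rescale`; whether a level was stored ahead of time or computed lazily stays an internal detail, which is the "black-box" property the issue argues for.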

Links from other tables

0 rows from issues_id in issues_labels
6 rows from issue in issue_comments