issues: 2116695961

id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at	closed_at	author_association	active_lock_reason	draft	pull_request	body	reactions	performed_via_github_app	state_reason	repo	type
2116695961	I_kwDOAMm_X85-KjeZ	8699	Wrapping a `kerchunk.Array` object directly with xarray	35968931	open	0			3	2024-02-03T22:15:07Z	2024-02-04T21:15:14Z		MEMBER				What is your issue? In https://github.com/fsspec/kerchunk/issues/377 the idea came up of using the xarray API to concatenate arrays which represent parts of a zarr store - i.e. using xarray to kerchunk a large set of netCDF files instead of using `kerchunk.combine.MultiZarrToZarr`. The idea is to make something like this work for kerchunking sets of netCDF files into zarr stores ```python ds = xr.open_mfdataset( '/my/files*.nc', engine='kerchunk', # kerchunk registers an xarray IO backend that returns zarr.Array objects combine='nested', # 'by_coords' would require actually reading coordinate data parallel=True, # would use dask.delayed to generate reference dicts for each file in parallel ) ds # now wraps a bunch of zarr.Array / kerchunk.Array objects, no need for dask arrays ds.kerchunk.to_zarr(store='out.zarr') # kerchunk defines an xarray accessor that extracts the zarr arrays and serializes them (which could also be done in parallel if writing to parquet) ``` I had a go at doing this in this notebook, and in doing so discovered a few potential issues with xarray's internals. For this to work xarray has to: - Wrap a `kerchunk.Array` object which barely defines any array API methods, including basically not supporting indexing at all, - Store all the information present in a kerchunked Zarr store but without ever loading any data, - Not create any indexes by default during dataset construction or during `xr.concat`, - Not try to do anything else that can't be defined for a `kerchunk.Array`. - Possibly we need the Lazy Indexing classes to support concatenation https://github.com/pydata/xarray/issues/4628 It's an interesting exercise in using xarray as an abstraction, with no access to real numerical values at all.	
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8699/reactions", "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 1 }			13221727	issue
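
The requirements listed in the issue body (an array-like that cannot be indexed, holds only chunk references, and can still be concatenated) can be illustrated with a small sketch. This is not kerchunk's or xarray's API: `ChunkRefArray`, `concat_refs`, the file names, and the byte offsets are all hypothetical, and the sketch assumes every array shares one chunk shape and spans a whole number of chunks along the concatenation axis.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a `kerchunk.Array`: it records where each chunk
# lives (path, byte offset, byte length) and never reads the bytes themselves.
@dataclass
class ChunkRefArray:
    shape: tuple   # full array shape
    chunks: tuple  # shape of one chunk, identical for every chunk
    dtype: str
    refs: dict     # chunk grid index, e.g. (1, 0) -> (path, offset, length)

    def __getitem__(self, key):
        # Indexing would require loading data, which this object cannot do.
        raise NotImplementedError("ChunkRefArray does not support indexing")


def concat_refs(arrays, axis=0):
    """Concatenate along `axis` by renumbering chunk keys; no data is read."""
    first = arrays[0]
    new_refs = {}
    chunk_offset = 0  # number of chunks already placed along `axis`
    total_length = 0  # number of elements already placed along `axis`
    for arr in arrays:
        if arr.chunks != first.chunks or arr.dtype != first.dtype:
            raise ValueError("chunk shape and dtype must match")
        if arr.shape[axis] % arr.chunks[axis] != 0:
            raise ValueError("length along axis must be a whole number of chunks")
        for key, ref in arr.refs.items():
            # Shift only the chunk index on the concatenation axis.
            new_key = key[:axis] + (key[axis] + chunk_offset,) + key[axis + 1:]
            new_refs[new_key] = ref
        chunk_offset += arr.shape[axis] // arr.chunks[axis]
        total_length += arr.shape[axis]
    new_shape = first.shape[:axis] + (total_length,) + first.shape[axis + 1:]
    return ChunkRefArray(new_shape, first.chunks, first.dtype, new_refs)


# Two made-up "files": one with two chunks, one with a single chunk.
a = ChunkRefArray((4, 10), (2, 10), "float64",
                  {(0, 0): ("a.nc", 0, 160), (1, 0): ("a.nc", 160, 160)})
b = ChunkRefArray((2, 10), (2, 10), "float64",
                  {(0, 0): ("b.nc", 0, 160)})
combined = concat_refs([a, b], axis=0)
```

Concatenation here only rewrites dictionary keys, which is why no index creation or data access is needed. A wrapper library would have to avoid calling `__getitem__` on such an object at any point.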

Links from other tables

8 rows from issues_id in issues_labels
0 rows from issue in issue_comments