issues: 2116695961

id: 2116695961
node_id: I_kwDOAMm_X85-KjeZ
number: 8699
title: Wrapping a `kerchunk.Array` object directly with xarray
user: 35968931
state: open
locked: 0
comments: 3
created_at: 2024-02-03T22:15:07Z
updated_at: 2024-02-04T21:15:14Z
author_association: MEMBER

What is your issue?

In https://github.com/fsspec/kerchunk/issues/377 the idea came up of using the xarray API to concatenate arrays which represent parts of a zarr store - i.e. using xarray to kerchunk a large set of netCDF files instead of using kerchunk.combine.MultiZarrToZarr.

The idea is to make something like this work for kerchunking sets of netCDF files into zarr stores:

```python
import xarray as xr

ds = xr.open_mfdataset(
    '/my/files*.nc',
    engine='kerchunk',  # kerchunk registers an xarray IO backend that returns zarr.Array objects
    combine='nested',  # 'by_coords' would require actually reading coordinate data
    parallel=True,  # would use dask.delayed to generate reference dicts for each file in parallel
)

ds  # now wraps a bunch of zarr.Array / kerchunk.Array objects, no need for dask arrays

ds.kerchunk.to_zarr(store='out.zarr')  # kerchunk defines an xarray accessor that extracts the zarr arrays and serializes them (which could also be done in parallel if writing to parquet)
```
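To make concrete what "concatenating without loading data" would mean at the reference level, here is a rough sketch in plain Python. It assumes kerchunk-style references (zarr v2 keys such as `temp/0.0` mapping to `[path, offset, length]`, with `.zarray` metadata stored as a JSON string). The function `concat_refs` and the file names are hypothetical and only for illustration; this is not how kerchunk or xarray implement it.

```python
import json


def concat_refs(refs_a, refs_b, var, axis=0):
    """Concatenate variable `var` from two reference dicts along `axis`.

    Only chunk keys and metadata are rewritten; no array values are read.
    """
    meta_a = json.loads(refs_a[f"{var}/.zarray"])
    meta_b = json.loads(refs_b[f"{var}/.zarray"])
    if meta_a["chunks"] != meta_b["chunks"]:
        raise ValueError("chunk shapes must match")
    if meta_a["shape"][axis] % meta_a["chunks"][axis]:
        raise ValueError("first array must end on a chunk boundary")

    # Number of chunks the first array occupies along the concat axis.
    offset = meta_a["shape"][axis] // meta_a["chunks"][axis]

    # Start from the first array's entries for this variable.
    out = {k: v for k, v in refs_a.items() if k.startswith(f"{var}/")}

    # Shift the second array's chunk indices along `axis`.
    for key, ref in refs_b.items():
        prefix, _, name = key.rpartition("/")
        if prefix != var or name.startswith("."):
            continue  # skip other variables and metadata keys
        idx = [int(i) for i in name.split(".")]
        idx[axis] += offset
        out[f"{var}/" + ".".join(map(str, idx))] = ref

    # Update the shape recorded in the metadata.
    new_shape = list(meta_a["shape"])
    new_shape[axis] += meta_b["shape"][axis]
    meta_a["shape"] = new_shape
    out[f"{var}/.zarray"] = json.dumps(meta_a)
    return out


# Two made-up files, each holding a (2, 4) array stored as two (1, 4) chunks.
zarray = json.dumps({"shape": [2, 4], "chunks": [1, 4], "dtype": "<f4"})
refs_a = {"temp/.zarray": zarray, "temp/0.0": ["a.nc", 100, 16], "temp/1.0": ["a.nc", 116, 16]}
refs_b = {"temp/.zarray": zarray, "temp/0.0": ["b.nc", 100, 16], "temp/1.0": ["b.nc", 116, 16]}
combined = concat_refs(refs_a, refs_b, "temp")
```

The combined dict still points at byte ranges in the original files, which is why no dask arrays or real values would be needed for this step.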

I had a go at doing this in this notebook, and in doing so discovered a few potential issues with xarray's internals.

For this to work xarray has to:

- Wrap a kerchunk.Array object which barely defines any array API methods, including basically not supporting indexing at all,
- Store all the information present in a kerchunked Zarr store but without ever loading any data,
- Not create any indexes by default during dataset construction or during xr.concat,
- Not try to do anything else that can't be defined for a kerchunk.Array.
- Possibly we need the Lazy Indexing classes to support concatenation https://github.com/pydata/xarray/issues/4628
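As an illustration of how little such an object offers, here is a minimal sketch of the kind of array xarray would be asked to wrap. `RefArray` is a hypothetical stand-in, not the real `kerchunk.Array`, and the reference dict layout (a `.zarray` JSON string plus `[path, offset, length]` chunk entries) is an assumption based on kerchunk-style references.

```python
import json


class RefArray:
    """Hypothetical stand-in for kerchunk.Array: exposes metadata, never values."""

    def __init__(self, refs, var):
        self._refs = refs
        self._var = var
        self._meta = json.loads(refs[f"{var}/.zarray"])

    @property
    def shape(self):
        return tuple(self._meta["shape"])

    @property
    def dtype(self):
        # Kept as the zarr dtype string; no numpy dtype is constructed here.
        return self._meta["dtype"]

    @property
    def ndim(self):
        return len(self.shape)

    def __getitem__(self, key):
        # Indexing would require reading bytes, which this object never does.
        raise NotImplementedError("RefArray holds references only")


refs = {
    "temp/.zarray": json.dumps({"shape": [2, 4], "chunks": [1, 4], "dtype": "<f4"}),
    "temp/0.0": ["a.nc", 100, 16],
    "temp/1.0": ["a.nc", 116, 16],
}
arr = RefArray(refs, "temp")
print(arr.shape, arr.dtype, arr.ndim)
```

Anything in xarray that indexes the wrapped array, for example to build an index from a coordinate, would hit the `NotImplementedError`, which is the class of problem the list above describes.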

It's an interesting exercise in using xarray as an abstraction, with no access to real numerical values at all.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8699/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
    13221727 issue
