issues: 2116695961
| field | value |
|---|---|
| id | 2116695961 |
| node_id | I_kwDOAMm_X85-KjeZ |
| number | 8699 |
| title | Wrapping a `kerchunk.Array` object directly with xarray |
| user | 35968931 |
| state | open |
| locked | 0 |
| assignee | |
| milestone | |
| comments | 3 |
| created_at | 2024-02-03T22:15:07Z |
| updated_at | 2024-02-04T21:15:14Z |
| closed_at | |
| author_association | MEMBER |
| repo | 13221727 |
| type | issue |

body:

What is your issue?

In https://github.com/fsspec/kerchunk/issues/377 the idea came up of using the xarray API to concatenate arrays which represent parts of a zarr store - i.e. using xarray to kerchunk a large set of netCDF files instead of using […]. The idea is to make something like this work for kerchunking sets of netCDF files into zarr stores:

```python
ds = xr.open_mfdataset(
    '/my/files*.nc',
    engine='kerchunk',  # kerchunk registers an xarray IO backend that returns zarr.Array objects
    combine='nested',   # 'by_coords' would require actually reading coordinate data
    parallel=True,      # would use dask.delayed to generate reference dicts for each file in parallel
)

ds  # now wraps a bunch of zarr.Array / kerchunk.Array objects, no need for dask arrays

ds.kerchunk.to_zarr(store='out.zarr')  # kerchunk defines an xarray accessor that extracts the zarr
                                       # arrays and serializes them (which could also be done in
                                       # parallel if writing to parquet)
```

I had a go at doing this in this notebook, and in doing so discovered a few potential issues with xarray's internals. For this to work xarray has to:

- Wrap a […]

It's an interesting exercise in using xarray as an abstraction, with no access to real numerical values at all.

reactions:

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8699/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
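The core idea in the issue body — concatenating arrays purely at the reference/metadata level, with no access to real numerical values — can be sketched with a toy stand-in for a `kerchunk.Array`. Everything here is hypothetical illustration (`LazyReferenceArray`, `concat_refs`, and the chunk-key scheme are invented for this sketch, not kerchunk's or xarray's real API):

```python
import numpy as np


class LazyReferenceArray:
    """Hypothetical stand-in for a kerchunk.Array: it knows its shape,
    dtype, and chunk references, but holds no real numerical values."""

    def __init__(self, shape, dtype, refs):
        self.shape = tuple(shape)
        self.dtype = np.dtype(dtype)
        # refs maps zarr-style chunk keys to (file, offset, length) tuples,
        # e.g. {"0.0": ("file0.nc", 0, 200)}
        self.refs = refs

    @property
    def ndim(self):
        return len(self.shape)

    @property
    def size(self):
        return int(np.prod(self.shape))

    def __array__(self, dtype=None):
        # Deliberately unsupported: the whole point is that concatenation
        # happens at the metadata level, never by loading real values.
        raise NotImplementedError("no access to real numerical values")


def concat_refs(arrays, axis=0):
    """Concatenate lazy arrays by merging their chunk references, without
    reading any data — a toy version of what a 'kerchunk' xarray backend
    would need to support open_mfdataset-style combining."""
    first = arrays[0]
    if any(a.dtype != first.dtype for a in arrays):
        raise ValueError("dtypes must match")
    new_shape = list(first.shape)
    new_shape[axis] = sum(a.shape[axis] for a in arrays)
    merged = {}
    for i, a in enumerate(arrays):
        for key, ref in a.refs.items():
            # Re-key chunk indices along the concatenation axis. Toy scheme:
            # assumes each input has exactly one chunk along that axis.
            idx = key.split(".")
            idx[axis] = str(int(idx[axis]) + i)
            merged[".".join(idx)] = ref
    return LazyReferenceArray(new_shape, first.dtype, merged)
```

For example, concatenating two 10×5 lazy arrays along axis 0 yields a 20×5 `LazyReferenceArray` whose `refs` point chunk `"0.0"` at the first file and chunk `"1.0"` at the second, without either file ever being opened.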