issues: 1083621690

id: 1083621690
node_id: I_kwDOAMm_X85AlsE6
number: 6084
title: Initialise zarr metadata without computing dask graph
user: 42455466
state: open
locked: 0
comments: 6
created_at: 2021-12-17T21:17:42Z
updated_at: 2024-04-03T19:08:26Z
author_association: NONE

Is your feature request related to a problem? Please describe.

When writing large Zarr stores, the xarray docs recommend first creating an initial Zarr store without writing all of its array data. The recommended approach is to create a dummy dask-backed Dataset and then call to_zarr with compute=False, which writes only metadata to Zarr. This works great.

It seems that in one common use case for this approach (including the example in the above docs), the entire dataset to be written to Zarr is already represented in a Dataset (let's call this ds). Thus, rather than creating a dummy Dataset with exactly the same metadata as ds, it is more convenient to initialise the Zarr store with ds.to_zarr(..., compute=False). See for example:

  • https://discourse.pangeo.io/t/many-netcdf-to-single-zarr-store-using-concurrent-futures/2029
  • https://discourse.pangeo.io/t/map-blocks-and-to-zarr-region/2019
  • https://discourse.pangeo.io/t/netcdf-to-zarr-best-practices/1119/12
  • https://discourse.pangeo.io/t/best-practice-for-memory-management-to-iteratively-write-a-large-dataset-with-xarray/1989

However, calling to_zarr with compute=False still builds the dask graph for writing the Zarr store. That graph is never used in this use case, yet building it can take a really long time when the graph is large.

Describe the solution you'd like

Is there scope to add an option to to_zarr to initialise the store without computing the dask graph? Or perhaps an initialise_zarr method would be cleaner?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6084/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue
