issues: 1965161886

id: 1965161886 · node_id: I_kwDOAMm_X851If2e · number: 8382
title: Zarr Chunks: Too many chunks created if there is one small initial chunk.
user: 105014161 · state: closed · locked: 0 · comments: 12 · author_association: NONE
created_at: 2023-10-27T09:49:07Z · updated_at: 2023-11-07T17:40:19Z · closed_at: 2023-11-06T11:53:24Z

What is your issue?

If the first Zarr chunk is small (only a few items), every chunk created by later appends is just as tiny, and this causes major problems when reading the dataset back. Consider the following code (MCVE):

```python
import numpy as np
import xarray as xr

# Create and write a dataset with ONE tiny chunk per variable
ds = xr.Dataset()
ds.coords["x"] = ("x", np.zeros((1,), dtype=np.uint64))
ds["data"] = ("x", np.zeros((1,), dtype=np.bool_))
ds.to_zarr("/tmp/temp.zarr", mode="w")  # mode="w" overwrites output from a previous run

# Append to that dataset a larger amount of data
ds2 = xr.Dataset()
ds2.coords["x"] = ("x", np.arange(1, 1000, dtype=np.uint64))
ds2["data"] = ("x", np.zeros(999, dtype=np.bool_))
ds2.to_zarr("/tmp/temp.zarr", append_dim="x")

# These chunks should be MUCH larger, but they're one item each for me.
ds_read = xr.open_zarr("/tmp/temp.zarr")
for var in ds_read.variables:
    print(f"{var=}, {ds_read[var].encoding['chunks']=}")
```
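The scale of the problem follows from simple arithmetic. A Zarr array's chunk shape is fixed when the array is created and appends reuse it, so the one-item chunks from the first write also apply to the 999 appended items. A minimal sketch of the count, where the helper `count_chunks` is hypothetical and not an xarray or zarr-python function:

```python
def count_chunks(length: int, chunk_len: int) -> int:
    """Hypothetical helper: chunks needed along one axis (ceiling division)."""
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    return -(-length // chunk_len)


# Lengths from the MCVE: 1 item written first, then 999 appended.
total_len = 1 + 999

# Chunk length inherited from the first write (one item per chunk).
print(count_chunks(total_len, 1))     # 1000 chunks per variable
# If the array had instead been created with a chunk length of 1000.
print(count_chunks(total_len, 1000))  # 1 chunk per variable
```

The same count applies separately to `x` and `data`, since each variable is its own Zarr array.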

Is there a way to change this default behaviour, ideally within xarray (or zarr-python, if it is responsible)? Perhaps, when the stored array consists of only one chunk, the heuristic should grow that chunk on append instead of creating new chunks of the same tiny size?
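The heuristic suggested above can be sketched as follows. This is a hypothetical illustration, not how xarray or zarr-python behave: `plan_append_chunking` and its `target_chunk_len` parameter are invented for this example. It relies on one property of Zarr: the chunk shape is part of the array's metadata, so changing it means rewriting the chunks already stored, which is only cheap when there is a single small one.

```python
def plan_append_chunking(
    existing_len: int, existing_chunk_len: int, target_chunk_len: int
) -> tuple[int, bool]:
    """Hypothetical heuristic (not in xarray or zarr-python).

    Returns (chunk_len_to_use, must_rewrite_existing_chunk).
    """
    single_chunk = existing_len <= existing_chunk_len
    if single_chunk and target_chunk_len > existing_chunk_len:
        # Only one small chunk is stored: rewriting it with a larger chunk
        # length is cheap compared with appending many tiny chunks.
        return target_chunk_len, True
    # Several chunks already exist, or they are already large enough:
    # keep the stored chunk length so nothing on disk is rewritten.
    return existing_chunk_len, False


# The MCVE case: 1 item stored in a 1-item chunk; target_chunk_len is an assumed setting.
chunk_len, rewrite = plan_append_chunking(
    existing_len=1, existing_chunk_len=1, target_chunk_len=1000
)
print(chunk_len, rewrite)  # 1000 True
```

The key design choice is the `single_chunk` test: once more than one chunk exists, changing the chunk length would mean rewriting all of them, so the sketch leaves such arrays alone.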

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8382/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: 13221727 · type: issue
