issues: 1965161886
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1965161886 | I_kwDOAMm_X851If2e | 8382 | Zarr Chunks: Too many chunks created if there is one small initial chunk. | 105014161 | closed | 0 |  |  | 12 | 2023-10-27T09:49:07Z | 2023-11-07T17:40:19Z | 2023-11-06T11:53:24Z | NONE |  |  |  | (see below) | { "url": "https://api.github.com/repos/pydata/xarray/issues/8382/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |  | completed | 13221727 | issue |

body:

What is your issue?

If the first Zarr chunk is small (a few items), every subsequent chunk created will be tiny, and this will cause massive issues reading back the dataset. Consider the following code (MCVE):

```python
import numpy as np
import xarray as xr

# Create and write a dataset with ONE tiny chunk per variable
ds = xr.Dataset()
ds.coords["x"] = "x", np.zeros((1,), dtype=np.uint64)
ds["data"] = "x", np.zeros((1,), dtype=np.bool_)
ds.to_zarr("/tmp/temp.zarr")

# Append to that dataset a larger amount of data
ds2 = xr.Dataset()
ds2.coords["x"] = "x", np.arange(1, 1000, dtype=np.uint64)
ds2["data"] = "x", np.zeros(999, dtype=np.bool_)
ds2.to_zarr("/tmp/temp.zarr", append_dim="x")

# These chunks should be MUCH larger, but they're one item each for me.
ds_read = xr.open_zarr("/tmp/temp.zarr")
for var in ds_read.variables:
    print(f"{var=}, {ds_read[var].encoding['chunks']=}")
```

Is there a way to change this behaviour by default, ideally within xarray (or zarr-python if it is responsible)? Perhaps, if only one chunk is present, the heuristic should consider appending to it instead of creating new chunks?
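A minimal workaround sketch, assuming the eventual array size is roughly known in advance: passing an explicit `chunks` value through the `encoding` argument of the initial `to_zarr` call creates the Zarr arrays with a larger on-disk chunk size, so later appends fill those chunks instead of inheriting the one-element layout. The `(1000,)` chunk size here is an arbitrary illustrative choice, not a recommendation from the issue itself.

```python
import numpy as np
import xarray as xr

# Initial write: request a larger on-disk chunk size via encoding,
# even though each variable currently holds a single element.
# The (1000,) chunk size is an arbitrary illustrative choice.
ds = xr.Dataset()
ds.coords["x"] = "x", np.zeros((1,), dtype=np.uint64)
ds["data"] = "x", np.zeros((1,), dtype=np.bool_)
ds.to_zarr(
    "/tmp/temp.zarr",
    mode="w",
    encoding={"x": {"chunks": (1000,)}, "data": {"chunks": (1000,)}},
)

# The append reuses the chunk grid already defined in the store,
# so the new data lands in 1000-element chunks rather than 1-element ones.
ds2 = xr.Dataset()
ds2.coords["x"] = "x", np.arange(1, 1000, dtype=np.uint64)
ds2["data"] = "x", np.zeros(999, dtype=np.bool_)
ds2.to_zarr("/tmp/temp.zarr", append_dim="x")

# Inspect the chunk layout seen on read-back.
ds_read = xr.open_zarr("/tmp/temp.zarr")
for var in ds_read.variables:
    print(f"{var=}, {ds_read[var].encoding['chunks']=}")
```

With this layout, the read-back `encoding['chunks']` should report `(1000,)` for both variables rather than `(1,)`.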