Comment on pydata/xarray#2300 (2020-08-13): https://github.com/pydata/xarray/issues/2300#issuecomment-673565228

I arrived here via a different use case, which I ultimately solved, but I think there's value in documenting it here. My use case is the following workflow:

1. Take raw data, build a dataset, and append it to a zarr store Z.
2. Analyze the data on Z, then maybe go to 1.

Step 2's performance is much better when the data on Z is chunked properly along the append dimension 'frame' (chunks of size 50), but step 1 only adds one element along it, so I end up with Z having chunks (1, 1, 1, 1, 1, ...) on 'frame'. On xarray 0.16.0 this is solvable via the `encoding` parameter, provided we take care to pass it only when the store is created. Before that version, I was using something like the monkey patch posted by @chrisbarber.
Code:

```python
import contextlib
import shutil
import tempfile

import numpy as np
import xarray as xr

zarr_path = tempfile.mkdtemp()


def append_test(ds, chunks):
    shutil.rmtree(zarr_path)
    for i in range(21):
        d = ds.isel(frame=slice(i, i + 1))
        d = d.chunk(chunks)
        d.to_zarr(zarr_path, consolidated=True,
                  **(dict(mode='a', append_dim='frame') if i > 0 else {}))
    dsa = xr.open_zarr(str(zarr_path), consolidated=True)
    print(dsa.chunks, dsa.dims)


# sometime before 0.16.0: monkey-patch the internal chunk determination
@contextlib.contextmanager
def change_determine_zarr_chunks(chunks):
    orig_determine_zarr_chunks = xr.backends.zarr._determine_zarr_chunks
    try:
        def new_determine_zarr_chunks(enc_chunks, var_chunks, ndim, name):
            da = ds[name]
            zchunks = tuple(
                chunks[dim] if (dim in chunks and chunks[dim] is not None) else da.shape[i]
                for i, dim in enumerate(da.dims)
            )
            return zchunks

        xr.backends.zarr._determine_zarr_chunks = new_determine_zarr_chunks
        yield
    finally:
        xr.backends.zarr._determine_zarr_chunks = orig_determine_zarr_chunks


chunks = {'frame': 10, 'other': 50}
ds = xr.Dataset({'data': xr.DataArray(data=np.random.rand(100, 100),
                                      dims=('frame', 'other'))})
append_test(ds, chunks)
with change_determine_zarr_chunks(chunks):
    append_test(ds, chunks)


# with 0.16.0: pass encoding on store creation only
def append_test_encoding(ds, chunks):
    shutil.rmtree(zarr_path)
    encoding = {}
    for k, v in ds.variables.items():
        encoding[k] = {'chunks': tuple(chunks[dk] if dk in chunks else v.shape[i]
                                       for i, dk in enumerate(v.dims))}
    for i in range(21):
        d = ds.isel(frame=slice(i, i + 1))
        d = d.chunk(chunks)
        d.to_zarr(zarr_path, consolidated=True,
                  **(dict(mode='a', append_dim='frame') if i > 0
                     else dict(encoding=encoding)))
    dsa = xr.open_zarr(str(zarr_path), consolidated=True)
    print(dsa.chunks, dsa.dims)


append_test_encoding(ds, chunks)
```

```
Frozen(SortedKeysDict({'frame': (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), 'other': (50, 50)})) Frozen(SortedKeysDict({'frame': 21, 'other': 100}))
Frozen(SortedKeysDict({'frame': (10, 10, 1), 'other': (50, 50)})) Frozen(SortedKeysDict({'frame': 21, 'other': 100}))
Frozen(SortedKeysDict({'frame': (10, 10, 1), 'other': (50, 50)})) Frozen(SortedKeysDict({'frame': 21, 'other': 100}))
```
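The `encoding` construction above is just a per-variable mapping from dimension names to desired on-disk chunk sizes, falling back to the full dimension length (one chunk) for dims not listed. A standalone sketch of that logic, with no xarray dependency (the helper name and its `dims_and_shapes` stand-in for `ds.variables` are hypothetical, for illustration only):

```python
def zarr_chunks_encoding(dims_and_shapes, chunks):
    """Build a to_zarr-style encoding dict.

    For each variable, pick the requested chunk size along the dims
    listed in `chunks`, and the full dimension length otherwise
    (i.e. a single chunk along that dim).

    dims_and_shapes: {var_name: (dims_tuple, shape_tuple)} -- a
    stand-in for iterating over ds.variables in the real code.
    """
    encoding = {}
    for name, (dims, shape) in dims_and_shapes.items():
        encoding[name] = {
            'chunks': tuple(
                chunks[d] if d in chunks else shape[i]
                for i, d in enumerate(dims)
            )
        }
    return encoding


# For the 100x100 'data' variable over ('frame', 'other') used above:
enc = zarr_chunks_encoding(
    {'data': (('frame', 'other'), (100, 100))},
    {'frame': 10, 'other': 50},
)
print(enc)  # {'data': {'chunks': (10, 50)}}
```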