
issue_comments: 673565228


html_url: https://github.com/pydata/xarray/issues/2300#issuecomment-673565228
issue_url: https://api.github.com/repos/pydata/xarray/issues/2300
id: 673565228
node_id: MDEyOklzc3VlQ29tbWVudDY3MzU2NTIyOA==
user: 4441338
created_at: 2020-08-13T16:04:04Z
updated_at: 2020-08-13T16:04:04Z
author_association: NONE
body:

I arrived here because of a different use case / problem, which I ultimately solved, but I think there is value in documenting it here.

My use case is the following workflow:

1. Take raw data, build a dataset, and append it to a zarr store Z.
2. Analyze the data on Z, then possibly go back to step 1.

Step 2 performs much better when the data on Z is chunked properly along the append dimension 'frame' (chunks of size 50). However, step 1 only adds one element along that dimension, so Z ends up with chunks (1, 1, 1, 1, 1, ...) on 'frame'.

On xarray 0.16.0, this seems solvable via the `encoding` parameter, as long as we take care to use it only when the store is created. Before that version, I was using something like the monkey patch posted by @chrisbarber.

Code:

```python
import shutil
import tempfile

import numpy as np
import xarray as xr

zarr_path = tempfile.mkdtemp()

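def append_test(ds, chunks):
    # Remove any previous store so the first write creates a fresh one.
    shutil.rmtree(zarr_path)

    for i in range(21):
        # Write one frame at a time: the first write creates the store,
        # every later write appends along 'frame'.
        d = ds.isel(frame=slice(i, i + 1))
        d = d.chunk(chunks)
        d.to_zarr(zarr_path, consolidated=True, **(dict(mode='a', append_dim='frame') if i > 0 else {}))
    dsa = xr.open_zarr(str(zarr_path), consolidated=True)
    print(dsa.chunks, dsa.dims)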

# sometime before 0.16.0

import contextlib

@contextlib.contextmanager
def change_determine_zarr_chunks(chunks):
    # Temporarily replace xarray's private chunk-selection function and
    # restore the original on exit, even if the body raises.
    orig_determine_zarr_chunks = xr.backends.zarr._determine_zarr_chunks
    try:
        def new_determine_zarr_chunks(enc_chunks, var_chunks, ndim, name):
            # Looks up the variable in the module-level `ds` defined below.
            da = ds[name]
            zchunks = tuple(
                chunks[dim] if (dim in chunks and chunks[dim] is not None) else da.shape[i]
                for i, dim in enumerate(da.dims)
            )
            return zchunks
        xr.backends.zarr._determine_zarr_chunks = new_determine_zarr_chunks
        yield
    finally:
        xr.backends.zarr._determine_zarr_chunks = orig_determine_zarr_chunks

chunks = {'frame': 10, 'other': 50}
ds = xr.Dataset({'data': xr.DataArray(data=np.random.rand(100, 100), dims=('frame', 'other'))})

append_test(ds, chunks)
with change_determine_zarr_chunks(chunks):
    append_test(ds, chunks)

# with 0.16.0

def append_test_encoding(ds, chunks):
    shutil.rmtree(zarr_path)

    # Target on-disk chunks per variable; dimensions missing from `chunks`
    # use the variable's full length.
    encoding = {}
    for k, v in ds.variables.items():
        encoding[k] = {'chunks': tuple(chunks[dk] if dk in chunks else v.shape[i] for i, dk in enumerate(v.dims))}

    for i in range(21):
        d = ds.isel(frame=slice(i, i + 1))
        d = d.chunk(chunks)
        # Pass `encoding` only on the first write, which creates the store.
        d.to_zarr(zarr_path, consolidated=True, **(dict(mode='a', append_dim='frame') if i > 0 else dict(encoding=encoding)))
    dsa = xr.open_zarr(str(zarr_path), consolidated=True)
    print(dsa.chunks, dsa.dims)

append_test_encoding(ds, chunks)
```

Frozen(SortedKeysDict({'frame': (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), 'other': (50, 50)})) Frozen(SortedKeysDict({'frame': 21, 'other': 100}))
Frozen(SortedKeysDict({'frame': (10, 10, 1), 'other': (50, 50)})) Frozen(SortedKeysDict({'frame': 21, 'other': 100}))
Frozen(SortedKeysDict({'frame': (10, 10, 1), 'other': (50, 50)})) Frozen(SortedKeysDict({'frame': 21, 'other': 100}))
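
For reuse outside the test function, the encoding construction can be pulled into a small helper, together with a check of the chunk layout one would expect on disk. This is only a sketch: `chunk_encoding`, `expected_layout`, and `fake_ds` are hypothetical names that do not exist in xarray, and the stand-in object is there just so the snippet runs without xarray installed. The helper assumes its argument exposes `.variables` with `.dims` and `.shape` per variable, as an xarray Dataset does.

```python
from types import SimpleNamespace


def chunk_encoding(ds, chunks):
    """Build a per-variable {'chunks': ...} encoding from a dim -> size mapping.

    `ds` only needs a `.variables` mapping whose values expose `.dims` and
    `.shape`. Dimensions missing from `chunks` (or mapped to None) fall back
    to the variable's full length along that dimension.
    """
    encoding = {}
    for name, var in ds.variables.items():
        encoding[name] = {
            "chunks": tuple(
                chunks[dim] if chunks.get(dim) is not None else length
                for dim, length in zip(var.dims, var.shape)
            )
        }
    return encoding


def expected_layout(total, chunk_size):
    """Chunk sizes along one dimension once `total` elements are stored."""
    full, rest = divmod(total, chunk_size)
    return (chunk_size,) * full + ((rest,) if rest else ())


# Hypothetical stand-in for the 100 x 100 dataset used in the comment's code.
fake_ds = SimpleNamespace(
    variables={"data": SimpleNamespace(dims=("frame", "other"), shape=(100, 100))}
)

encoding = chunk_encoding(fake_ds, {"frame": 10, "other": 50})
print(encoding)                  # {'data': {'chunks': (10, 50)}}
print(expected_layout(21, 10))   # (10, 10, 1)
print(expected_layout(100, 50))  # (50, 50)
```

With a real dataset, the result would be passed as `encoding=chunk_encoding(ds, chunks)` on the first `to_zarr` call only, and the appending calls would use `mode='a', append_dim='frame'` without it, as in `append_test_encoding`. The two layouts printed here match the 'frame' and 'other' chunks reported for the patched and encoding-based runs.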

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: 342531772