issues: 331415995

id: 331415995
node_id: MDU6SXNzdWUzMzE0MTU5OTU=
number: 2225
title: Zarr Backend: check for non-uniform chunks is too strict
user: 2443309
state: closed
locked: 0
comments: 3
created_at: 2018-06-12T02:36:05Z
updated_at: 2018-06-13T05:51:36Z
closed_at: 2018-06-13T05:51:36Z
author_association: MEMBER

I think the following block of code is more strict than either dask or zarr requires:

https://github.com/pydata/xarray/blob/6c3abedf906482111b06207b9016ea8493c42713/xarray/backends/zarr.py#L80-L89

It should be possible to have uneven chunks in the last position of multiple dimensions in a zarr dataset.

Code Sample, a copy-pastable example if possible

```python
In [1]: import xarray as xr

In [2]: import dask.array as dsa

In [3]: da = xr.DataArray(dsa.random.random((8, 7, 11), chunks=(3, 3, 3)), dims=('x', 'y', 't'))

In [4]: da
Out[4]:
<xarray.DataArray 'da.random.random_sample-1aed3ea2f9dd784ec947cb119459fa56' (x: 8, y: 7, t: 11)>
dask.array<shape=(8, 7, 11), dtype=float64, chunksize=(3, 3, 3)>
Dimensions without coordinates: x, y, t

In [5]: da.data.chunks
Out[5]: ((3, 3, 2), (3, 3, 1), (3, 3, 3, 2))

In [6]: da.to_dataset('varname').to_zarr('/Users/jhamman/workdir/test_chunks.zarr')
/Users/jhamman/anaconda/bin/ipython:1: FutureWarning: the order of the arguments on DataArray.to_dataset has changed; you now need to supply name as a keyword argument
  #!/Users/jhamman/anaconda/bin/python

ValueError                                Traceback (most recent call last)
<ipython-input-7-32fa9a7d0276> in <module>()
----> 1 da.to_dataset('varname').to_zarr('/Users/jhamman/workdir/test_chunks.zarr')

~/anaconda/lib/python3.6/site-packages/xarray/core/dataset.py in to_zarr(self, store, mode, synchronizer, group, encoding, compute)
   1185         from ..backends.api import to_zarr
   1186         return to_zarr(self, store=store, mode=mode, synchronizer=synchronizer,
-> 1187                        group=group, encoding=encoding, compute=compute)
   1188
   1189     def __unicode__(self):

~/anaconda/lib/python3.6/site-packages/xarray/backends/api.py in to_zarr(dataset, store, mode, synchronizer, group, encoding, compute)
    856     # I think zarr stores should always be sync'd immediately
    857     # TODO: figure out how to properly handle unlimited_dims
--> 858     dataset.dump_to_store(store, sync=True, encoding=encoding, compute=compute)
    859
    860     if not compute:

~/anaconda/lib/python3.6/site-packages/xarray/core/dataset.py in dump_to_store(self, store, encoder, sync, encoding, unlimited_dims, compute)
   1073
   1074         store.store(variables, attrs, check_encoding,
-> 1075                     unlimited_dims=unlimited_dims)
   1076         if sync:
   1077             store.sync(compute=compute)

~/anaconda/lib/python3.6/site-packages/xarray/backends/zarr.py in store(self, variables, attributes, *args, **kwargs)
    341     def store(self, variables, attributes, *args, **kwargs):
    342         AbstractWritableDataStore.store(self, variables, attributes,
--> 343                                         *args, **kwargs)
    344
    345     def sync(self, compute=True):

~/anaconda/lib/python3.6/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, unlimited_dims)
    366         self.set_dimensions(variables, unlimited_dims=unlimited_dims)
    367         self.set_variables(variables, check_encoding_set,
--> 368                            unlimited_dims=unlimited_dims)
    369
    370     def set_attributes(self, attributes):

~/anaconda/lib/python3.6/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, unlimited_dims)
    403             check = vn in check_encoding_set
    404             target, source = self.prepare_variable(
--> 405                 name, v, check, unlimited_dims=unlimited_dims)
    406
    407             self.writer.add(source, target)

~/anaconda/lib/python3.6/site-packages/xarray/backends/zarr.py in prepare_variable(self, name, variable, check_encoding, unlimited_dims)
    325
    326         encoding = _extract_zarr_variable_encoding(
--> 327             variable, raise_on_invalid=check_encoding)
    328
    329         encoded_attrs = OrderedDict()

~/anaconda/lib/python3.6/site-packages/xarray/backends/zarr.py in _extract_zarr_variable_encoding(variable, raise_on_invalid)
    181
    182     chunks = _determine_zarr_chunks(encoding.get('chunks'), variable.chunks,
--> 183                                     variable.ndim)
    184     encoding['chunks'] = chunks
    185     return encoding

~/anaconda/lib/python3.6/site-packages/xarray/backends/zarr.py in _determine_zarr_chunks(enc_chunks, var_chunks, ndim)
     87                 "Zarr requires uniform chunk sizes excpet for final chunk."
     88                 " Variable %r has incompatible chunks. Consider "
---> 89                 "rechunking using chunk()." % (var_chunks,))
     90     # last chunk is allowed to be smaller
     91     last_var_chunk = all_var_chunks[-1]

ValueError: Zarr requires uniform chunk sizes excpet for final chunk. Variable ((3, 3, 2), (3, 3, 1), (3, 3, 3, 2)) has incompatible chunks. Consider rechunking using chunk().
```

Problem description

Dask produces a smaller final chunk along every dimension whose length is not an exact multiple of the chunk size, and zarr handles such arrays without trouble (the trailing chunk in each dimension is simply partially filled). Because the check in `_determine_zarr_chunks` rejects any variable where more than one dimension ends in a smaller chunk, perfectly writable datasets like the one above fail in `to_zarr`. Relaxing the check to allow one smaller trailing chunk per dimension would let these datasets be written without rechunking.
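For concreteness, here is a minimal sketch of the kind of per-dimension check I have in mind; the helper name and return value are hypothetical, not xarray's actual implementation:

```python
def _check_dask_chunks_for_zarr(var_chunks):
    """Hypothetical relaxed check: within each dimension, all chunks must be
    equal, except that the final chunk may be smaller."""
    for dim_chunks in var_chunks:
        first = dim_chunks[0]
        if any(c != first for c in dim_chunks[:-1]):
            raise ValueError("non-uniform chunks %r" % (var_chunks,))
        if dim_chunks[-1] > first:
            raise ValueError("final chunk may not be larger than %d" % first)
    # zarr chunk shape: the "full" chunk size along each dimension
    return tuple(dim_chunks[0] for dim_chunks in var_chunks)


# the chunk pattern from the example above now passes
print(_check_dask_chunks_for_zarr(((3, 3, 2), (3, 3, 1), (3, 3, 3, 2))))  # (3, 3, 3)
```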

Expected Output

IIUC, Zarr allows multiple dims to have uneven chunks, so long as they are all in the last position:

```python
In [9]: import zarr

In [10]: z = zarr.zeros((8, 7, 11), chunks=(3, 3, 3), dtype='i4')

In [11]: z.chunks
Out[11]: (3, 3, 3)
```
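In other words, zarr only records the uniform chunk shape (3, 3, 3) and fills the edge chunks partially itself, so writing the ragged-edge dask array from the example above straight into such a zarr array appears to work when xarray is bypassed. A small sketch (in-memory zarr array, illustrative only, not the xarray code path):

```python
import dask.array as dsa
import zarr

# dask array whose final chunk is smaller along every dimension
arr = dsa.random.random((8, 7, 11), chunks=(3, 3, 3))

# in-memory zarr array with the same nominal chunk shape; the edge
# chunk along each dimension is simply partially filled
z = zarr.zeros((8, 7, 11), chunks=(3, 3, 3), dtype='f8')

# write each dask block into the corresponding region of the zarr array
dsa.store(arr, z)

print(arr.chunks)  # ((3, 3, 2), (3, 3, 1), (3, 3, 3, 2))
print(z.chunks)    # (3, 3, 3)
```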

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.7
pandas: 0.22.0
numpy: 1.14.3
scipy: 1.1.0
netCDF4: 1.3.1
h5netcdf: 0.5.1
h5py: 2.7.1
Nio: None
zarr: 2.2.0
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.2
distributed: 1.21.6
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 39.0.1
pip: 9.0.3
conda: 4.5.4
pytest: 3.5.1
IPython: 6.3.1
sphinx: 1.7.4
reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2225/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
