home / github / issues

Menu
  • Search all tables
  • GraphQL API

issues: 1128282637

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1128282637 I_kwDOAMm_X85DQDoN 6255 Writing large (aligned) dask-chunks to small zarr chunks fails. 6574622 closed 0     0 2022-02-09T09:35:24Z 2022-02-09T15:12:31Z 2022-02-09T15:12:31Z CONTRIBUTOR      

What happened?

I'm trying to write a dataset which is (dask-) chunked in large chunks into zarr which should be chunked in smaller chunks. The dask chunks are intentionally chosen to be integer multiples of the zarr chunks, such that there will never be two dask chunks which may be written into a single zarr chunk.

When trying to write such a dataset using to_zarr, the following exception appears:

NotImplementedError: Final chunk of Zarr array must be the same size or smaller than the first. Specified Zarr chunk encoding['chunks']=(1,), for variable named 'a' but (2, 2) in the variable's Dask chunks ((2, 2),) are incompatible with this encoding. Consider either rechunking using `chunk()`, deleting or modifying `encoding['chunks']`, or specify `safe_chunks=False`.

What did you expect to happen?

I'd expect the write to "just work".

Minimal Complete Verifiable Example

python import xarray as xr ds = xr.Dataset({"a": ("x", [1, 2, 3, 4])}).chunk({"x": 2}) m = {} ds.to_zarr(m, encoding={"a": {"chunks": (1,)}})

Relevant log output

No response

Anything else we need to know?

I believe that the expected behaviour is according to this design choice: # DESIGN CHOICE: do not allow multiple dask chunks on a single zarr chunk # this avoids the need to get involved in zarr synchronization / locking # From zarr docs: # "If each worker in a parallel computation is writing to a separate # region of the array, and if region boundaries are perfectly aligned # with chunk boundaries, then no synchronization is required."

But I believe that this if-statement is not needed and should be removed. The if-statement compares the size of the last dask-chunk within each dimenstion to the zarr-chunk size. There are three possible cases, which (as far as I understand) should all be just fine: * the dask-chunk is smaller than the zarr chunk: one dask chunk will write into one (smaller, last) zarr chunk * the dask-chunk is equal than the zarr chunk: one dask chunk will write into one zarr chunk * ther dask-chunk is larger than the zarr chunk: one dask chunk will write into multiple zarr chunks. None of these zarr chunks will be touched by any other dask-chunk as all previous dask chunks are aligned to zarr-chunk boundaries.

Note: If that if-statement goes away, this one may go away as well (was introduced in #4312).

Environment

INSTALLED VERSIONS ``` commit: None python: 3.9.10 (main, Jan 15 2022, 11:48:00) [Clang 13.0.0 (clang-1300.0.29.3)] python-bits: 64 OS: Darwin OS-release: 20.5.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: de_DE.UTF-8 LOCALE: ('de_DE', 'UTF-8') libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 0.20.1 pandas: 1.2.0 numpy: 1.21.2 scipy: 1.6.2 netCDF4: 1.5.8 pydap: installed h5netcdf: 0.11.0 h5py: 3.2.1 Nio: None zarr: 2.10.2 cftime: 1.3.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2021.11.1 distributed: 2021.11.1 matplotlib: 3.4.1 cartopy: 0.20.1 seaborn: 0.11.1 numbagg: None fsspec: 2021.11.1 cupy: None pint: 0.17 sparse: 0.13.0 setuptools: 60.5.0 pip: 21.3.1 conda: None pytest: 6.2.2 IPython: 8.0.0.dev sphinx: 3.5.0 ```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6255/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed 13221727 issue

Links from other tables

  • 2 rows from issues_id in issues_labels
  • 0 rows from issue in issue_comments
Powered by Datasette · Queries took 0.787ms · About: xarray-datasette