
issues: 836391524

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
836391524 MDU6SXNzdWU4MzYzOTE1MjQ= 5056 Allow "unsafe" mode for zarr writing 1197350 closed 0     1 2021-03-19T21:57:47Z 2021-04-26T16:37:43Z 2021-04-26T16:37:43Z MEMBER      

Currently, `Dataset.to_zarr` will only write Zarr datasets in cases in which

- The Dataset arrays are in memory (no dask)
- The arrays are chunked with dask with a one-to-many relationship between dask chunks and zarr chunks

If I try to violate the one-to-many condition, I get an error

```python
import xarray as xr
ds = xr.DataArray([0, 1., 2], name='foo').chunk({'dim_0': 1}).to_dataset()
d = ds.to_zarr('test.zarr', encoding={'foo': {'chunks': (3,)}}, compute=False)
```

```
/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/backends/zarr.py in _determine_zarr_chunks(enc_chunks, var_chunks, ndim, name)
    148             for dchunk in dchunks[:-1]:
    149                 if dchunk % zchunk:
--> 150                     raise NotImplementedError(
    151                         f"Specified zarr chunks encoding['chunks']={enc_chunks_tuple!r} for "
    152                         f"variable named {name!r} would overlap multiple dask chunks {var_chunks!r}. "

NotImplementedError: Specified zarr chunks encoding['chunks']=(3,) for variable named 'foo' would overlap multiple dask chunks ((1, 1, 1),). This is not implemented in xarray yet. Consider either rechunking using chunk() or instead deleting or modifying encoding['chunks'].
```

In this case, the error is particularly frustrating because I'm not even writing any data yet. (Also related to #2300, #4046, #4380).

There are at least two scenarios in which we might want to have more flexibility.

1. The case above, when we want to lazily initialize a Zarr array based on a Dataset, without actually computing anything.
2. The more general case, where we actually write arrays with many-to-many dask-chunk <-> zarr-chunk relationships

For 1, I propose we add a new option like `safe_chunks=True` to `to_zarr`. `safe_chunks=False` would permit just bypassing this check.
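To make the check and the proposed bypass concrete, here is a minimal pure-Python sketch for a single dimension. It is only loosely modeled on the loop visible in the traceback; the function name `check_zarr_chunk` and its signature are hypothetical, and this is not xarray's actual implementation.

```python
from typing import Sequence


def check_zarr_chunk(
    zarr_chunk: int,
    dask_chunks: Sequence[int],
    safe_chunks: bool = True,
) -> None:
    """Validate one dimension's chunking (hypothetical helper, not xarray API).

    Raises NotImplementedError when a zarr chunk would span several dask chunks.
    """
    if not safe_chunks:
        # Proposed escape hatch: the caller takes responsibility, skip validation.
        return
    # Every dask chunk except the last must be a whole multiple of the zarr
    # chunk size; otherwise some zarr chunk straddles a dask chunk boundary.
    for dchunk in dask_chunks[:-1]:
        if dchunk % zarr_chunk:
            raise NotImplementedError(
                f"zarr chunk size {zarr_chunk} would overlap multiple "
                f"dask chunks {tuple(dask_chunks)!r}"
            )


# The case from this issue: dask chunks (1, 1, 1) with a zarr chunk of 3.
try:
    check_zarr_chunk(3, (1, 1, 1))
except NotImplementedError as err:
    print("rejected:", err)

# With the proposed flag turned off, the same layout is accepted.
check_zarr_chunk(3, (1, 1, 1), safe_chunks=False)
```

With `safe_chunks=False` nothing protects against concurrent writes to the same zarr chunk, which is why the flag would default to `True`.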

For 2, we could consider implementing locks. This probably has to be done at the Dask level. But it is actually not super hard to deterministically figure out which chunks need to share a lock.
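As a rough illustration of that last point, the sketch below works out, for one dimension, which zarr chunks receive writes from more than one dask chunk. Those are the chunks whose writers would need to share a lock. The function name `contested_zarr_chunks` is hypothetical, and how the locks would actually be wired into Dask is left open.

```python
from itertools import accumulate
from typing import Dict, List, Sequence


def contested_zarr_chunks(
    dask_chunks: Sequence[int], zarr_chunk: int
) -> Dict[int, List[int]]:
    """Map each zarr chunk index written by several dask chunks to those
    dask chunk indices (1-D only; hypothetical helper)."""
    stops = list(accumulate(dask_chunks))  # exclusive end offset of each dask chunk
    starts = [0] + stops[:-1]              # start offset of each dask chunk
    writers: Dict[int, List[int]] = {}
    for dask_index, (start, stop) in enumerate(zip(starts, stops)):
        if stop == start:
            continue  # an empty dask chunk writes nothing
        first = start // zarr_chunk        # first zarr chunk touched
        last = (stop - 1) // zarr_chunk    # last zarr chunk touched
        for zarr_index in range(first, last + 1):
            writers.setdefault(zarr_index, []).append(dask_index)
    # Only zarr chunks with more than one writer need a shared lock.
    return {z: d for z, d in writers.items() if len(d) > 1}


print(contested_zarr_chunks((1, 1, 1), 3))  # {0: [0, 1, 2]}
print(contested_zarr_chunks((2, 2, 2), 3))  # {0: [0, 1], 1: [1, 2]}
print(contested_zarr_chunks((4, 4), 2))     # {}
```

An empty result corresponds to the layouts that are already safe to write today; any non-empty entry names one zarr chunk and the dask chunks that would have to serialize their writes to it.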

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5056/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed 13221727 issue
