issues: 1845132891
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1845132891 | I_kwDOAMm_X85t-n5b | 8062 | Dataset.chunk() does not overwrite encoding["chunks"] | 2466330 | open | 0 |  |  | 4 | 2023-08-10T12:54:12Z | 2023-08-14T18:23:36Z |  | CONTRIBUTOR |  |  |  | (see body below) | (see reactions below) |  |  | 13221727 | issue |

**body**

### What happened?

When using the `Dataset.chunk()` method to rechunk a `Dataset`, the `"chunks"` entry in the variables' `encoding` is not updated to the new chunk sizes.

Looking at the implementation of `chunk`, the encoding is only overwritten when the `overwrite_encoded_chunks` flag is set, and it defaults to `False`. I do not know why this default value was chosen as `False`, or what could break if it was changed to `True`, but looking at the documentation, it seems to be the opposite of the intended effect. From the documentation of `chunk`:

> […]
Which is exactly what it does not do.

### What did you expect to happen?

I would expect the `"chunks"` entry of the `encoding` to be updated to the new chunk sizes after calling `.chunk()`.

### Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np

# Create a test Dataset with dimensions x and y, each of size 100, and a chunksize of 50
ds_original = xr.Dataset({"my_var": (["x", "y"], np.random.randn(100, 100))})

# Since 'chunk' does not work, manually set encoding
ds_original.my_var.encoding["chunks"] = (50, 50)

# To best showcase the real-life example, write it to file and read it back again.
# The same could be achieved by just calling .chunk() with chunksizes of 25,
# but this feels more 'complete'
filepath = "~/chunk_test.zarr"
ds_original.to_zarr(filepath)
ds = xr.open_zarr(filepath)

# Check the chunksizes and "chunks" encoding
print(ds.my_var.chunks)
# >>> ((50, 50), (50, 50))
print(ds.my_var.encoding["chunks"])
# >>> (50, 50)

# Rechunk the Dataset
ds = ds.chunk({"x": 25, "y": 25})

# The chunksizes have changed
print(ds.my_var.chunks)
# >>> ((25, 25, 25, 25), (25, 25, 25, 25))

# But the encoding value remains the same
print(ds.my_var.encoding["chunks"])
# >>> (50, 50)

# Attempting to write this back to zarr raises an error
ds.to_zarr("~/chunk_test_rechunked.zarr")
# NotImplementedError: Specified zarr chunks encoding['chunks']=(50, 50) for variable
# named 'my_var' would overlap multiple dask chunks ((25, 25, 25, 25), (25, 25, 25, 25)).
# Writing this array in parallel with dask could lead to corrupted data.
# Consider either rechunking using ...
```
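One possible workaround, sketched here as a continuation of the example above rather than anything `chunk()` does itself: `encoding` is an ordinary dictionary, so the stale `"chunks"` entry can be dropped (or reset to the new chunk size) by hand before writing, letting `to_zarr` take its chunking from the dask chunks. The output path is only illustrative.

```Python
# Workaround sketch, continuing from `ds` in the example above.

# Option 1: drop the stale entry so to_zarr falls back to the dask chunking
del ds.my_var.encoding["chunks"]

# Option 2 (alternative): set the encoding to the new uniform chunk size
# ds.my_var.encoding["chunks"] = (25, 25)

# With the encoding and the dask chunks in agreement, the write goes through
ds.to_zarr("~/chunk_test_rechunked_workaround.zarr")
```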
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8062/reactions", "total_count": 2, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 1 } |
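The same clean-up generalizes to datasets with many variables. The helper below is only a sketch and not part of xarray; the name `sync_chunk_encoding` is invented here, and it assumes the current dask chunking should simply take precedence over whatever is left in `encoding["chunks"]`.

```Python
import xarray as xr


def sync_chunk_encoding(ds: xr.Dataset) -> xr.Dataset:
    """Make every variable's encoding["chunks"] agree with its dask chunks.

    Hypothetical helper (not an xarray API): encodings are updated in place
    and the same dataset is returned for chaining.
    """
    for var in ds.variables.values():
        if var.chunks is None:
            # Not dask-backed: an existing "chunks" encoding cannot conflict
            # with dask chunks, so leave it untouched.
            continue
        # A single chunk-size tuple can only describe the dask layout if, along
        # each dimension, all blocks except possibly the last share one size.
        uniform = all(
            all(block == blocks[0] for block in blocks[:-1]) and blocks[-1] <= blocks[0]
            for blocks in var.chunks
        )
        if uniform:
            var.encoding["chunks"] = tuple(blocks[0] for blocks in var.chunks)
        else:
            var.encoding.pop("chunks", None)
    return ds
```

Applied to the example, `sync_chunk_encoding(ds.chunk({"x": 25, "y": 25}))` should let the rechunked dataset be written without the NotImplementedError shown above.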