issues: 1845132891


id: 1845132891
node_id: I_kwDOAMm_X85t-n5b
number: 8062
title: Dataset.chunk() does not overwrite encoding["chunks"]
user: 2466330
state: open
locked: 0
comments: 4
created_at: 2023-08-10T12:54:12Z
updated_at: 2023-08-14T18:23:36Z
author_association: CONTRIBUTOR
repo: 13221727
type: issue

What happened?

When using the chunk function to change the chunk sizes of a Dataset (or DataArray, which uses the Dataset implementation of chunk), the chunk sizes of the Dask arrays are changed, but the "chunks" entry of the encoding attribute is not updated accordingly. This causes a NotImplementedError to be raised when attempting to write the Dataset to a zarr store (and presumably to other formats as well).

Looking at the implementation of chunk, every variable is rechunked using the _maybe_chunk function, which actually has a parameter, overwrite_encoded_chunks, to control exactly this behavior. However, it is an optional parameter which defaults to False, and the call in chunk neither provides a value for it nor offers the caller a way to influence it (for example, by exposing an overwrite_encoded_chunks parameter itself).
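The branch in question can be illustrated with a small stand-in (a hypothetical sketch, not xarray's actual code; the real _maybe_chunk also performs the Dask rechunking itself):

```python
# Hypothetical stand-in for the overwrite_encoded_chunks branch of xarray's
# internal _maybe_chunk helper (the real function also rechunks the variable).
def maybe_chunk(encoding: dict, new_chunks, overwrite_encoded_chunks: bool = False) -> dict:
    if overwrite_encoded_chunks and new_chunks is not None:
        # Keep encoding["chunks"] in sync with the new dask chunking
        encoding["chunks"] = tuple(new_chunks)
    return encoding

# Default (False): the stale entry survives, which is the behavior reported here
print(maybe_chunk({"chunks": (50, 50)}, (25, 25)))
# {'chunks': (50, 50)}

# With True, the entry would track the new chunk sizes
print(maybe_chunk({"chunks": (50, 50)}, (25, 25), overwrite_encoded_chunks=True))
# {'chunks': (25, 25)}
```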

I do not know why False was chosen as the default, or what could break if it were changed to True, but it appears to be the opposite of the documented behavior. From the documentation of to_zarr:

Zarr chunks are determined in the following way: From the chunks attribute in each variable’s encoding (can be set via Dataset.chunk).

Which is exactly what it does not do.

What did you expect to happen?

I would expect the "chunks" entry of the encoding attribute to be changed to reflect the new chunking scheme.

Minimal Complete Verifiable Example

```python
import xarray as xr
import numpy as np

# Create a test Dataset with dimensions x and y, each of size 100, and a chunksize of 50
ds_original = xr.Dataset({"my_var": (["x", "y"], np.random.randn(100, 100))})

# Since 'chunk' does not work, manually set the encoding
ds_original.my_var.encoding["chunks"] = (50, 50)

# To best showcase the real-life example, write it to file and read it back again.
# The same could be achieved by just calling .chunk() with chunksizes of 25,
# but this feels more 'complete'
filepath = "~/chunk_test.zarr"
ds_original.to_zarr(filepath)
ds = xr.open_zarr(filepath)

# Check the chunksizes and "chunks" encoding
print(ds.my_var.chunks)
# >>> ((50, 50), (50, 50))
print(ds.my_var.encoding["chunks"])
# >>> (50, 50)

# Rechunk the Dataset
ds = ds.chunk({"x": 25, "y": 25})

# The chunksizes have changed
print(ds.my_var.chunks)
# >>> ((25, 25, 25, 25), (25, 25, 25, 25))

# But the encoding value remains the same
print(ds.my_var.encoding["chunks"])
# >>> (50, 50)

# Attempting to write this back to zarr raises an error
ds.to_zarr("~/chunk_test_rechunked.zarr")
# NotImplementedError: Specified zarr chunks encoding['chunks']=(50, 50) for variable
# named 'my_var' would overlap multiple dask chunks ((25, 25, 25, 25), (25, 25, 25, 25)).
# Writing this array in parallel with dask could lead to corrupted data. Consider
# either rechunking using chunk(), deleting or modifying encoding['chunks'], or
# specify safe_chunks=False.
```
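Until chunk() updates the encoding itself, the workaround suggested by the error message is to drop the stale entry before writing. Sketched here on a plain dict standing in for my_var.encoding, so the snippet runs without xarray or zarr installed:

```python
# Stand-in for ds.my_var.encoding after rechunking: "chunks" is stale.
encoding = {"chunks": (50, 50), "preferred_chunks": {"x": 50, "y": 50}}

# Drop the stale entry; pop with a default is safe even if the key is absent.
encoding.pop("chunks", None)
print(encoding)
# {'preferred_chunks': {'x': 50, 'y': 50}}

# On the real Dataset this would be:
#     ds.my_var.encoding.pop("chunks", None)
#     ds.to_zarr("~/chunk_test_rechunked.zarr")
```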

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.16.3-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.7
libnetcdf: 4.8.1

xarray: 2023.7.0
pandas: 1.5.3
numpy: 1.24.2
scipy: 1.10.0
netCDF4: 1.5.8
pydap: None
h5netcdf: 0.12.0
h5py: 3.6.0
Nio: None
zarr: 2.14.1
cftime: 1.5.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.6
dask: 2022.01.0+dfsg
distributed: 2022.01.0+ds.1
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.1.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 59.6.0
pip: 23.2.1
conda: None
pytest: 7.2.2
mypy: 1.1.1
IPython: 7.31.1
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8062/reactions",
    "total_count": 2,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
