id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 1160062673,I_kwDOAMm_X85FJSbR,6333,Expressing dimension's preferred chunks as tuple of integers causes TypeError,38358698,closed,0,,,0,2022-03-04T21:23:21Z,2022-04-08T17:18:50Z,2022-04-08T17:18:50Z,CONTRIBUTOR,,,,"### What happened? When opening a dataset containing a variable that has preferred chunks expressed along some dimension as a tuple of integers, *xarray* raises a `TypeError`. ### What did you expect to happen? I expected to open the dataset with its preferred chunks, as described in the documentation on preferred chunks within ""How to add a new backend"". ### Minimal Complete Verifiable Example ```Python import xarray as xr class PassThroughBackendEntrypoint(xr.backends.BackendEntrypoint): def open_dataset(self, dataset, *, drop_variables=None): return dataset initial = xr.Dataset( { ""data"": xr.Variable( (""dim"",), [0, 0], encoding={""preferred_chunks"": {""dim"": (1, 1)}} ) } ) final = xr.open_dataset(initial, engine=PassThroughBackendEntrypoint, chunks={}) ``` ### Relevant log output ```Python [Paths simplified.] 
Traceback (most recent call last): File ""<stdin>"", line 1, in <module> File ""...\xarray\backends\api.py"", line 501, in open_dataset ds = _dataset_from_backend_dataset( File ""...\xarray\backends\api.py"", line 317, in _dataset_from_backend_dataset ds = _chunk_ds( File ""...\xarray\backends\api.py"", line 287, in _chunk_ds var_chunks = _get_chunk(var, chunks) File ""...\xarray\core\dataset.py"", line 409, in _get_chunk _check_chunks_compatibility(var, output_chunks, preferred_chunks) File ""...\xarray\core\dataset.py"", line 371, in _check_chunks_compatibility if any(s % preferred_chunks_dim for s in chunks_dim): File ""...\xarray\core\dataset.py"", line 371, in <genexpr> if any(s % preferred_chunks_dim for s in chunks_dim): TypeError: unsupported operand type(s) for %: 'int' and 'tuple' ``` ### Anything else we need to know? The behavior exhibited above touches on the following related issues: * The `_check_chunks_compatibility` function assumes that a dimension expresses its preferred chunks only as an integer, not a sequence of integers. In contrast, *Dask* will handle either within the `previous_chunks` argument to its `normalize_chunks` function. * The examples in the documentation of `""preferred_chunks""` mappings, namely `{“dim1”: 1000, “dim2”: 2000}` and `{“dim1”: [1000, 100], “dim2”: [2000, 2000, 2000]]}`, have syntax errors: The quotation marks are curly instead of straight, and the second example has an extra closing bracket. * After correcting the syntax errors, the lists in the second example lead to `TypeError: unhashable type: 'list'`. 
*Dask* raises the exception when it tries to test a mutable list for set membership, as in the following (with simplified paths): ```python >>> dask.array.core.normalize_chunks([[1000, 100], [2000, 2000, 2000]], (1100, 6000)) Traceback (most recent call last): File ""<stdin>"", line 1, in <module> File ""...\dask\array\core.py"", line 2900, in normalize_chunks chunks = tuple(c if c not in {None, -1} else s for c, s in zip(chunks, shape)) File ""...\dask\array\core.py"", line 2900, in <genexpr> chunks = tuple(c if c not in {None, -1} else s for c, s in zip(chunks, shape)) TypeError: unhashable type: 'list' ``` If one omits the second argument (the shape) to that call, it succeeds. This may be a bug in *Dask*. * The tests in *xarray* don't exercise behaviors related to preferred chunks. [Edited for grammar.] ### Environment ``` INSTALLED VERSIONS ------------------ commit: None python: 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:22:46) [MSC v.1916 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('English_United States', '1252') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 0.20.3.dev52+gd3b6aa6d pandas: 1.4.1 numpy: 1.21.5 scipy: 1.8.0 netCDF4: 1.5.8 pydap: installed h5netcdf: 0.13.1 h5py: 3.6.0 Nio: None zarr: 2.11.0 cftime: 1.5.2 nc_time_axis: 1.4.0 PseudoNetCDF: installed rasterio: 1.2.10 cfgrib: None iris: 3.2.0.post0 bottleneck: 1.3.2 dask: 2022.02.0 distributed: 2022.02.0 matplotlib: 3.5.1 cartopy: 0.20.2 seaborn: 0.11.2 numbagg: 0.2.1 fsspec: 2022.01.0 cupy: None pint: 0.18 sparse: 0.13.0 setuptools: 59.8.0 pip: 22.0.3 conda: None pytest: 7.0.1 IPython: None sphinx: None ```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6333/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue