pydata/xarray issue #6640: to_zarr fails for large dimensions; sensitive to exact dimension size and chunk size

State: closed · Comments: 5 · Created: 2022-05-26 · Closed: 2023-10-14

What happened?

Using dask 2022.05.0, zarr 2.11.3, and xarray 2022.3.0: when I create a large empty dataset and try to save it in the zarr format with to_zarr, it fails with the error below. Frankly, I am not sure whether the problem is in Xarray or Zarr, but as documented in the attached code, when I create the same dataset directly with Zarr, it works just fine.

```
File ~/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py:2101, in Array._decode_chunk(self, cdata, start, nitems, expected_shape)
   2099 # ensure correct chunk shape
   2100 chunk = chunk.reshape(-1, order='A')
-> 2101 chunk = chunk.reshape(expected_shape or self._chunks, order=self._order)
   2103 return chunk

ValueError: cannot reshape array of size 234506 into shape (235150,)
```

To show that this is not a zarr issue, I have made the same output directly with zarr in the example code below; it is in the "else" clause of the code.
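For reference, the reshape failure itself can be reproduced in isolation with plain numpy. This only mimics the final operation in zarr's `_decode_chunk` (a decoded buffer whose element count no longer matches the expected chunk shape), not the code path that produced the undersized buffer:

```python
import numpy as np

# A decoded chunk came back with 234506 elements, but zarr expected a
# chunk of shape (235150,) -- reshaping across different totals raises.
buf = np.zeros(234506, dtype=np.float32)
try:
    buf.reshape((235150,))
except ValueError as e:
    print(e)  # cannot reshape array of size 234506 into shape (235150,)
```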

Note well: I have included one value of numberOfDrifters that triggers the problem and one that does not. Please see the comments where numberOfDrifters is defined.
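A quick, illustrative bit of arithmetic on those two values: with chunkSize = 10000, both sizes leave a ragged (partial) final chunk along `traj`, so a ragged trailing chunk alone cannot be what distinguishes the working size from the failing one:

```python
# Chunk arithmetic for the two numberOfDrifters values from the example.
# Both leave a nonzero remainder, i.e. a partial final chunk.
chunkSize = 10000
for n, label in [(120396431, "works"), (120067029, "fails")]:
    full, tail = divmod(n, chunkSize)
    print(f"{n} ({label}): {full} full chunks + a final chunk of {tail}")
```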

What did you expect to happen?

I expected a zarr dataset to be created. I cannot work around the problem with a chunk size of 1 because of memory constraints. I would prefer to create the zarr dataset with xarray so it has the metadata needed to load it back into xarray easily.
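One possible mitigation to try (untested against this exact bug, so treat it as an assumption): pass an explicit per-variable chunk layout through `to_zarr`'s `encoding` argument, so that zarr does not pick chunk sizes for the eagerly written coordinate arrays with its own heuristic. The variable names below match the example script:

```python
# Hypothetical encoding for the dataset in the example; the 'chunks'
# entries force zarr's on-disk chunking instead of letting it guess.
chunkSize = 10000
maxNumObs = 1
encoding = {
    'time': {'chunks': (chunkSize, maxNumObs)},
    'traj': {'chunks': (chunkSize,)},  # coordinate written eagerly by xarray
    'obs':  {'chunks': (maxNumObs,)},
}
# dataOut.to_zarr('dataPaths/jnk_makeWithXarray.zarr', mode='w', encoding=encoding)
print(encoding['traj'])
```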

Minimal Complete Verifiable Example

```Python
from numpy import *
import xarray as xr
import dask.array  # dask.array must be imported explicitly
import zarr

dtype = float32
chunkSize = 10000
maxNumObs = 1

numberOfDrifters = 120396431  # 2008: this size WORKS
numberOfDrifters = 120067029  # 2007: this size FAILS

# if True, make zarr with xarray
if True:  # make xarray data set, then write to zarr
    coords = {'traj': (['traj'], arange(numberOfDrifters)),
              'obs': (['obs'], arange(maxNumObs))}
    emptyArray = dask.array.empty(shape=(numberOfDrifters, maxNumObs),
                                  dtype=dtype,
                                  chunks=(chunkSize, maxNumObs))
    var = 'time'
    data_vars = {}
    attrs = {}
    data_vars[var] = (['traj', 'obs'], emptyArray, attrs)
    dataOut = xr.Dataset(data_vars, coords, {})
    print('done defining data set, now writing')

    # now save to zarr dataset
    dataOut.to_zarr('dataPaths/jnk_makeWithXarray.zarr', 'w')
    print('done writing')
    zarrInXarray = zarr.open('dataPaths/jnk_makeWithXarray.zarr', 'r')
    print('done opening')

else:  # make with zarr
    store = zarr.DirectoryStore('dataPaths/jnk_makeWithZarr.zarr')
    root = zarr.group(store=store)
    root.empty(shape=(numberOfDrifters, maxNumObs), name='time',
               dtype=dtype, chunks=(chunkSize, maxNumObs))
    print('done writing')
    zarrInZarr = zarr.open('dataPaths/jnk_makeWithZarr.zarr', 'r')
    print('done opening')
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
Traceback (most recent call last):
  File "/data/plumehome/pringle/workfiles/oceanparcels/makeCommunityConnectivity/breakXarray.py", line 26, in <module>
    dataOut.to_zarr('dataPaths/jnk_makeWithXarray.zarr','w')
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/xarray/core/dataset.py", line 2036, in to_zarr
    return to_zarr(
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/xarray/backends/api.py", line 1431, in to_zarr
    dump_to_store(dataset, zstore, writer, encoding=encoding)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/xarray/backends/api.py", line 1119, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/xarray/backends/zarr.py", line 534, in store
    self.set_variables(
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/xarray/backends/zarr.py", line 613, in set_variables
    writer.add(v.data, zarr_array, region)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/xarray/backends/common.py", line 154, in add
    target[region] = source
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 1285, in __setitem__
    self.set_basic_selection(pure_selection, value, fields=fields)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 1380, in set_basic_selection
    return self._set_basic_selection_nd(selection, value, fields=fields)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 1680, in _set_basic_selection_nd
    self._set_selection(indexer, value, fields=fields)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 1732, in _set_selection
    self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 1994, in _chunk_setitem
    self._chunk_setitem_nosync(chunk_coords, chunk_selection, value,
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 1999, in _chunk_setitem_nosync
    cdata = self._process_for_setitem(ckey, chunk_selection, value, fields=fields)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 2049, in _process_for_setitem
    chunk = self._decode_chunk(cdata)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 2101, in _decode_chunk
    chunk = chunk.reshape(expected_shape or self._chunks, order=self._order)
ValueError: cannot reshape array of size 234506 into shape (235150,)
```

Anything else we need to know?

No response

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:22:55) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.13.0-41-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2022.3.0
pandas: 1.4.1
numpy: 1.20.3
scipy: 1.8.0
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.11.3
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.05.0
distributed: 2022.5.0
matplotlib: 3.5.1
cartopy: 0.20.2
seaborn: None
numbagg: None
fsspec: 2022.02.0
cupy: None
pint: None
sparse: None
setuptools: 61.2.0
pip: 22.0.4
conda: None
pytest: 7.1.1
IPython: 8.2.0
sphinx: None
```
