issues: 1249638836
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1249638836 | I_kwDOAMm_X85Ke_m0 | 6640 | to_zarr fails for large dimensions; sensitive to exact dimension size and chunk size | 12818667 | closed | 0 | 5 | 2022-05-26T14:22:20Z | 2023-10-14T20:29:50Z | 2023-10-14T20:29:49Z | NONE | What happened?

I am using dask 2022.05.0, zarr 2.11.3 and xarray 2022.3.0. When I create a large empty dataset and try to save it in the zarr format with `to_zarr`, it fails with the error below. Frankly, I am not sure whether the problem lies in xarray or in zarr. As the attached code shows, creating the same dataset directly with zarr works fine.

```
File ~/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py:2101, in Array._decode_chunk(self, cdata, start, nitems, expected_shape)
   2099 # ensure correct chunk shape
   2100 chunk = chunk.reshape(-1, order='A')
-> 2101 chunk = chunk.reshape(expected_shape or self._chunks, order=self._order)
   2103 return chunk

ValueError: cannot reshape array of size 234506 into shape (235150,)
```

To show that this is not a zarr issue, I have also produced the same output directly with zarr in the example code below, in the `else` branch. Note well: the code includes one value of `numberOfDrifters` that triggers the problem and one that does not. See the comments where `numberOfDrifters` is defined.

What did you expect to happen?

I expected a zarr dataset to be created. Memory constraints prevent me from working around the problem with a chunk size of 1. I would prefer to create the zarr dataset with xarray, so that it carries the metadata needed to load it easily into xarray.
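To make the chunk-size trade-off concrete, here is a small plain-Python sketch. It counts how many chunks the `time` variable is split into along `traj`, using the two `numberOfDrifters` values and `chunkSize=10000` from the example code below. The helper `chunk_grid` is hypothetical. It only reproduces the arithmetic of regular chunking (full chunks followed by one shorter final chunk), as `dask.array` applies it for `chunks=(chunkSize, maxNumObs)`.

```python
# Hypothetical helper: split one dimension into regular chunks, as dask does for
# an integer chunk size (full chunks followed by a shorter final chunk).
def chunk_grid(length, chunk):
    """Return (number of chunks, length of the final chunk)."""
    n_chunks = -(-length // chunk)            # ceiling division in integers
    last = length - (n_chunks - 1) * chunk    # equals `chunk` when it divides evenly
    return n_chunks, last


# Sizes from the report: 120396431 works, 120067029 fails; chunkSize=10000.
for n in (120396431, 120067029):
    for c in (10000, 1):
        n_chunks, last = chunk_grid(n, c)
        print(f"length={n} chunk={c}: {n_chunks} chunks, final chunk {last}")
```

With `chunkSize=1`, the `time` variable alone would have more than 120 million chunks. Each would be a separate dask task and a separate chunk in the zarr store. That is one plausible source of the memory problem mentioned above, although the report does not measure it.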
Minimal Complete Verifiable Example

```Python
import numpy as np
import xarray as xr
import dask.array  # import the submodule explicitly so dask.array.empty is available
import zarr

dtype = np.float32
chunkSize = 10000
maxNumObs = 1
numberOfDrifters = 120396431  # 2008: this size WORKS
numberOfDrifters = 120067029  # 2007: this size FAILS (this later assignment is the one used)

# if True, make the zarr store with xarray; otherwise make it directly with zarr
if True:
    # make xarray data set, then write to zarr
    coords = {'traj': (['traj'], np.arange(numberOfDrifters)),
              'obs': (['obs'], np.arange(maxNumObs))}
    emptyArray = dask.array.empty(shape=(numberOfDrifters, maxNumObs), dtype=dtype,
                                  chunks=(chunkSize, maxNumObs))
    var = 'time'
    data_vars = {}
    attrs = {}
    data_vars[var] = (['traj', 'obs'], emptyArray, attrs)
    dataOut = xr.Dataset(data_vars, coords, {})
    print('done defining data set, now writing')
else:
    # make with zarr
    store = zarr.DirectoryStore('dataPaths/jnk_makeWithZarr.zarr')
    root = zarr.group(store=store)
    root.empty(shape=(numberOfDrifters, maxNumObs), name='time', dtype=dtype,
               chunks=(chunkSize, maxNumObs))
    print('done writing')
    zarrInZarr = zarr.open('dataPaths/jnk_makeWithZarr.zarr', 'r')
    print('done opening')
```
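The two numbers in the `ValueError` can be related to the two drifter counts. The `traj` coordinate is a plain NumPy `int64` array with no chunk encoding. Assuming xarray leaves the chunking of such a variable to zarr, its chunk size comes from zarr's default heuristic. The sketch below re-implements that halving heuristic in plain Python. It is modeled on `guess_chunks` in zarr 2.x's `zarr/util.py`, so treat the constants and the exact stopping rule as assumptions. For 8-byte elements it gives 234506 elements per chunk for the failing size and 235150 for the working one.

```python
import math

# Constants modeled on zarr 2.x's zarr/util.py (assumption: unchanged in zarr 2.11).
CHUNK_BASE = 256 * 1024        # multiplier by which the target chunk size is scaled
CHUNK_MIN = 128 * 1024         # soft lower limit on the target, in bytes
CHUNK_MAX = 64 * 1024 * 1024   # hard upper limit on a chunk, in bytes


def guess_chunks(shape, typesize):
    """Halve each axis in turn until a chunk is close to a size-dependent target."""
    ndims = len(shape)
    chunks = [max(float(s), 1.0) for s in shape]
    dset_size = math.prod(chunks) * typesize
    # target grows with the dataset size, clamped to [CHUNK_MIN, CHUNK_MAX]
    target_size = CHUNK_BASE * 2 ** math.log10(dset_size / (1024.0 * 1024))
    target_size = min(max(target_size, CHUNK_MIN), CHUNK_MAX)

    idx = 0
    while True:
        chunk_bytes = math.prod(chunks) * typesize
        # stop when below the target or within 50% of it, and under the hard cap
        if (chunk_bytes < target_size
                or abs(chunk_bytes - target_size) / target_size < 0.5) \
                and chunk_bytes < CHUNK_MAX:
            break
        if math.prod(chunks) == 1:
            break  # a single element is already larger than CHUNK_MAX
        chunks[idx % ndims] = math.ceil(chunks[idx % ndims] / 2.0)
        idx += 1
    return tuple(int(c) for c in chunks)


# int64 'traj' coordinate (8 bytes per element) for the two sizes in the report
for n in (120396431, 120067029):
    print(n, guess_chunks((n,), 8))
```

If the heuristic matches zarr's, the error pairs chunk data sized for one drifter count with an expected chunk shape for the other. That would point to metadata and data from different runs being mixed. This is an inference from the numbers, not something the report confirms.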
Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:22:55)
[GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.13.0-41-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2022.3.0
pandas: 1.4.1
numpy: 1.20.3
scipy: 1.8.0
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.11.3
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.05.0
distributed: 2022.5.0
matplotlib: 3.5.1
cartopy: 0.20.2
seaborn: None
numbagg: None
fsspec: 2022.02.0
cupy: None
pint: None
sparse: None
setuptools: 61.2.0
pip: 22.0.4
conda: None
pytest: 7.1.1
IPython: 8.2.0
sphinx: None
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6640/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |