issues: 1249638836

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1249638836 I_kwDOAMm_X85Ke_m0 6640 to_zarr fails for large dimensions; sensitive to exact dimension size and chunk size 12818667 closed 0     5 2022-05-26T14:22:20Z 2023-10-14T20:29:50Z 2023-10-14T20:29:49Z NONE      

What happened?

I am using dask 2022.05.0, zarr 2.11.3, and xarray 2022.3.0. When I create a large empty dataset and try to save it in the zarr format with to_zarr, it fails with the error below. Frankly, I am not sure whether the problem is with xarray or zarr, but as documented in the attached code, creating the same dataset directly with zarr works just fine.

```
File ~/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py:2101, in Array._decode_chunk(self, cdata, start, nitems, expected_shape)
   2099 # ensure correct chunk shape
   2100 chunk = chunk.reshape(-1, order='A')
-> 2101 chunk = chunk.reshape(expected_shape or self._chunks, order=self._order)
   2103 return chunk

ValueError: cannot reshape array of size 234506 into shape (235150,)
```

To show that this is not a zarr issue, I have made the same output directly with zarr in the example code below. It is in the "else" clause in the code.

Note well: I have included a value of numberOfDrifters that has the problem, and one that does not. Please see the comments where numberOfDrifters is defined.
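A quick arithmetic check on the two sizes may help narrow this down. Neither number in the error message matches chunkSize=10000; each one equals one of the two numberOfDrifters values divided by 512 and rounded up. The sketch below only verifies that arithmetic. The divisor 512 was found by inspection, and the names `ceil_div` and `summarize` are made up for this illustration. Reading the result as a sign that the failing array is the 1-D `traj` coordinate (whose chunks are not set explicitly in the example), and that a chunk written for one size was read back while writing the other, is an assumption and not something confirmed in this report.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding."""
    return -(-a // b)


def summarize(n: int, chunk_size: int = 10_000, divisor: int = 512) -> dict:
    """Chunk bookkeeping for a 1-D dimension of length n.

    `divisor` is an assumption chosen by inspection, not a documented constant.
    """
    n_chunks = ceil_div(n, chunk_size)
    return {
        "n_chunks": n_chunks,                              # chunks of size chunk_size
        "last_chunk": n - (n_chunks - 1) * chunk_size,     # elements in the final chunk
        "ceil_div_by_divisor": ceil_div(n, divisor),       # compare with the error message
    }


sizes = {"works": 120_396_431, "fails": 120_067_029}
results = {label: summarize(n) for label, n in sizes.items()}

for label, info in results.items():
    print(label, info)
```

If the two `ceil_div_by_divisor` values match the two numbers in the ValueError, it may be worth checking whether chunk files from an earlier run with the other size are still on disk.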

What did you expect to happen?

I expected a zarr dataset to be created. Using a chunk size of 1 is not a workable fix for me because of memory constraints. I would prefer to create the zarr dataset with xarray so that it carries the metadata needed to load it easily into xarray.

Minimal Complete Verifiable Example

```Python
from numpy import *
import xarray as xr
import dask.array  # binds the name dask and loads the dask.array submodule
import zarr

dtype=float32
chunkSize=10000
maxNumObs=1

numberOfDrifters=120396431 #2008 This size WORKS
numberOfDrifters=120067029 #2007 This size FAILS

#if True, make zarr with xarray
if True:
    #make xarray data set, then write to zarr
    coords={'traj':(['traj'],arange(numberOfDrifters)),'obs':(['obs'],arange(maxNumObs))}
    emptyArray=dask.array.empty(shape=(numberOfDrifters,maxNumObs),dtype=dtype,chunks=(chunkSize,maxNumObs))
    var='time'
    data_vars={}
    attrs={}
    data_vars[var]=(['traj','obs'],emptyArray,attrs)
    dataOut=xr.Dataset(data_vars,coords,{})
    print('done defining data set, now writing')

    #now save to zarr dataset
    dataOut.to_zarr('dataPaths/jnk_makeWithXarray.zarr','w')
    print('done writing')
    zarrInXarray=zarr.open('dataPaths/jnk_makeWithXarray.zarr','r')
    print('done opening')

else:
    #make with zarr
    store=zarr.DirectoryStore('dataPaths/jnk_makeWithZarr.zarr')
    root=zarr.group(store=store)
    root.empty(shape=(numberOfDrifters,maxNumObs),name='time',dtype=dtype,chunks=(chunkSize,maxNumObs))
    print('done writing')
    zarrInZarr=zarr.open('dataPaths/jnk_makeWithZarr.zarr','r')
    print('done opening')
```
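One thing that may be worth checking in the example: `'w'` is passed to `to_zarr` as the second positional argument. Assuming the leading parameters of `Dataset.to_zarr` are `store, chunk_store, mode` (my understanding for xarray 2022.3.0; please verify on the installed version), that `'w'` would bind to `chunk_store` rather than `mode`. The sketch below uses a hypothetical stand-in function, `to_zarr_stand_in`, that mirrors only those assumed leading parameters, and a small helper, `show_binding`, to show where each argument lands. Neither name is part of xarray.

```python
import inspect


def to_zarr_stand_in(store=None, chunk_store=None, mode=None, **kwargs):
    """Hypothetical stand-in: mirrors only the ASSUMED leading parameters
    of Dataset.to_zarr; it does not write anything."""


def show_binding(func, *args, **kwargs) -> dict:
    """Return a mapping of parameter name -> value for the given call."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    return dict(bound.arguments)


path = 'dataPaths/jnk_makeWithXarray.zarr'

# Call style used in the example: 'w' given positionally.
positional = show_binding(to_zarr_stand_in, path, 'w')

# Explicit keyword: 'w' is unambiguously the mode.
keyword = show_binding(to_zarr_stand_in, path, mode='w')

print(positional)
print(keyword)
```

To check the real method instead of the stand-in, `inspect.signature(xr.Dataset.to_zarr)` prints the actual parameter order. If the order is as assumed, writing `mode='w'` explicitly in the example would be the safer form.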

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

Traceback (most recent call last):
  File "/data/plumehome/pringle/workfiles/oceanparcels/makeCommunityConnectivity/breakXarray.py", line 26, in <module>
    dataOut.to_zarr('dataPaths/jnk_makeWithXarray.zarr','w')
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/xarray/core/dataset.py", line 2036, in to_zarr
    return to_zarr(
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/xarray/backends/api.py", line 1431, in to_zarr
    dump_to_store(dataset, zstore, writer, encoding=encoding)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/xarray/backends/api.py", line 1119, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/xarray/backends/zarr.py", line 534, in store
    self.set_variables(
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/xarray/backends/zarr.py", line 613, in set_variables
    writer.add(v.data, zarr_array, region)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/xarray/backends/common.py", line 154, in add
    target[region] = source
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 1285, in __setitem__
    self.set_basic_selection(pure_selection, value, fields=fields)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 1380, in set_basic_selection
    return self._set_basic_selection_nd(selection, value, fields=fields)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 1680, in _set_basic_selection_nd
    self._set_selection(indexer, value, fields=fields)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 1732, in _set_selection
    self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 1994, in _chunk_setitem
    self._chunk_setitem_nosync(chunk_coords, chunk_selection, value,
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 1999, in _chunk_setitem_nosync
    cdata = self._process_for_setitem(ckey, chunk_selection, value, fields=fields)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 2049, in _process_for_setitem
    chunk = self._decode_chunk(cdata)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 2101, in _decode_chunk
    chunk = chunk.reshape(expected_shape or self._chunks, order=self._order)
ValueError: cannot reshape array of size 234506 into shape (235150,)

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:22:55) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.13.0-41-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2022.3.0
pandas: 1.4.1
numpy: 1.20.3
scipy: 1.8.0
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.11.3
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.05.0
distributed: 2022.5.0
matplotlib: 3.5.1
cartopy: 0.20.2
seaborn: None
numbagg: None
fsspec: 2022.02.0
cupy: None
pint: None
sparse: None
setuptools: 61.2.0
pip: 22.0.4
conda: None
pytest: 7.1.1
IPython: 8.2.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6640/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed 13221727 issue