issues: 1876858952
| field | value |
|---|---|
| id | 1876858952 |
| node_id | I_kwDOAMm_X85v3phI |
| number | 8134 |
| title | Unable to append data in s3 bucket with to_zarr() and append mode |
| user | 27021858 |
| state | closed |
| locked | 0 |
| comments | 2 |
| created_at | 2023-09-01T06:57:32Z |
| updated_at | 2023-09-01T16:03:50Z |
| closed_at | 2023-09-01T16:03:49Z |
| author_association | NONE |

### What happened?

I updated my packages, and now xarray + zarr are unable to append data to an existing Zarr store in S3.

### What did you expect to happen?

That data would be appended to the existing Zarr store.

### Minimal Complete Verifiable Example

```python
import s3fs
import xarray
import zarr  # needed for zarr.errors.ContainsGroupError below
import numpy as np
from datetime import datetime
from s3fs import S3FileSystem

append_dim = 'dt_calc'
consolidated = True
storage_class = "STANDARD"   # placeholder: undefined in the original snippet
bucket_name = "my-bucket"    # placeholder: undefined in the original snippet
dataset_name = "my-dataset"  # placeholder: undefined in the original snippet

ds = xarray.Dataset(
    {'temp': (('dt_calc', 'y', 'x'),
              np.array([[[1., 2., 3., 4.], [3., 4., 5., 6.]]]))},
    coords={'lon': ('y', np.array([50., 51.])),
            'lat': ('x', np.array([4., 5., 6., 7.])),
            'dt_calc': ('dt_calc', [datetime(2022, 1, 1)])}
)
ds_2 = xarray.Dataset(
    {'temp': (('dt_calc', 'y', 'x'),
              np.array([[[1., 2., 3., 4.], [3., 4., 5., 6.]]]))},
    coords={'lon': ('y', np.array([50., 51.])),
            'lat': ('x', np.array([4., 5., 6., 7.])),
            'dt_calc': ('dt_calc', [datetime(2022, 1, 1, 1)])}
)

s3_out = S3FileSystem(
    anon=False,
    s3_additional_kwargs={"StorageClass": storage_class},
)
store_out = s3fs.S3Map(
    root=f"s3:///{bucket_name}/{dataset_name}.zarr",
    s3=s3_out,
    check=False
)

ds.to_zarr(
    store_out,
    mode="w-",
    compute=True,
    consolidated=consolidated
)
try:
    ds_2.to_zarr(
        store_out,
        mode="w-",
        compute=True,
        consolidated=consolidated
    )
except zarr.errors.ContainsGroupError:
    ds_2.to_zarr(
        store_out,
        mode="a",
        append_dim=append_dim,
        compute=True,
        consolidated=consolidated,
    )
```

### MVCE confirmation
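One detail worth checking in the MVCE above (not confirmed as the cause of this issue): the store root is built as `f"s3:///{bucket_name}/{dataset_name}.zarr"` with three slashes after the scheme. Parsed as a URL, that leaves the bucket (netloc) empty and shifts everything into the path, so the mapper may not point at the intended bucket. A minimal stdlib sketch, using placeholder bucket/dataset names:

```python
from urllib.parse import urlsplit

# Triple slash: the netloc (bucket) is empty and the bucket name
# ends up as the first path component instead.
parts = urlsplit("s3:///my-bucket/my-dataset.zarr")
print(repr(parts.netloc))  # '' -- no bucket
print(parts.path)          # /my-bucket/my-dataset.zarr

# Double slash: the bucket lands in netloc as intended.
fixed = urlsplit("s3://my-bucket/my-dataset.zarr")
print(fixed.netloc)        # my-bucket
print(fixed.path)          # /my-dataset.zarr
```

Whether s3fs normalizes this particular root string is a separate question, but the asymmetry is easy to verify locally before debugging the append path itself.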
### Relevant log output

```python
In [6]: xarray.open_zarr(store_out, consolidated=True)
Out[6]:
<xarray.Dataset>
Dimensions:  (dt_calc: 1, x: 4, y: 2)
Coordinates:
  * dt_calc  (dt_calc) datetime64[ns] 2022-01-01
    lat      (x) float64 dask.array<chunksize=(4,), meta=np.ndarray>
    lon      (y) float64 dask.array<chunksize=(2,), meta=np.ndarray>
Dimensions without coordinates: x, y
Data variables:
    temp     (dt_calc, y, x) float64 dask.array<chunksize=(1, 2, 4), meta=np.ndarray>

In [7]: dataset.to_zarr(
   ...:     store_out,
   ...:     mode="a",
   ...:     append_dim=append_dim,
   ...:     compute=True,
   ...:     consolidated=consolidated,
   ...: )
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 dataset.to_zarr(
      2     store_out,
      3     mode="a",
      4     append_dim=append_dim,
      5     compute=True,
      6     consolidated=consolidated,
      7 )

File /usr/local/lib/python3.9/site-packages/xarray/core/dataset.py:2461, in Dataset.to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, write_empty_chunks, chunkmanager_store_kwargs)
   2329 """Write dataset contents to a zarr group.
   2330
   2331 Zarr chunks are determined in the following way:
   (...)
   2457 The I/O user guide, with more details and examples.
   2458 """
   2459 from xarray.backends.api import to_zarr
-> 2461 return to_zarr(  # type: ignore[call-overload,misc]
   2462     self,
   2463     store=store,
   2464     chunk_store=chunk_store,
   2465     storage_options=storage_options,
   2466     mode=mode,
   2467     synchronizer=synchronizer,
   2468     group=group,
   2469     encoding=encoding,
   2470     compute=compute,
   2471     consolidated=consolidated,
   2472     append_dim=append_dim,
   2473     region=region,
   2474     safe_chunks=safe_chunks,
   2475     zarr_version=zarr_version,
   2476     write_empty_chunks=write_empty_chunks,
   2477     chunkmanager_store_kwargs=chunkmanager_store_kwargs,
   2478 )

File /usr/local/lib/python3.9/site-packages/xarray/backends/api.py:1670, in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, write_empty_chunks, chunkmanager_store_kwargs)
   1668 existing_dims = zstore.get_dimensions()
   1669 if append_dim not in existing_dims:
-> 1670     raise ValueError(
   1671         f"append_dim={append_dim!r} does not match any existing "
   1672         f"dataset dimensions {existing_dims}"
   1673     )
   1674 existing_var_names = set(zstore.zarr_group.array_keys())
   1675 for var_name in existing_var_names:

ValueError: append_dim='dt_calc' does not match any existing dataset dimensions {}

In [8]: dataset
Out[8]:
<xarray.Dataset>
Dimensions:  (dt_calc: 1, y: 2, x: 4)
Coordinates:
    lon      (y) float64 50.0 51.0
    lat      (x) float64 4.0 5.0 6.0 7.0
  * dt_calc  (dt_calc) datetime64[ns] 2022-01-01T01:00:00
Dimensions without coordinates: y, x
Data variables:
    temp     (dt_calc, y, x) float64 1.0 2.0 3.0 4.0 3.0 4.0 5.0 6.0
```

### Anything else we need to know?

No response

### Environment
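The failing check is visible in the traceback: xarray reads the target store's dimensions and refuses to append when `append_dim` is absent. Here `existing_dims` came back as `{}`, i.e. the store that xarray consulted appeared empty. A stdlib-only sketch of that guard (paraphrased from the traceback, not the actual xarray implementation):

```python
def check_append_dim(append_dim: str, existing_dims: dict) -> None:
    # Mirrors the guard in xarray/backends/api.py shown in the traceback:
    # appending requires append_dim to already exist in the target store.
    if append_dim not in existing_dims:
        raise ValueError(
            f"append_dim={append_dim!r} does not match any existing "
            f"dataset dimensions {existing_dims}"
        )

# A store that really contains the first write exposes its dimensions:
check_append_dim("dt_calc", {"dt_calc": 1, "y": 2, "x": 4})  # passes silently

# An empty (or wrongly addressed) store reproduces the reported error:
try:
    check_append_dim("dt_calc", {})
except ValueError as err:
    print(err)  # append_dim='dt_calc' does not match any existing dataset dimensions {}
```

This suggests two directions to investigate: either the first `mode="w-"` write did not land where expected (see the store-root note above the MVCE), or the consolidated metadata read back for the append did not reflect it.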
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.10 (main, Mar 2 2022, 04:31:58)
[GCC 10.2.1 20210110]
python-bits: 64
OS: Linux
OS-release: 6.2.0-26-generic
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2023.8.0
pandas: 2.1.0
numpy: 1.25.2
scipy: 1.10.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.8.1
distributed: 2023.8.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 53.0.0
pip: 21.2.4
conda: None
pytest: 6.1.1
mypy: None
IPython: 8.12.0
sphinx: None
boto3==1.26.45
aiobotocore==2.5.0
botocore==1.29.76
s3fs==2023.6.0
zarr==2.16.1
xarray==2023.8.0
dask==2023.8.1
dask[distributed]==2023.8.1
dask-cloudprovider==2022.10.0
reactions: { "url": "https://api.github.com/repos/pydata/xarray/issues/8134/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
state_reason: completed | repo: 13221727 | type: issue