issues: 1876858952

  • id: 1876858952
  • node_id: I_kwDOAMm_X85v3phI
  • number: 8134
  • title: Unable to append data in s3 bucket with to_zarr() and append mode
  • user: 27021858
  • state: closed
  • locked: 0
  • comments: 2
  • created_at: 2023-09-01T06:57:32Z
  • updated_at: 2023-09-01T16:03:50Z
  • closed_at: 2023-09-01T16:03:49Z
  • author_association: NONE
  • assignee, milestone, active_lock_reason, draft, pull_request: (empty)
  • body:

What happened?

After I updated my packages, xarray and zarr can no longer append data to an existing Zarr store in S3.

What did you expect to happen?

The data should be appended to the existing Zarr store.

Minimal Complete Verifiable Example

```Python
import s3fs
import xarray
import zarr  # needed for zarr.errors.ContainsGroupError below
import numpy as np
from datetime import datetime
from s3fs import S3FileSystem

# storage_class, bucket_name and dataset_name are not defined in the report;
# set them for your own bucket before running.

append_dim = 'dt_calc'
consolidated = True

# First time step.
ds = xarray.Dataset(
    {'temp': (('dt_calc', 'y', 'x'), np.array([[[1., 2., 3., 4.], [3., 4., 5., 6.]]]))},
    coords={
        'lon': ('y', np.array([50., 51.])),
        'lat': ('x', np.array([4., 5., 6., 7.])),
        'dt_calc': ('dt_calc', [datetime(2022, 1, 1)]),
    },
)
# Second time step, one hour later, to be appended along dt_calc.
ds_2 = xarray.Dataset(
    {'temp': (('dt_calc', 'y', 'x'), np.array([[[1., 2., 3., 4.], [3., 4., 5., 6.]]]))},
    coords={
        'lon': ('y', np.array([50., 51.])),
        'lat': ('x', np.array([4., 5., 6., 7.])),
        'dt_calc': ('dt_calc', [datetime(2022, 1, 1, 1)]),
    },
)

s3_out = S3FileSystem(
    anon=False,
    s3_additional_kwargs={"StorageClass": storage_class},
)
store_out = s3fs.S3Map(
    root=f"s3:///{bucket_name}/{dataset_name}.zarr",
    s3=s3_out,
    check=False,
)

# The report passed an undefined name `store` in the calls below;
# store_out is assumed here, matching the log output.
ds.to_zarr(
    store_out,
    mode="w-",
    compute=True,
    consolidated=consolidated,
)

try:
    # "w-" fails if the store already exists ...
    ds_2.to_zarr(
        store_out,
        mode="w-",
        compute=True,
        consolidated=consolidated,
    )
except zarr.errors.ContainsGroupError:
    # ... in which case append along dt_calc instead.
    ds_2.to_zarr(
        store_out,
        mode="a",
        append_dim=append_dim,
        compute=True,
        consolidated=consolidated,
    )
```

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
In [6]: xarray.open_zarr(store_out, consolidated=True)
Out[6]:
<xarray.Dataset>
Dimensions:  (dt_calc: 1, x: 4, y: 2)
Coordinates:
  * dt_calc  (dt_calc) datetime64[ns] 2022-01-01
    lat      (x) float64 dask.array<chunksize=(4,), meta=np.ndarray>
    lon      (y) float64 dask.array<chunksize=(2,), meta=np.ndarray>
Dimensions without coordinates: x, y
Data variables:
    temp     (dt_calc, y, x) float64 dask.array<chunksize=(1, 2, 4), meta=np.ndarray>

In [7]: dataset.to_zarr(
   ...:     store_out,
   ...:     mode="a",
   ...:     append_dim=append_dim,
   ...:     compute=True,
   ...:     consolidated=consolidated,
   ...: )

ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 dataset.to_zarr(
      2     store_out,
      3     mode="a",
      4     append_dim=append_dim,
      5     compute=True,
      6     consolidated=consolidated,
      7 )

File /usr/local/lib/python3.9/site-packages/xarray/core/dataset.py:2461, in Dataset.to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, write_empty_chunks, chunkmanager_store_kwargs)
   2329 """Write dataset contents to a zarr group.
   2330
   2331 Zarr chunks are determined in the following way:
   (...)
   2457     The I/O user guide, with more details and examples.
   2458 """
   2459 from xarray.backends.api import to_zarr
-> 2461 return to_zarr(  # type: ignore[call-overload,misc]
   2462     self,
   2463     store=store,
   2464     chunk_store=chunk_store,
   2465     storage_options=storage_options,
   2466     mode=mode,
   2467     synchronizer=synchronizer,
   2468     group=group,
   2469     encoding=encoding,
   2470     compute=compute,
   2471     consolidated=consolidated,
   2472     append_dim=append_dim,
   2473     region=region,
   2474     safe_chunks=safe_chunks,
   2475     zarr_version=zarr_version,
   2476     write_empty_chunks=write_empty_chunks,
   2477     chunkmanager_store_kwargs=chunkmanager_store_kwargs,
   2478 )

File /usr/local/lib/python3.9/site-packages/xarray/backends/api.py:1670, in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, write_empty_chunks, chunkmanager_store_kwargs)
   1668 existing_dims = zstore.get_dimensions()
   1669 if append_dim not in existing_dims:
-> 1670     raise ValueError(
   1671         f"append_dim={append_dim!r} does not match any existing "
   1672         f"dataset dimensions {existing_dims}"
   1673     )
   1674 existing_var_names = set(zstore.zarr_group.array_keys())
   1675 for var_name in existing_var_names:

ValueError: append_dim='dt_calc' does not match any existing dataset dimensions {}

In [8]: dataset
Out[8]:
<xarray.Dataset>
Dimensions:  (dt_calc: 1, y: 2, x: 4)
Coordinates:
    lon      (y) float64 50.0 51.0
    lat      (x) float64 4.0 5.0 6.0 7.0
  * dt_calc  (dt_calc) datetime64[ns] 2022-01-01T01:00:00
Dimensions without coordinates: y, x
Data variables:
    temp     (dt_calc, y, x) float64 1.0 2.0 3.0 4.0 3.0 4.0 5.0 6.0

In [9]:
```
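
For readers following the traceback, the guard that raises at `api.py:1670` can be restated on its own. The sketch below copies only the condition and message shown in the log; `check_append_dim` is a hypothetical helper name, not an xarray function, and the dimension mappings passed to it are illustrative. It shows that the reported message is what the check produces when the store yields an empty mapping of dimensions, which is what the append path received here even though `open_zarr` in `In [6]` read `dt_calc`, `x` and `y` from the same `store_out`.

```python
def check_append_dim(append_dim, existing_dims):
    """Hypothetical stand-alone copy of the check shown in the traceback."""
    if append_dim not in existing_dims:
        raise ValueError(
            f"append_dim={append_dim!r} does not match any existing "
            f"dataset dimensions {existing_dims}"
        )


# A store that exposes its dimensions passes the check silently.
check_append_dim("dt_calc", {"dt_calc": 1, "y": 2, "x": 4})

# An empty mapping, as reported in the issue, triggers the error.
try:
    check_append_dim("dt_calc", {})
except ValueError as err:
    message = str(err)
    print(message)
```

The check itself says nothing about why the mapping was empty; it only narrows the question to what the store returned when it was opened for appending.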

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.10 (main, Mar 2 2022, 04:31:58) [GCC 10.2.1 20210110]
python-bits: 64
OS: Linux
OS-release: 6.2.0-26-generic
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2023.8.0
pandas: 2.1.0
numpy: 1.25.2
scipy: 1.10.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.8.1
distributed: 2023.8.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 53.0.0
pip: 21.2.4
conda: None
pytest: 6.1.1
mypy: None
IPython: 8.12.0
sphinx: None

boto3==1.26.45
aiobotocore==2.5.0
botocore==1.29.76
s3fs==2023.6.0
zarr==2.16.1
xarray==2023.8.0
dask==2023.8.1
dask[distributed]==2023.8.1
dask-cloudprovider==2022.10.0
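
Since the problem appeared after a package update, it can help to compare these pins against what is actually installed. The following is a minimal sketch using only the standard library's `importlib.metadata`; `installed_versions` and the `PACKAGES` list are illustrative names introduced here, with the package names taken from the pins above. The output depends entirely on the environment it runs in.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the pinned list in the report.
PACKAGES = [
    "boto3",
    "aiobotocore",
    "botocore",
    "s3fs",
    "zarr",
    "xarray",
    "dask",
    "dask-cloudprovider",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(PACKAGES)
for name, ver in versions.items():
    print(f"{name}=={ver}" if ver is not None else f"{name}: not installed")
```

Running this before and after an upgrade gives two lists that can be diffed to see which of these packages changed.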

  • reactions: total_count 0 (url: https://api.github.com/repos/pydata/xarray/issues/8134/reactions)
  • state_reason: completed
  • repo: 13221727
  • type: issue
