
issue #7317: Error using to_zarr method with fsspec simplecache

Repo: pydata/xarray · State: open · Comments: 0 · Author association: CONTRIBUTOR
Created: 2022-11-23T19:26:38Z · Updated: 2022-12-13T16:38:03Z

What happened?

I'm trying to use the fsspec simplecache implementation to read and write a set of Zarr files with Xarray (and some others with Dask), but during the writing process I always get a KeyError, even when using mode="w" (the complete error is in the relevant log output below).

I raised a similar issue on Dask (https://github.com/dask/dask/issues/9680). When I tried the same fix of adding the "overwrite" parameter (which I think should not be necessary, since Xarray already offers the mode="w" option) through the "storage_options" parameter of the "to_zarr" method, it raised a different error: "ValueError: store must be a string to use storage_options. Got <class 'fsspec.mapping.FSMap'>", so apparently I cannot apply the same solution.
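A note on that ValueError: to_zarr only accepts storage_options when the store is given as a URL string, because xarray then builds the fsspec mapper itself. Below is a minimal sketch of that call shape, assuming a chained "simplecache::" URL and fsspec's per-protocol nesting of options; the URL and option layout are my illustration, not code from this report, and whether this sidesteps the original KeyError is a separate question.

```Python
import xarray as xr

arr = xr.DataArray([1, 2, 3], coords={"test_coord": [5, 6, 7]}).to_dataset(name="data")

# Hypothetical string-store form: xarray forwards storage_options to
# fsspec, so the simplecache options move into a nested dict keyed by
# protocol name instead of being built into an FSMap by hand. The
# target path is illustrative.
arr.to_zarr(
    "simplecache::/tmp/error_cache_write",
    mode="w",
    storage_options={"simplecache": {"cache_storage": "cache/files", "same_names": True}},
)
```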

What did you expect to happen?

The "to_zarr" method should be able to store the array even using the simplecache filesystem (at least Dask can).

Minimal Complete Verifiable Example

```Python
import fsspec
import xarray as xr

mapper = fsspec.get_mapper('error_cache_write')
cache_mapper = fsspec.filesystem(
    "simplecache",
    fs=mapper.fs,
    cache_storage='cache/files',
    same_names=True
).get_mapper('error_cache_write')

arr = xr.DataArray(
    [1, 2, 3],
    coords={"test_coord": [5, 6, 7]}
).to_dataset(name="data")

# This erases the cache and the array itself.
cache_mapper.fs.clear_cache()
cache_mapper.clear()

arr.to_zarr(cache_mapper, mode="w")

# Using the storage_options:
arr.to_zarr(cache_mapper, mode="w", storage_options={"overwrite": True})
```
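As an aside on the "erases the cache and the array itself" comment above, here is a quick probe, under the assumption that the simplecache mapper forwards writes and deletes to the wrapped target filesystem, of why clear() empties the real store and not just the local cache:

```Python
# Assumption: FSMap operations on the simplecache mapper pass through
# to the underlying target filesystem, so clear() removes real keys.
cache_mapper["probe"] = b"x"   # hypothetical key written through the cache
cache_mapper.clear()
print(list(cache_mapper))      # expected: [] -- the target store is empty too
```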

MVCE confirmation

  • [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [x] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
KeyError                                  Traceback (most recent call last)
Input In [9], in <cell line: 16>()
      8 arr = xr.DataArray(
      9     [1, 2, 3],
     10     coords={
     11         "test_coord": [5,6,7]
     12     }
     13 ).to_dataset(name="data")
     14 # cache_mapper.fs.clear_cache()
     15 # cache_mapper.clear()
---> 16 arr.to_zarr(cache_mapper, mode="w")

File /opt/conda/lib/python3.9/site-packages/xarray/core/dataset.py:2081, in Dataset.to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   1971 """Write dataset contents to a zarr group.
   1972
   1973 Zarr chunks are determined in the following way:
   (...)
   2077 The I/O user guide, with more details and examples.
   2078 """
   2079 from ..backends.api import to_zarr
-> 2081 return to_zarr(  # type: ignore
   2082     self,
   2083     store=store,
   2084     chunk_store=chunk_store,
   2085     storage_options=storage_options,
   2086     mode=mode,
   2087     synchronizer=synchronizer,
   2088     group=group,
   2089     encoding=encoding,
   2090     compute=compute,
   2091     consolidated=consolidated,
   2092     append_dim=append_dim,
   2093     region=region,
   2094     safe_chunks=safe_chunks,
   2095 )

File /opt/conda/lib/python3.9/site-packages/xarray/backends/api.py:1657, in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   1655 writer = ArrayWriter()
   1656 # TODO: figure out how to properly handle unlimited_dims
-> 1657 dump_to_store(dataset, zstore, writer, encoding=encoding)
   1658 writes = writer.sync(compute=compute)
   1660 if compute:

File /opt/conda/lib/python3.9/site-packages/xarray/backends/api.py:1277, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1274 if encoder:
   1275     variables, attrs = encoder(variables, attrs)
-> 1277 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File /opt/conda/lib/python3.9/site-packages/xarray/backends/zarr.py:550, in ZarrStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    548 for vn in existing_variable_names:
    549     vars_with_encoding[vn] = variables[vn].copy(deep=False)
--> 550     vars_with_encoding[vn].encoding = existing_vars[vn].encoding
    551 vars_with_encoding, _ = self.encode(vars_with_encoding, {})
    552 variables_encoded.update(vars_with_encoding)

KeyError: 'data'
```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-1065-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: None

xarray: 2022.11.0
pandas: 1.4.0
numpy: 1.22.4
scipy: 1.9.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.6.0
Nio: None
zarr: 2.13.3
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2022.11.1
distributed: None
matplotlib: 3.5.1
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.8.0
cupy: None
pint: None
sparse: None
flox: 0.5.9
numpy_groupies: 0.9.19
setuptools: 59.1.1
pip: 22.0.3
conda: 4.11.0
pytest: 7.1.3
IPython: 8.4.0
sphinx: None