
issues: 1607155972


id: 1607155972
node_id: I_kwDOAMm_X85fy0EE
number: 7576
title: Rezarring an opened dataset with object dtype fails due to added filter
user: 24508496
state: closed
locked: 0
comments: 2
created_at: 2023-03-02T16:50:56Z
updated_at: 2023-03-20T15:41:32Z
closed_at: 2023-03-20T15:41:31Z
author_association: CONTRIBUTOR
assignee, milestone, active_lock_reason, draft, pull_request: (empty)

What happened?

I am trying to save an xr.Dataset that I read and processed from another saved zarr file, but it fails with this error:

numcodecs/vlen.pyx in numcodecs.vlen.VLenUTF8.encode()
TypeError: expected unicode string, found 3

It seems that the first time the dataset is saved, xarray/zarr adds a VLenUTF8 filter to the encoding of one of the dimensions. If I pop the filters key from the encoding of the opened dataset, I can resave the file.

I can also safely save to netcdf (which makes sense since this encoding is probably ignored then).

What did you expect to happen?

I should be able to open and resave a file to zarr.

Minimal Complete Verifiable Example

```Python
import numpy as np
import xarray as xr

# One-dimensional array of strings stored with object dtype
da = xr.DataArray(
    np.array(['126469-423', '130042-0-10046', '120259-10343'], dtype='object'),
    dims=['asset'],
    name='asset',
)

da.to_dataset().to_zarr('~/Downloads/test.zarr', mode='w')

# Fails with the error below
opened = xr.open_zarr('~/Downloads/test.zarr')
opened.to_zarr('~/Downloads/test2.zarr', mode='w')

# Saves successfully
opened.asset.encoding.pop('filters')
opened.to_zarr('~/Downloads/test2.zarr', mode='w')
```
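The workaround in the example pops `filters` from a single variable. A minimal sketch of the same idea applied to every variable is given here, assuming only that the dataset exposes a `variables` mapping whose values carry an `encoding` dict. The helper name `drop_encoding_keys` is hypothetical and not part of xarray, and the stand-in objects exist only so the sketch runs without xarray or zarr installed.

```python
from types import SimpleNamespace


def drop_encoding_keys(dataset, keys=("filters",)):
    """Remove the given keys from every variable's encoding, in place.

    Hypothetical helper: relies only on `dataset.variables` being a mapping
    of names to objects that carry an `encoding` dict.
    Returns {variable name: {removed key: removed value}}.
    """
    removed = {}
    for name, variable in dataset.variables.items():
        for key in keys:
            if key in variable.encoding:
                removed.setdefault(name, {})[key] = variable.encoding.pop(key)
    return removed


# Stand-in for an opened dataset (assumption: mimics the attribute layout only).
fake = SimpleNamespace(
    variables={
        "asset": SimpleNamespace(encoding={"filters": ["VLenUTF8"], "dtype": "object"}),
        "value": SimpleNamespace(encoding={"chunks": (3,)}),
    }
)
removed = drop_encoding_keys(fake)
print(removed)
```

With the `opened` dataset from the example, the intended call would be `drop_encoding_keys(opened)` just before `opened.to_zarr(...)`. That usage has not been verified against xarray here, and it only sidesteps the stored filter rather than fixing the underlying behavior.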

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
TypeError                                 Traceback (most recent call last)
<ipython-input-16-b1f2f1d2b5a0> in <module>
      6 opened = xr.open_zarr('~/Downloads/test.zarr')
      7
----> 8 opened.to_zarr('~/Downloads/test2.zarr', mode='w')

~/micromamba/envs/xr/lib/python3.8/site-packages/xarray/core/dataset.py in to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version)
   2097         from xarray.backends.api import to_zarr
   2098
-> 2099         return to_zarr(  # type: ignore
   2100             self,
   2101             store=store,

~/micromamba/envs/xr/lib/python3.8/site-packages/xarray/backends/api.py in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version)
   1668     writer = ArrayWriter()
   1669     # TODO: figure out how to properly handle unlimited_dims
-> 1670     dump_to_store(dataset, zstore, writer, encoding=encoding)
   1671     writes = writer.sync(compute=compute)
   1672

~/micromamba/envs/xr/lib/python3.8/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1277         variables, attrs = encoder(variables, attrs)
   1278
-> 1279     store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
...
   2112     # check object encoding

numcodecs/vlen.pyx in numcodecs.vlen.VLenUTF8.encode()

TypeError: expected unicode string, found 3
```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-124-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 2023.1.0
pandas: 1.5.3
numpy: 1.22.4
scipy: 1.4.1
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.11.0
cftime: 1.4.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.5
iris: None
bottleneck: 1.3.2
dask: 2022.01.1
distributed: 2022.01.1
matplotlib: 3.3.2
cartopy: 0.18.0
seaborn: None
numbagg: None
fsspec: 0.8.4
cupy: None
pint: 0.16.1
sparse: None
flox: None
numpy_groupies: None
setuptools: 50.3.0.post20201006
pip: 20.2.3
conda: None
pytest: 7.0.1
mypy: None
IPython: 7.18.1
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7576/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
