

issues: 868976909


id: 868976909
node_id: MDU6SXNzdWU4Njg5NzY5MDk=
number: 5224
title: xarray can't append to Zarrs with byte-string variables
user: 9010180
state: open
locked: 0
comments: 3
created_at: 2021-04-27T15:30:44Z
updated_at: 2021-04-28T17:15:43Z
author_association: NONE

What happened:

I tried to use xarray to append a Dataset to a Zarr store containing a variable of type |S1 (char string), and received this error:

ValueError: Invalid dtype for data variable: <xarray.DataArray 'x' ()> array(b'', dtype='|S1') dtype must be a subtype of number, datetime, bool, a fixed sized string, a fixed size unicode string or an object
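The two dtypes used in this report differ in their NumPy kind: |S1 is a fixed-width byte string (kind "S", one byte per element), while |U1 is a fixed-width unicode string (kind "U", stored as UCS-4, so four bytes per element). The sketch below uses only NumPy and a hypothetical helper, describe_dtype, to show the difference. Note that the error message lists "a fixed sized string" among the allowed types, yet the |S1 variable is still rejected.

```python
import numpy as np


def describe_dtype(code):
    """Hypothetical helper: return (kind, itemsize, is_bytes, is_unicode) for a dtype string."""
    dt = np.dtype(code)
    return (
        dt.kind,                               # "S" for byte strings, "U" for unicode
        dt.itemsize,                           # bytes per element
        bool(np.issubdtype(dt, np.bytes_)),    # fixed-width byte string?
        bool(np.issubdtype(dt, np.str_)),      # fixed-width unicode string?
    )


for code in ("|S1", "|U1"):
    print(code, describe_dtype(code))
```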

What you expected to happen:

I expected the Dataset to be appended to the Zarr.

Minimal Complete Verifiable Example:

Note: this is not quite "minimal", since it also performs the append using the Zarr library directly and using a |U1 (Unicode) datatype in order to demonstrate that these variations work.

```python
import numpy as np
import xarray as xr
import zarr


def test_append(data_type, zarr_path):
    print(f"Creating {data_type} Zarr...")
    ds = xr.Dataset({"x": np.array("", dtype=data_type)})
    ds.to_zarr(zarr_path, mode="w")

    print(f"Appending to {data_type} Zarr with Zarr library...")
    zarr_to_append = zarr.open(zarr_path, mode="a")
    zarr_to_append.x.append(np.array("", dtype=data_type))

    print(f"Appending to {data_type} Zarr with xarray...")
    ds_to_append = xr.Dataset({"x": np.array("", dtype=data_type)})
    ds_to_append.to_zarr(zarr_path, mode="a")


test_append("|U1", "test-u.zarr")
test_append("|S1", "test-s.zarr")
```

Anything else we need to know?:

I came across this problem when converting some NetCDFs from this dataset to a Zarr, appending them along the time axis. The latest data format version (1.4) includes a dimensionless variable crs of type char, which xarray reads as |S1, causing the error described above when I attempt to append. Replacing crs with a |U1-typed variable works around the problem, but is undesirable since we need to reproduce the NetCDFs as closely as possible. The example above shows that the Zarr format and library themselves don't seem to have a problem with appending byte-string variables.
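A minimal sketch of the |U1 workaround described above, using only NumPy. The helper names bytes_to_unicode and unicode_to_bytes are hypothetical, and the sketch assumes the stored characters are ASCII. An xarray variable could presumably be converted the same way with DataArray.astype before calling to_zarr and cast back after reading, but the Zarr store would still hold a unicode variable, which is why this does not reproduce the NetCDF dtype on disk.

```python
import numpy as np


def bytes_to_unicode(values):
    """Hypothetical helper: cast a fixed-width byte-string array (kind "S")
    to a unicode array (kind "U") with the same number of characters.
    Assumes ASCII content."""
    return values.astype(f"U{values.dtype.itemsize}")


def unicode_to_bytes(values, width):
    """Hypothetical helper: cast a unicode array back to fixed-width bytes.
    Assumes ASCII content."""
    return values.astype(f"S{width}")


crs = np.array(b"", dtype="|S1")           # scalar char variable, as xarray reads it
as_text = bytes_to_unicode(crs)            # dtype <U1, the type that appends without error
restored = unicode_to_bytes(as_text, crs.dtype.itemsize)  # back to |S1 after reading
print(as_text.dtype, restored.dtype, restored == crs)
```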

The obvious fix would be to loosen the type check in xarray.backends.api._validate_datatypes_for_zarr_append:

    if (
        not np.issubdtype(var.dtype, np.number)
        and not np.issubdtype(var.dtype, np.datetime64)
        and not np.issubdtype(var.dtype, np.bool_)
        and not coding.strings.is_unicode_dtype(var.dtype)
        and not coding.strings.is_bytes_dtype(var.dtype)  # <- this line added to avoid "Invalid dtype" error
        and not var.dtype == object
    ):

This change makes the example above work, but I don't know if it would result in any unintended side-effects.
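A self-contained approximation of the proposed check, written as a positive predicate (the xarray code raises when every condition is false). It is a sketch, not xarray's implementation: is_appendable_dtype and its allow_bytes flag are hypothetical names, and the unicode and bytes tests are reduced to dtype.kind checks, whereas xarray's coding.strings helpers may also recognise variable-length strings stored with object dtype. Setting allow_bytes=False mimics the behaviour reported here.

```python
import numpy as np


def is_appendable_dtype(dtype, allow_bytes=True):
    """Hypothetical standalone version of the append dtype check.

    allow_bytes=False approximates the current behaviour (|S1 rejected);
    allow_bytes=True approximates the proposed fix (|S1 accepted).
    """
    dtype = np.dtype(dtype)
    return bool(
        np.issubdtype(dtype, np.number)
        or np.issubdtype(dtype, np.datetime64)
        or np.issubdtype(dtype, np.bool_)
        or dtype.kind == "U"                       # fixed-width unicode, e.g. |U1
        or (allow_bytes and dtype.kind == "S")     # fixed-width bytes, e.g. |S1
        or dtype == object
    )


for code in ("|S1", "|U1", "<f8"):
    print(code, is_appendable_dtype(code, allow_bytes=False), is_appendable_dtype(code))
```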

Environment:

Output of xr.show_versions():

    INSTALLED VERSIONS
    ------------------
    commit: 0021cdab91f7466f4be0fb32dae92bf3f8290e19
    python: 3.8.5 (default, Jan 27 2021, 15:41:15) [GCC 9.3.0]
    python-bits: 64
    OS: Linux
    OS-release: 5.8.0-50-generic
    machine: x86_64
    processor: x86_64
    byteorder: little
    LC_ALL: None
    LANG: en_GB.UTF-8
    LOCALE: en_GB.UTF-8
    libhdf5: 1.10.4
    libnetcdf: 4.7.3
    xarray: 0.15.0
    pandas: 0.25.3
    numpy: 1.17.4
    scipy: 1.3.3
    netCDF4: 1.5.3
    pydap: None
    h5netcdf: 0.7.1
    h5py: 2.10.0
    Nio: None
    zarr: 2.4.0+ds
    cftime: 1.1.0
    nc_time_axis: None
    PseudoNetCDF: None
    rasterio: 1.1.3
    cfgrib: None
    iris: None
    bottleneck: 1.2.1
    dask: 2.8.1+dfsg
    distributed: None
    matplotlib: 3.1.2
    cartopy: None
    seaborn: None
    numbagg: None
    pint: None
    setuptools: 45.2.0
    pip: 20.0.2
    conda: None
    pytest: 4.6.9
    IPython: 7.13.0
    sphinx: 1.8.5
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5224/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue
