issues: 1164454058

id: 1164454058
node_id: I_kwDOAMm_X85FaCiq
number: 6345
title: `to_zarr` raises `ValueError: Invalid dtype` with `mode='a'` (but not with `mode='w'`)
user: 62192187
state: closed
locked: 0
comments: 6
created_at: 2022-03-09T21:21:26Z
updated_at: 2022-05-11T17:35:10Z
closed_at: 2022-05-11T17:35:10Z
author_association: CONTRIBUTOR

What happened?

A dataset in which a data variable has `dtype='|S35'` can be written to zarr without error as follows:

```python
import xarray as xr
import numpy as np

data = np.zeros((2, 3), dtype='|S35')
ds = xr.DataArray(data, name='foo').to_dataset()
ds.to_zarr('test.zarr', mode='w')
```

Changing the value of `mode` from `'w'` to `'a'` raises `ValueError: Invalid dtype for data variable`:

```python
!rm -rf test.zarr
ds.to_zarr('test.zarr', mode='a')
```

Full Traceback

```python-traceback
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 ds.to_zarr('test.zarr', mode='a')

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/core/dataset.py:2036, in Dataset.to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   2033 if encoding is None:
   2034     encoding = {}
-> 2036 return to_zarr(
   2037     self,
   2038     store=store,
   2039     chunk_store=chunk_store,
   2040     storage_options=storage_options,
   2041     mode=mode,
   2042     synchronizer=synchronizer,
   2043     group=group,
   2044     encoding=encoding,
   2045     compute=compute,
   2046     consolidated=consolidated,
   2047     append_dim=append_dim,
   2048     region=region,
   2049     safe_chunks=safe_chunks,
   2050 )

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/backends/api.py:1406, in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   1391 zstore = backends.ZarrStore.open_group(
   1392     store=mapper,
   1393     mode=mode,
   (...)
   1402     stacklevel=4,  # for Dataset.to_zarr()
   1403 )
   1405 if mode in ["a", "r+"]:
-> 1406     _validate_datatypes_for_zarr_append(dataset)
   1407 if append_dim is not None:
   1408     existing_dims = zstore.get_dimensions()

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/backends/api.py:1301, in _validate_datatypes_for_zarr_append(dataset)
   1292         raise ValueError(
   1293             "Invalid dtype for data variable: {} "
   1294             "dtype must be a subtype of number, "
   (...)
   1297             "object".format(var)
   1298         )
   1300 for k in dataset.data_vars.values():
-> 1301     check_dtype(k)

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/backends/api.py:1292, in _validate_datatypes_for_zarr_append.<locals>.check_dtype(var)
   1283 def check_dtype(var):
   1284     if (
   1285         not np.issubdtype(var.dtype, np.number)
   1286         and not np.issubdtype(var.dtype, np.datetime64)
   (...)
   1290     ):
   1291         # and not re.match('^bytes[1-9]+$', var.dtype.name)):
-> 1292         raise ValueError(
   1293             "Invalid dtype for data variable: {} "
   1294             "dtype must be a subtype of number, "
   1295             "datetime, bool, a fixed sized string, "
   1296             "a fixed size unicode string or an "
   1297             "object".format(var)
   1298         )

ValueError: Invalid dtype for data variable: <xarray.DataArray 'foo' (dim_0: 2, dim_1: 3)>
array([[b'', b'', b''],
       [b'', b'', b'']], dtype='|S35')
Dimensions without coordinates: dim_0, dim_1 dtype must be a subtype of number, datetime, bool, a fixed sized string, a fixed size unicode string or an object
```
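To make it easier to see how the two `np.issubdtype` tests visible in the traceback treat this dtype, here is a minimal numpy-only sketch. The `describe_dtype` helper is hypothetical (it is not part of xarray), and it only covers the two conditions shown above plus numpy's bytes and unicode categories; the conditions elided by `(...)` in the traceback are not reproduced.

```python
import numpy as np


def describe_dtype(dtype):
    """Hypothetical helper: report which numpy categories a dtype belongs to."""
    dtype = np.dtype(dtype)
    return {
        "kind": dtype.kind,
        # The two checks that are visible in the traceback:
        "is_number": bool(np.issubdtype(dtype, np.number)),
        "is_datetime": bool(np.issubdtype(dtype, np.datetime64)),
        # Extra categories, for comparison only:
        "is_bytes": bool(np.issubdtype(dtype, np.bytes_)),
        "is_unicode": bool(np.issubdtype(dtype, np.str_)),
    }


info = describe_dtype("|S35")
print(info)
```

A fixed-size byte string is neither a number nor a datetime as far as numpy is concerned, so neither of the visible checks accepts it, even though the error message mentions "a fixed sized string" among the accepted dtypes.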

What did you expect to happen?

I would expect the behavior of mode='w' and mode='a' to be consistent as regards dtypes of data variables.
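In the meantime, one possible workaround might be to convert the byte strings to fixed-size unicode before appending. This is only an assumption based on the error message, which lists "a fixed size unicode string" among the accepted dtypes; I have not confirmed that `to_zarr(..., mode='a')` then succeeds. The sketch below shows just the numpy conversion:

```python
import numpy as np

# Same array as in the example above: fixed-size byte strings.
data = np.zeros((2, 3), dtype="|S35")

# Convert to fixed-size unicode of the same length (assumes ASCII-decodable bytes).
as_unicode = data.astype("U35")
print(as_unicode.dtype.kind)
```

The converted array could then be wrapped in a `DataArray` exactly as in the original example.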

Minimal Complete Verifiable Example

See What Happened? section above

Relevant log output

See What Happened? section above

Anything else we need to know?

No response

Environment

```
INSTALLED VERSIONS

commit: None
python: 3.9.10 | packaged by conda-forge | (main, Feb 1 2022, 21:28:27) [Clang 11.1.0 ]
python-bits: 64
OS: Darwin
OS-release: 21.0.1
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2022.3.0
pandas: 1.4.1
numpy: 1.22.2
scipy: 1.8.0
netCDF4: 1.5.8
pydap: installed
h5netcdf: 999
h5py: 3.6.0
Nio: None
zarr: 2.11.0
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: 0.9.8.5
iris: None
bottleneck: None
dask: 2022.02.1
distributed: 2022.2.1
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.02.0
cupy: None
pint: None
sparse: None
setuptools: 59.8.0
pip: 22.0.4
conda: None
pytest: 6.2.5
IPython: 8.1.1
sphinx: None
```

cc @rabernat

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6345/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
