id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1200716594,PR_kwDOAMm_X842CkkU,6476,Fix zarr append dtype checks,62192187,closed,0,,,7,2022-04-12T00:30:34Z,2022-05-11T17:39:42Z,2022-05-11T17:35:10Z,CONTRIBUTOR,,0,pydata/xarray/pulls/6476,"- [x] Closes #6345
- [x] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
  - If this is deemed a ""notable bug fix"" I can add a note here prior to merge.
- [ ] New functions/methods are listed in `api.rst`
  - N/A","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6476/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1164454058,I_kwDOAMm_X85FaCiq,6345,`to_zarr` raises `ValueError: Invalid dtype` with `mode='a'` (but not with `mode='w'`),62192187,closed,0,,,6,2022-03-09T21:21:26Z,2022-05-11T17:35:10Z,2022-05-11T17:35:10Z,CONTRIBUTOR,,,,"### What happened?

A dataset in which a data variable has `dtype='|S35'` can be written to zarr without error as follows:

```python
import xarray as xr
import numpy as np

data = np.zeros((2, 3), dtype='|S35')
ds = xr.DataArray(data, name='foo').to_dataset()
ds.to_zarr('test.zarr', mode='w')
```

Changing the value of `mode` from `'w'` to `'a'` raises `ValueError: Invalid dtype for data variable`:

```python
!rm -rf test.zarr
ds.to_zarr('test.zarr', mode='a')
```
Full Traceback:

```python-traceback
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [4], in ()
----> 1 ds.to_zarr('test.zarr', mode='a')

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/core/dataset.py:2036, in Dataset.to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   2033 if encoding is None:
   2034     encoding = {}
-> 2036 return to_zarr(
   2037     self,
   2038     store=store,
   2039     chunk_store=chunk_store,
   2040     storage_options=storage_options,
   2041     mode=mode,
   2042     synchronizer=synchronizer,
   2043     group=group,
   2044     encoding=encoding,
   2045     compute=compute,
   2046     consolidated=consolidated,
   2047     append_dim=append_dim,
   2048     region=region,
   2049     safe_chunks=safe_chunks,
   2050 )

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/backends/api.py:1406, in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   1391 zstore = backends.ZarrStore.open_group(
   1392     store=mapper,
   1393     mode=mode,
   (...)
   1402     stacklevel=4,  # for Dataset.to_zarr()
   1403 )
   1405 if mode in [""a"", ""r+""]:
-> 1406     _validate_datatypes_for_zarr_append(dataset)
   1407     if append_dim is not None:
   1408         existing_dims = zstore.get_dimensions()

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/backends/api.py:1301, in _validate_datatypes_for_zarr_append(dataset)
   1292         raise ValueError(
   1293             ""Invalid dtype for data variable: {} ""
   1294             ""dtype must be a subtype of number, ""
   (...)
   1297             ""object"".format(var)
   1298         )
   1300 for k in dataset.data_vars.values():
-> 1301     check_dtype(k)

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/backends/api.py:1292, in _validate_datatypes_for_zarr_append.<locals>.check_dtype(var)
   1283 def check_dtype(var):
   1284     if (
   1285         not np.issubdtype(var.dtype, np.number)
   1286         and not np.issubdtype(var.dtype, np.datetime64)
   (...)
   1290     ):
   1291         # and not re.match('^bytes[1-9]+$', var.dtype.name)):
-> 1292         raise ValueError(
   1293             ""Invalid dtype for data variable: {} ""
   1294             ""dtype must be a subtype of number, ""
   1295             ""datetime, bool, a fixed sized string, ""
   1296             ""a fixed size unicode string or an ""
   1297             ""object"".format(var)
   1298         )

ValueError: Invalid dtype for data variable: <xarray.DataArray 'foo' (dim_0: 2, dim_1: 3)>
array([[b'', b'', b''],
       [b'', b'', b'']], dtype='|S35')
Dimensions without coordinates: dim_0, dim_1 dtype must be a subtype of number, datetime, bool, a fixed sized string, a fixed size unicode string or an object
```
### What did you expect to happen?

I would expect the behavior of `mode='w'` and `mode='a'` to be consistent with respect to the dtypes of data variables.

### Minimal Complete Verifiable Example

See **What happened?** section above.

### Relevant log output

See **What happened?** section above.

### Anything else we need to know?

_No response_

### Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.10 | packaged by conda-forge | (main, Feb 1 2022, 21:28:27) [Clang 11.1.0 ]
python-bits: 64
OS: Darwin
OS-release: 21.0.1
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2022.3.0
pandas: 1.4.1
numpy: 1.22.2
scipy: 1.8.0
netCDF4: 1.5.8
pydap: installed
h5netcdf: 999
h5py: 3.6.0
Nio: None
zarr: 2.11.0
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: 0.9.8.5
iris: None
bottleneck: None
dask: 2022.02.1
distributed: 2022.2.1
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.02.0
cupy: None
pint: None
sparse: None
setuptools: 59.8.0
pip: 22.0.4
conda: None
pytest: 6.2.5
IPython: 8.1.1
sphinx: None
```

cc @rabernat","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6345/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue