id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
789410367,MDU6SXNzdWU3ODk0MTAzNjc=,4826,Reading and writing a zarr dataset multiple times casts bools to int8,463809,closed,0,,,10,2021-01-19T22:02:15Z,2023-04-10T09:26:27Z,2023-04-10T09:26:27Z,CONTRIBUTOR,,,,"**What happened**:

Reading and writing a zarr dataset multiple times into different paths changes `bool` dtype arrays to `int8`. I think this issue is related to #2937.

**What you expected to happen**:

My array's dtype in numpy/dask should not change, even if certain storage backends store dtypes a certain way.

**Minimal Complete Verifiable Example**:

```python
import xarray as xr
import numpy as np

ds = xr.Dataset({
    ""bool_field"": xr.DataArray(
        np.random.randn(5) < 0.5,
        dims=('g'),
        coords={'g': np.arange(5)}
    )
})
ds.to_zarr('test.zarr', mode=""w"")

d2 = xr.open_zarr('test.zarr')
print(d2.bool_field.dtype)
print(d2.bool_field.encoding)
d2.to_zarr(""test2.zarr"", mode=""w"")

d3 = xr.open_zarr('test2.zarr')
print(d3.bool_field.dtype)
```

The above snippet prints the following. In d3, the dtype of `bool_field` is `int8`, presumably because d3 inherited d2's `encoding`, which records `int8` even though the array itself has a `bool` dtype.

```
bool
{'chunks': (5,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, 'dtype': dtype('int8')}
int8
```

**Anything else we need to know?**:

The current workaround is to set the encodings explicitly, which fixes the problem:

```python
encoding = {k: {""dtype"": d2[k].dtype} for k in d2}
d2.to_zarr('test2.zarr', mode=""w"", encoding=encoding)
```

**Environment**:
Output of xr.show_versions()

```
# I'll update with the full output of xr.show_versions() soon.
In [4]: xr.__version__
Out[4]: '0.16.2'
In [2]: zarr.__version__
Out[2]: '2.6.1'
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4826/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 516725099,MDU6SXNzdWU1MTY3MjUwOTk=,3480,Allow appending non-numerical types to zarr arrays.,463809,closed,0,,,0,2019-11-02T21:20:53Z,2019-11-13T15:55:33Z,2019-11-13T15:55:33Z,CONTRIBUTOR,,,,"#### MCVE Code Sample Zarr itself allows appending `np.datetime` and `np.bool` types. ```python >>> path = 'tmp/test.zarr' >>> z1 = zarr.open(path, mode='w', shape=(10,), chunks=(10,), dtype='M8[D]') >>> z1[:] = '1990-01-01' >>> z2 = zarr.open(path, mode='a') >>> a = np.array(['1992-01-01'] * 10, dtype='datetime64[D]') >>> z2.append(a) (20,) >>> z2 ``` But it's equivalent in xarray throws an error: ``` >>> ds = xr.Dataset( ... {'y': (('x',), np.array(['1991-01-01'] * 10, dtype='datetime64[D]'))} ... ) >>> ds.to_zarr('tmp/test_xr.zarr', mode='w') >>> ds2 = xr.Dataset( ... {'y': (('x',), np.array(['1992-01-01'] * 10, dtype='datetime64[D]'))} ... 
)
>>> ds2.to_zarr('tmp/test_xr.zarr', mode='a', append_dim='x')
Traceback (most recent call last):
  File """", line 1, in
  File ""/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/core/dataset.py"", line 1616, in to_zarr
    append_dim=append_dim,
  File ""/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py"", line 1304, in to_zarr
    _validate_datatypes_for_zarr_append(dataset)
  File ""/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py"", line 1249, in _validate_datatypes_for_zarr_append
    check_dtype(k)
  File ""/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py"", line 1245, in check_dtype
    ""unicode string or an object"".format(var)
ValueError: Invalid dtype for data variable:
array(['1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000'],
      dtype='datetime64[ns]')
Dimensions without coordinates: x dtype must be a subtype of number, a fixed sized string, a fixed size unicode string or an object
```

#### Expected Output

The append should succeed.
#### Problem Description

This function in `xarray/backends/api.py` is too strict on types:

```
def _validate_datatypes_for_zarr_append(dataset):
    """"""DataArray.name and Dataset keys must be a string or None""""""

    def check_dtype(var):
        if (
            not np.issubdtype(var.dtype, np.number)
            and not coding.strings.is_unicode_dtype(var.dtype)
            and not var.dtype == object
        ):
            # and not re.match('^bytes[1-9]+$', var.dtype.name)):
            raise ValueError(
                ""Invalid dtype for data variable: {} ""
                ""dtype must be a subtype of number, ""
                ""a fixed sized string, a fixed size ""
                ""unicode string or an object"".format(var)
            )

    for k in dataset.data_vars.values():
        check_dtype(k)
```

`np.datetime64` (with any unit) and `np.bool` are not subtypes of `np.number`, so the check rejects them:

```
>>> np.issubdtype(np.dtype('datetime64[D]'), np.number)
False
>>> np.issubdtype(np.dtype('bool'), np.number)
False
```

#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 15:17:50) [Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 18.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: None

xarray: 0.14.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.5.2
distributed: 2.5.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: 4.7.12
pytest: 5.2.1
IPython: 7.8.0
sphinx: 2.2.0
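
For illustration, here is a minimal sketch of what a more permissive version of the dtype check from the Problem Description might look like, assuming the fix is simply to also accept datetime64 and bool. The helper name `is_appendable_dtype` is hypothetical and not part of xarray, and the unicode test via `dtype.kind` is a stand-in for `coding.strings.is_unicode_dtype`; only NumPy calls are used.

```python
import numpy as np


def is_appendable_dtype(dtype):
    # Hypothetical helper (not xarray API): mirrors the existing check
    # and additionally accepts datetime64 and bool dtypes.
    dtype = np.dtype(dtype)
    return bool(
        np.issubdtype(dtype, np.number)
        or np.issubdtype(dtype, np.datetime64)  # assumption: allow datetimes
        or np.issubdtype(dtype, np.bool_)       # assumption: allow booleans
        or dtype.kind == 'U'                    # fixed-size unicode strings
        or dtype == object
    )


# Print the verdict for a few representative dtypes.
for name in ('int64', 'float32', 'datetime64[D]', 'bool', 'U10', 'O', 'V8'):
    print(name, is_appendable_dtype(name))
```

With this sketch the two dtypes from the report pass, while a raw void dtype such as `V8` is still rejected. Whether xarray should accept every dtype that zarr can append is a separate design question.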
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3480/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue