id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
789410367,MDU6SXNzdWU3ODk0MTAzNjc=,4826,Reading and writing a zarr dataset multiple times casts bools to int8,463809,closed,0,,,10,2021-01-19T22:02:15Z,2023-04-10T09:26:27Z,2023-04-10T09:26:27Z,CONTRIBUTOR,,,,"**What happened**:

Reading and writing a zarr dataset multiple times into different paths changes `bool` dtype arrays to `int8`. I think this issue is related to #2937.

**What you expected to happen**:

My array's dtype in numpy/dask should not change, even if certain storage backends store dtypes a certain way.

**Minimal Complete Verifiable Example**:

```python
import xarray as xr
import numpy as np

ds = xr.Dataset({
    ""bool_field"": xr.DataArray(
        np.random.randn(5) < 0.5,
        dims=('g'),
        coords={'g': np.arange(5)}
    )
})
ds.to_zarr('test.zarr', mode=""w"")

d2 = xr.open_zarr('test.zarr')
print(d2.bool_field.dtype)
print(d2.bool_field.encoding)
d2.to_zarr(""test2.zarr"", mode=""w"")

d3 = xr.open_zarr('test2.zarr')
print(d3.bool_field.dtype)
```

The above snippet prints the following. In d3, the dtype of `bool_field` is `int8`, presumably because d3 inherited d2's `encoding`, which says `int8` even though the array has a `bool` dtype.

```
bool
{'chunks': (5,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, 'dtype': dtype('int8')}
int8
```

**Anything else we need to know?**:

The current workaround is to explicitly set encodings. This fixes the problem:

```python
encoding = {k: {""dtype"": d2[k].dtype} for k in d2}
d2.to_zarr('test2.zarr', mode=""w"", encoding=encoding)
```

**Environment**:
Output of xr.show_versions()

```
# I'll update with the full output of xr.show_versions() soon.
In [4]: xr.__version__
Out[4]: '0.16.2'
In [2]: zarr.__version__
Out[2]: '2.6.1'
```
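
A quick way to spot the variables affected by this is to compare each variable's in-memory dtype with the `dtype` stored in its `encoding`. The sketch below is an illustration only: `find_dtype_mismatches` is a hypothetical helper, not part of xarray, and the `int8` encoding is set by hand to mimic what `open_zarr` reported above.

```python
import numpy as np
import xarray as xr


def find_dtype_mismatches(ds):
    # Hypothetical helper: map variable name -> (in-memory dtype, encoded dtype)
    # for every variable whose encoding carries a different dtype.
    mismatches = {}
    for name, var in ds.variables.items():
        encoded = var.encoding.get('dtype')
        if encoded is not None and np.dtype(encoded) != var.dtype:
            mismatches[name] = (var.dtype, np.dtype(encoded))
    return mismatches


ds = xr.Dataset(
    {'bool_field': ('g', np.array([True, False, True, True, False]))},
    coords={'g': np.arange(5)},
)
# Assumption: mimic the stale encoding shown in the output above.
ds.variables['bool_field'].encoding['dtype'] = np.dtype('int8')

mismatches = find_dtype_mismatches(ds)
print(mismatches)

# Same shape as the workaround above: pin each mismatched variable
# to its in-memory dtype.
encoding = {name: {'dtype': mem} for name, (mem, _) in mismatches.items()}
```

The resulting `encoding` dict could then be passed to `to_zarr(..., encoding=encoding)` in the same way as the workaround.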
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4826/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 484098286,MDU6SXNzdWU0ODQwOTgyODY=,3242,"An `asfreq` method without `resample`, and clarify or improve resample().asfreq() behavior for down-sampling",463809,open,0,,,2,2019-08-22T16:33:32Z,2022-04-18T16:01:07Z,,CONTRIBUTOR,,,,"#### MCVE Code Sample ```python # Your code here >>> import numpy as np >>> import xarray as xr >>> import pandas as pd >>> data = np.random.random(300) # Make a time grid that doesn't start exactly on the hour. >>> time = pd.date_range('2019-01-01', periods=300, freq='T') + pd.Timedelta('3T') >>> time DatetimeIndex(['2019-01-01 00:03:00', '2019-01-01 00:04:00', '2019-01-01 00:05:00', '2019-01-01 00:06:00', '2019-01-01 00:07:00', '2019-01-01 00:08:00', '2019-01-01 00:09:00', '2019-01-01 00:10:00', '2019-01-01 00:11:00', '2019-01-01 00:12:00', ... '2019-01-01 04:53:00', '2019-01-01 04:54:00', '2019-01-01 04:55:00', '2019-01-01 04:56:00', '2019-01-01 04:57:00', '2019-01-01 04:58:00', '2019-01-01 04:59:00', '2019-01-01 05:00:00', '2019-01-01 05:01:00', '2019-01-01 05:02:00'], dtype='datetime64[ns]', length=300, freq='T') >>> da = xr.DataArray(data, dims=['time'], coords={'time': time}) >>> resampled = da.resample(time='H').asfreq() >>> resampled array([0.478601, 0.488425, 0.496322, 0.479256, 0.523395, 0.201718]) Coordinates: * time (time) datetime64[ns] 2019-01-01 ... 2019-01-01T05:00:00 # The value is actually the mean over the time window, eg. the third value is: >>> da.loc['2019-01-01T02:00:00':'2019-01-01T02:59:00'].mean() array(0.496322) ``` #### Expected Output Docs say this: ``` Return values of original object at the new up-sampling frequency; essentially a re-index with new times set to NaN. 
```

I suppose this doc is not technically wrong: on careful reading, it does not define a behavior for down-sampling. But it's easy to (1) assume the same behavior (reindexing) for down-sampling and up-sampling, and/or (2) expect behavior similar to `df.asfreq()` in pandas.

#### Problem Description

I would argue for an `asfreq` method without resampling that matches the pandas behavior, which, AFAIK, is to reindex starting at the first timestamp, at the specified interval.

```
>>> df = pd.DataFrame(da, index=time)
>>> df.asfreq('H')
                            0
2019-01-01 00:03:00  0.065304
2019-01-01 01:03:00  0.325814
2019-01-01 02:03:00  0.841201
2019-01-01 03:03:00  0.610266
2019-01-01 04:03:00  0.613906
```

This can already be achieved easily, so it's not a blocker.

```
>>> da.reindex(time=pd.date_range(da.time[0].values, da.time[-1].values, freq='H'))
array([0.065304, 0.325814, 0.841201, 0.610266, 0.613906])
Coordinates:
  * time     (time) datetime64[ns] 2019-01-01T00:03:00 ... 2019-01-01T04:03:00
```

The reason I argue for `asfreq` functionality outside of resampling is that `asfreq(freq)` in pandas is purely a reindex, whereas e.g. `resample(freq).first()` would give you a different time index.

#### Output of ``xr.show_versions()``

Still on python27, `show_versions` actually throws an exception, because some HDF5 library doesn't have a magic property. I don't think this detail is relevant here though.
```
>>> xr.__version__
u'0.11.3'
```
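
To make the distinction concrete, here is a small pandas-only sketch (not xarray) contrasting the reindex-style `asfreq` with a windowed mean over the same off-the-hour grid. The data (`np.arange`) and the variable names are illustrative assumptions; `'60min'` is used in place of `'H'` only because it is accepted across pandas versions.

```python
import numpy as np
import pandas as pd

# One sample per minute, starting 3 minutes past the hour.
time = pd.date_range('2019-01-01 00:03', periods=300, freq='min')
series = pd.Series(np.arange(300.0), index=time)

# asfreq: a pure reindex, starting at the first timestamp.
reindexed = series.asfreq('60min')

# resample(...).mean(): aggregates over windows aligned to the hour.
hourly_mean = series.resample('60min').mean()

print(reindexed)
print(hourly_mean)
```

Here `reindexed` keeps the 00:03 offset and picks existing samples, while `hourly_mean` is labeled on the hour and holds one aggregated value per window.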
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3242/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 520507183,MDExOlB1bGxSZXF1ZXN0MzM5MDg0MjUz,3504,Allow appending datetime & boolean variables to zarr stores,463809,closed,0,,,5,2019-11-09T20:09:29Z,2019-11-13T18:47:42Z,2019-11-13T15:55:33Z,CONTRIBUTOR,,0,pydata/xarray/pulls/3504," - [x] Closes #3480 - [x] Tests added - [x] Passes `black . && mypy . && flake8` - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API AFAICT, the type checking in the `_validate_datatypes_for_zarr_append` is simply too strict, and relaxing it seems to work fine. But this is my first time digging into the xarray source code, so please let me know if this issue is more complex.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3504/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 516725099,MDU6SXNzdWU1MTY3MjUwOTk=,3480,Allow appending non-numerical types to zarr arrays.,463809,closed,0,,,0,2019-11-02T21:20:53Z,2019-11-13T15:55:33Z,2019-11-13T15:55:33Z,CONTRIBUTOR,,,,"#### MCVE Code Sample Zarr itself allows appending `np.datetime` and `np.bool` types. ```python >>> path = 'tmp/test.zarr' >>> z1 = zarr.open(path, mode='w', shape=(10,), chunks=(10,), dtype='M8[D]') >>> z1[:] = '1990-01-01' >>> z2 = zarr.open(path, mode='a') >>> a = np.array(['1992-01-01'] * 10, dtype='datetime64[D]') >>> z2.append(a) (20,) >>> z2 ``` But it's equivalent in xarray throws an error: ``` >>> ds = xr.Dataset( ... {'y': (('x',), np.array(['1991-01-01'] * 10, dtype='datetime64[D]'))} ... ) >>> ds.to_zarr('tmp/test_xr.zarr', mode='w') >>> ds2 = xr.Dataset( ... {'y': (('x',), np.array(['1992-01-01'] * 10, dtype='datetime64[D]'))} ... 
)
>>> ds2.to_zarr('tmp/test_xr.zarr', mode='a', append_dim='x')
Traceback (most recent call last):
  File """", line 1, in
  File ""/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/core/dataset.py"", line 1616, in to_zarr
    append_dim=append_dim,
  File ""/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py"", line 1304, in to_zarr
    _validate_datatypes_for_zarr_append(dataset)
  File ""/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py"", line 1249, in _validate_datatypes_for_zarr_append
    check_dtype(k)
  File ""/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py"", line 1245, in check_dtype
    ""unicode string or an object"".format(var)
ValueError: Invalid dtype for data variable:
array(['1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000'],
      dtype='datetime64[ns]')
Dimensions without coordinates: x dtype must be a subtype of number, a fixed sized string, a fixed size unicode string or an object
```

#### Expected Output

The append should succeed.

#### Problem Description

This function in `xarray/backends/api.py` is too strict on types:

```
def _validate_datatypes_for_zarr_append(dataset):
    """"""DataArray.name and Dataset keys must be a string or None""""""

    def check_dtype(var):
        if (
            not np.issubdtype(var.dtype, np.number)
            and not coding.strings.is_unicode_dtype(var.dtype)
            and not var.dtype == object
        ):
            # and not re.match('^bytes[1-9]+$', var.dtype.name)):
            raise ValueError(
                ""Invalid dtype for data variable: {} ""
                ""dtype must be a subtype of number, ""
                ""a fixed sized string, a fixed size ""
                ""unicode string or an object"".format(var)
            )

    for k in dataset.data_vars.values():
        check_dtype(k)
```

`np.datetime64[.]` and `np.bool` are not numbers:

```
>>> np.issubdtype(np.dtype('datetime64[D]'), np.number)
False
>>> np.issubdtype(np.dtype('bool'), np.number)
False
```

#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 15:17:50) [Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 18.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: None
xarray: 0.14.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.5.2
distributed: 2.5.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: 4.7.12
pytest: 5.2.1
IPython: 7.8.0
sphinx: 2.2.0
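
For illustration, a relaxed version of the dtype check could also admit datetime and boolean dtypes. This is a sketch under assumptions, not the actual xarray change: `is_appendable_dtype` is a hypothetical name, and the unicode test is approximated with `np.str_` instead of `coding.strings.is_unicode_dtype`.

```python
import numpy as np


def is_appendable_dtype(dtype):
    # Hypothetical relaxed check: numbers, datetimes, booleans,
    # fixed-size unicode strings and objects are accepted.
    dtype = np.dtype(dtype)
    return (
        np.issubdtype(dtype, np.number)
        or np.issubdtype(dtype, np.datetime64)
        or np.issubdtype(dtype, np.bool_)
        or np.issubdtype(dtype, np.str_)
        or dtype == object
    )


# Print the verdict for a few representative dtypes.
for name in ['int64', 'float32', 'datetime64[D]', 'bool', 'U5', 'S5']:
    print(name, is_appendable_dtype(name))
```

Byte strings (`S5`) are still rejected in this sketch, matching the commented-out `bytes` branch in the function quoted in the problem description.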
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3480/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue