id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
789410367,MDU6SXNzdWU3ODk0MTAzNjc=,4826,Reading and writing a zarr dataset multiple times casts bools to int8,463809,closed,0,,,10,2021-01-19T22:02:15Z,2023-04-10T09:26:27Z,2023-04-10T09:26:27Z,CONTRIBUTOR,,,,"**What happened**:
Reading and writing zarr dataset multiple times into different paths changes `bool` dtype arrays to `int8`. I think this issue is related to #2937.
**What you expected to happen**:
My array's dtype in numpy/dask should not change, even if certain storage backends store dtypes a certain way.
**Minimal Complete Verifiable Example**:
```python
import xarray as xr
import numpy as np
ds = xr.Dataset({
    ""bool_field"": xr.DataArray(
        np.random.randn(5) < 0.5,
        dims=('g'),
        coords={'g': np.arange(5)}
    )
})
ds.to_zarr('test.zarr', mode=""w"")
d2 = xr.open_zarr('test.zarr')
print(d2.bool_field.dtype)
print(d2.bool_field.encoding)
d2.to_zarr(""test2.zarr"", mode=""w"")
d3 = xr.open_zarr('test2.zarr')
print(d3.bool_field.dtype)
```
The above snippet prints the following. In d3, the dtype of `bool_field` is `int8`, presumably because d3 inherited d2's `encoding` and it says `int8`, despite the array having a `bool` dtype.
```
bool
{'chunks': (5,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, 'dtype': dtype('int8')}
int8
```
**Anything else we need to know?**:
The current workaround is to set the encodings explicitly, which fixes the problem:
```python
encoding = {k: {""dtype"": d2[k].dtype} for k in d2}
d2.to_zarr('test2.zarr', mode=""w"", encoding=encoding)
```
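A slightly more general form of this workaround, as a sketch: the hypothetical helper `dtype_encoding` below (not part of xarray) builds the `encoding` argument from the in-memory dtypes of any mapping of names to array-like objects. It is demonstrated on plain NumPy arrays so that it runs without a zarr store. The assumption is that an `xr.Dataset` can be passed the same way, since it behaves as a mapping of data variable names to `DataArray` objects.
```python
import numpy as np


def dtype_encoding(variables):
    '''Map each variable name to an encoding dict that pins its in-memory dtype.

    Hypothetical helper for illustration; `variables` is any mapping whose
    values expose a `.dtype` attribute.
    '''
    return {name: {'dtype': var.dtype} for name, var in variables.items()}


# Stand-in for the dataset above: one boolean field and one float field.
fields = {
    'bool_field': np.random.randn(5) < 0.5,
    'float_field': np.random.randn(5),
}
encoding = dtype_encoding(fields)
print(encoding)

# With a real dataset this would be used as (not executed here):
# d2.to_zarr('test2.zarr', mode='w', encoding=dtype_encoding(d2))
```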
**Environment**:
Output of xr.show_versions()
```
# I'll update with the full output of xr.show_versions() soon.
In [4]: xr.__version__
Out[4]: '0.16.2'
In [2]: zarr.__version__
Out[2]: '2.6.1'
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4826/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
484098286,MDU6SXNzdWU0ODQwOTgyODY=,3242,"An `asfreq` method without `resample`, and clarify or improve resample().asfreq() behavior for down-sampling",463809,open,0,,,2,2019-08-22T16:33:32Z,2022-04-18T16:01:07Z,,CONTRIBUTOR,,,,"#### MCVE Code Sample
```python
>>> import numpy as np
>>> import xarray as xr
>>> import pandas as pd
>>> data = np.random.random(300)
# Make a time grid that doesn't start exactly on the hour.
>>> time = pd.date_range('2019-01-01', periods=300, freq='T') + pd.Timedelta('3T')
>>> time
DatetimeIndex(['2019-01-01 00:03:00', '2019-01-01 00:04:00',
'2019-01-01 00:05:00', '2019-01-01 00:06:00',
'2019-01-01 00:07:00', '2019-01-01 00:08:00',
'2019-01-01 00:09:00', '2019-01-01 00:10:00',
'2019-01-01 00:11:00', '2019-01-01 00:12:00',
...
'2019-01-01 04:53:00', '2019-01-01 04:54:00',
'2019-01-01 04:55:00', '2019-01-01 04:56:00',
'2019-01-01 04:57:00', '2019-01-01 04:58:00',
'2019-01-01 04:59:00', '2019-01-01 05:00:00',
'2019-01-01 05:01:00', '2019-01-01 05:02:00'],
dtype='datetime64[ns]', length=300, freq='T')
>>> da = xr.DataArray(data, dims=['time'], coords={'time': time})
>>> resampled = da.resample(time='H').asfreq()
>>> resampled
array([0.478601, 0.488425, 0.496322, 0.479256, 0.523395, 0.201718])
Coordinates:
* time (time) datetime64[ns] 2019-01-01 ... 2019-01-01T05:00:00
# Each value is actually the mean over its time window, e.g. the third value is:
>>> da.loc['2019-01-01T02:00:00':'2019-01-01T02:59:00'].mean()
array(0.496322)
```
#### Expected Output
Docs say this:
```
Return values of original object at the new up-sampling frequency;
essentially a re-index with new times set to NaN.
```
I suppose this doc is not technically wrong, since upon careful reading, I realize it does not define a behavior for down-sampling. But it's easy to: (1) assume the same behavior (reindexing) for down-sampling and up-sampling and/or (2) expect behavior similar to `df.asfreq()` in pandas.
#### Problem Description
I would argue for an `asfreq` method without resampling that matches the pandas behavior, which AFAIK, is to reindex starting at the first timestamp, at the specified interval.
```
>>> df = pd.DataFrame(da, index=time)
>>> df.asfreq('H')
0
2019-01-01 00:03:00 0.065304
2019-01-01 01:03:00 0.325814
2019-01-01 02:03:00 0.841201
2019-01-01 03:03:00 0.610266
2019-01-01 04:03:00 0.613906
```
This is currently easy to achieve with `reindex`, so it's not a blocker.
```
>>> da.reindex(time=pd.date_range(da.time[0].values, da.time[-1].values, freq='H'))
array([0.065304, 0.325814, 0.841201, 0.610266, 0.613906])
Coordinates:
* time (time) datetime64[ns] 2019-01-01T00:03:00 ... 2019-01-01T04:03:00
```
The reason I argue for `asfreq` functionality outside of resampling is that `asfreq(freq)` in pandas is purely a reindex, whereas e.g. `resample(freq).first()` would give you a different time index.
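The difference can be seen in pandas alone. A minimal sketch, using a synthetic series whose values equal their position (stand-in data, not the random data from the example above) so that it is easy to see which sample each method picks:
```python
import numpy as np
import pandas as pd

# One sample per minute, starting three minutes past the hour.
time = pd.date_range('2019-01-01 00:03', periods=300, freq='min')
series = pd.Series(np.arange(300), index=time)

# asfreq: reindex from the first timestamp at the requested interval.
reindexed = series.asfreq('60min')

# resample(...).first(): bin into hourly windows, then take the first sample in each.
binned = series.resample('60min').first()

print(reindexed.index[0], len(reindexed))
print(binned.index[0], len(binned))
```
Here `asfreq` keeps the `:03` offset of the original index and returns 5 rows, whereas `resample(...).first()` labels its 6 bins on the hour.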
#### Output of ``xr.show_versions()``
I'm still on Python 2.7, where `show_versions` actually throws an exception because some HDF5 library doesn't have a magic property. I don't think this detail is relevant here, though.
```
>>> xr.__version__
u'0.11.3'
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3242/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
520507183,MDExOlB1bGxSZXF1ZXN0MzM5MDg0MjUz,3504,Allow appending datetime & boolean variables to zarr stores,463809,closed,0,,,5,2019-11-09T20:09:29Z,2019-11-13T18:47:42Z,2019-11-13T15:55:33Z,CONTRIBUTOR,,0,pydata/xarray/pulls/3504,"
- [x] Closes #3480
- [x] Tests added
- [x] Passes `black . && mypy . && flake8`
- [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
AFAICT, the type checking in `_validate_datatypes_for_zarr_append` is simply too strict, and relaxing it seems to work fine. But this is my first time digging into the xarray source code, so please let me know if this issue is more complex.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3504/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
516725099,MDU6SXNzdWU1MTY3MjUwOTk=,3480,Allow appending non-numerical types to zarr arrays.,463809,closed,0,,,0,2019-11-02T21:20:53Z,2019-11-13T15:55:33Z,2019-11-13T15:55:33Z,CONTRIBUTOR,,,,"#### MCVE Code Sample
Zarr itself allows appending `np.datetime64` and `np.bool_` arrays.
```python
>>> path = 'tmp/test.zarr'
>>> z1 = zarr.open(path, mode='w', shape=(10,), chunks=(10,), dtype='M8[D]')
>>> z1[:] = '1990-01-01'
>>> z2 = zarr.open(path, mode='a')
>>> a = np.array(['1992-01-01'] * 10, dtype='datetime64[D]')
>>> z2.append(a)
(20,)
>>> z2
```
But its equivalent in xarray throws an error:
```
>>> ds = xr.Dataset(
... {'y': (('x',), np.array(['1991-01-01'] * 10, dtype='datetime64[D]'))}
... )
>>> ds.to_zarr('tmp/test_xr.zarr', mode='w')
>>> ds2 = xr.Dataset(
... {'y': (('x',), np.array(['1992-01-01'] * 10, dtype='datetime64[D]'))}
... )
>>> ds2.to_zarr('tmp/test_xr.zarr', mode='a', append_dim='x')
Traceback (most recent call last):
File """", line 1, in
File ""/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/core/dataset.py"", line 1616, in to_zarr
append_dim=append_dim,
File ""/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py"", line 1304, in to_zarr
_validate_datatypes_for_zarr_append(dataset)
File ""/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py"", line 1249, in _validate_datatypes_for_zarr_append
check_dtype(k)
File ""/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py"", line 1245, in check_dtype
""unicode string or an object"".format(var)
ValueError: Invalid dtype for data variable:
array(['1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
'1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
'1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
'1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
'1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000'],
dtype='datetime64[ns]')
Dimensions without coordinates: x dtype must be a subtype of number, a fixed sized string, a fixed size unicode string or an object
```
#### Expected Output
The append should succeed.
#### Problem Description
This function in `xarray/backends/api.py` is too strict on types:
```
def _validate_datatypes_for_zarr_append(dataset):
    """"""DataArray.name and Dataset keys must be a string or None""""""
    def check_dtype(var):
        if (
            not np.issubdtype(var.dtype, np.number)
            and not coding.strings.is_unicode_dtype(var.dtype)
            and not var.dtype == object
        ):
            # and not re.match('^bytes[1-9]+$', var.dtype.name)):
            raise ValueError(
                ""Invalid dtype for data variable: {} ""
                ""dtype must be a subtype of number, ""
                ""a fixed sized string, a fixed size ""
                ""unicode string or an object"".format(var)
            )
    for k in dataset.data_vars.values():
        check_dtype(k)
```
`np.datetime64[.]` and `np.bool_` dtypes are not subtypes of `np.number`:
```
>>> np.issubdtype(np.dtype('datetime64[D]'), np.number)
False
>>> np.issubdtype(np.dtype('bool'), np.number)
False
```
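One way the check could be relaxed, as a sketch only: the hypothetical predicate `is_appendable_dtype` below accepts datetime and boolean dtypes in addition to the ones the existing check allows. The name and the exact set of accepted dtypes are assumptions for illustration, not the actual xarray implementation.
```python
import numpy as np


def is_appendable_dtype(dtype):
    '''Return True if `dtype` passes the relaxed check (hypothetical helper).'''
    dtype = np.dtype(dtype)
    return bool(
        np.issubdtype(dtype, np.number)
        or np.issubdtype(dtype, np.datetime64)
        or np.issubdtype(dtype, np.bool_)
        or np.issubdtype(dtype, np.str_)  # fixed-size unicode strings
        or dtype == object
    )


# Print the verdict for a few representative dtypes.
for name in ['float64', 'datetime64[D]', 'bool', 'U10', 'O', 'S4']:
    print(name, is_appendable_dtype(name))
```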
#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 15:17:50)
[Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 18.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: None
xarray: 0.14.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.5.2
distributed: 2.5.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: 4.7.12
pytest: 5.2.1
IPython: 7.8.0
sphinx: 2.2.0
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3480/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue