id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
894125618,MDU6SXNzdWU4OTQxMjU2MTg=,5329,"xarray 0.18.0 raises ValueError, not FileNotFoundError, when opening a non-existent file",9010180,open,0,,,8,2021-05-18T08:35:20Z,2022-09-21T18:19:57Z,,NONE,,,,"
**What happened**:
In a Python environment with xarray 0.18.0 and python-netcdf4 installed, I called `xarray.open_dataset(""nonexistent"")`. (The file ""nonexistent"" does not exist.) xarray threw a `ValueError: cannot guess the engine, try passing one explicitly`.
**What you expected to happen**:
I expected a `FileNotFoundError` to be raised, as in xarray 0.17.0.
**Minimal Complete Verifiable Example**:
```python
import xarray as xr
xr.open_dataset(""nonexistent"")
```
**Anything else we need to know?**:
This is presumably related to Issue #5295, but is not fixed by PR #5296: a ValueError is still thrown with the latest commit currently on master (9165c266).
This change in behaviour produced a hard-to-diagnose bug deep in xcube, where we were catching `FileNotFoundError` to deal gracefully with a missing file; the new ValueError was of course not caught, and its message did not make the cause obvious. Catching ValueError is a workaround, but not a great solution, since it may also be thrown for files which do exist but don't have a recognizable data format. I suspect that other codebases may be similarly affected.
xarray 0.17.0 *was* capable of throwing a ValueError for a non-existent file, but only in the (rare?) case that neither netCDF4-python nor scipy was installed.
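For reference, a workaround sketch (the helper name and signature here are made up, not xcube's actual code): check the path explicitly before delegating to the opener, so a missing file raises `FileNotFoundError` on both 0.17.0 and 0.18.0.

```python
# Hypothetical helper, not part of xarray: raise FileNotFoundError
# ourselves before the engine-guessing logic ever runs, restoring the
# 0.17.0 behaviour for missing files.
import os

def checked_open(path, opener):
    # opener would typically be xarray.open_dataset
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    return opener(path)
```

This keeps the ValueError meaningful: it is then only raised for files that exist but have an unrecognizable format.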
**Environment**:
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.4 | packaged by conda-forge | (default, May 10 2021, 22:13:33)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.8.0-53-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.8.0
xarray: 0.18.0
pandas: 1.2.4
numpy: 1.20.2
scipy: None
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.1.1
conda: None
pytest: None
IPython: None
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5329/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
894497993,MDU6SXNzdWU4OTQ0OTc5OTM=,5331,AttributeError using map_blocks with dask 2021.05.0,9010180,closed,0,,,3,2021-05-18T15:18:53Z,2021-05-19T08:01:07Z,2021-05-19T08:01:07Z,NONE,,,,"
**What happened**:
In an environment with xarray 0.18.0 and dask 2021.05.0 installed, I saved a dataset using `to_zarr`, opened it again using `open_zarr`, and called `map_blocks` on one of its variables. I got the following traceback:
```
Traceback (most recent call last):
  File ""/home/pont/./dasktest2.py"", line 12, in <module>
    ds2.myvar.map_blocks(lambda block: block)
  File ""/home/pont/loc/envs/xcube-repos/lib/python3.9/site-packages/xarray/core/dataarray.py"", line 3770, in map_blocks
    return map_blocks(func, self, args, kwargs, template)
  File ""/home/pont/loc/envs/xcube-repos/lib/python3.9/site-packages/xarray/core/parallel.py"", line 565, in map_blocks
    data = dask.array.Array(
  File ""/home/pont/loc/envs/xcube-repos/lib/python3.9/site-packages/dask/array/core.py"", line 1159, in __new__
    if layer.collection_annotations is None:
AttributeError: 'dict' object has no attribute 'collection_annotations'
```
**What you expected to happen**:
I expected `map_blocks` to complete successfully.
**Minimal Complete Verifiable Example**:
```python
import xarray as xr
import numpy as np
ds1 = xr.Dataset({
    ""myvar"": (""x"", np.zeros(10)),
    ""x"": (""x"", np.arange(10)),
})
ds1.to_zarr(""test.zarr"", mode=""w"")
ds2 = xr.open_zarr(""test.zarr"")
ds2.myvar.map_blocks(lambda block: block)
```
**Anything else we need to know?**:
I wasn't sure whether to report this issue with dask or xarray. With dask 2021.04.1 the example runs without error, and it seems that [dask PR 7309](https://github.com/dask/dask/pull/7309) introduced the breaking change. But my understanding of xarray's `map_blocks` implementation isn't sufficient to figure out where exactly the bug lies.
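As far as I can tell from the traceback, dask 2021.05.0 expects each graph layer to be a `Layer` object exposing `collection_annotations`, which a plain dict is not. A standalone sketch of my reading of the problem (the layer name is made up, and this is not xarray's actual code path): wrapping the raw dict in a `HighLevelGraph` first materializes it as a proper layer and avoids the attribute error.

```python
# Standalone illustration, not a confirmed fix: build a dask Array
# from a raw task dict by wrapping it in a HighLevelGraph first.
import numpy as np
import dask.array as da
from dask.highlevelgraph import HighLevelGraph

name = 'myvar-copy'  # made-up layer name for illustration
raw_graph = {(name, 0): np.zeros(10)}  # plain low-level task dict

# from_collections wraps the dict as a materialized Layer, which does
# carry the collection_annotations attribute dask 2021.05.0 looks for.
graph = HighLevelGraph.from_collections(name, raw_graph, dependencies=[])
arr = da.Array(graph, name, chunks=((10,),), dtype=float)
print(arr.compute())
```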
**Environment**:
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.4 | packaged by conda-forge | (default, May 10 2021, 22:13:33)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.8.0-53-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.18.0
pandas: 1.2.4
numpy: 1.20.2
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.8.1
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.05.0
distributed: 2021.05.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.1.1
conda: None
pytest: None
IPython: None
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5331/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
868976909,MDU6SXNzdWU4Njg5NzY5MDk=,5224,xarray can't append to Zarrs with byte-string variables,9010180,open,0,,,3,2021-04-27T15:30:44Z,2021-04-28T17:15:43Z,,NONE,,,,"
**What happened**:
I tried to use xarray to append a Dataset to a Zarr containing a `|S1` ([char string](https://numpy.org/doc/stable/reference/arrays.dtypes.html#specifying-and-constructing-data-types)) datatype, and received this error:
> ValueError: Invalid dtype for data variable: array(b'', dtype='|S1') dtype must be a subtype of number, datetime, bool, a fixed sized string, a fixed size unicode string or an object
**What you expected to happen**:
I expected the Dataset to be appended to the Zarr.
**Minimal Complete Verifiable Example**:
Note: this is not quite ""minimal"", since it also performs the append using the Zarr library directly and using a `|U1` (Unicode) datatype in order to demonstrate that these variations work.
```python
import numpy as np
import xarray as xr
import zarr
def test_append(data_type, zarr_path):
    print(f""Creating {data_type} Zarr..."")
    ds = xr.Dataset({""x"": np.array("""", dtype=data_type)})
    ds.to_zarr(zarr_path, mode=""w"")
    print(f""Appending to {data_type} Zarr with Zarr library..."")
    zarr_to_append = zarr.open(zarr_path, mode=""a"")
    zarr_to_append.x.append(np.array("""", dtype=data_type))
    print(f""Appending to {data_type} Zarr with xarray..."")
    ds_to_append = xr.Dataset({""x"": np.array("""", dtype=data_type)})
    ds_to_append.to_zarr(zarr_path, mode=""a"")
test_append(""|U1"", ""test-u.zarr"")
test_append(""|S1"", ""test-s.zarr"")
```
**Anything else we need to know?**:
I came across this problem when converting some NetCDFs from [this dataset](https://land.copernicus.eu/global/products/lwq) to a Zarr, appending them along the time axis. The latest data format version (1.4) includes a dimensionless variable `crs` with type `char`, which xarray reads as an `|S1`, causing the error described above when I attempt to append. Replacing `crs` with a `|U1`-typed variable works around the problem, but is undesirable since we need to reproduce the NetCDFs as closely as possible. The example above shows that the Zarr format and library themselves don't seem to have a problem with appending byte string variables.
The obvious fix would be to loosen the type check in `xarray.backends.api._validate_datatypes_for_zarr_append`:
```python
if (
    not np.issubdtype(var.dtype, np.number)
    and not np.issubdtype(var.dtype, np.datetime64)
    and not np.issubdtype(var.dtype, np.bool_)
    and not coding.strings.is_unicode_dtype(var.dtype)
    and not coding.strings.is_bytes_dtype(var.dtype)  # <- this line added to avoid ""Invalid dtype"" error
    and not var.dtype == object
):
```
This change makes the example above work, but I don't know if it would result in any unintended side-effects.
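To illustrate the check in isolation (`dtype_ok` is a made-up helper, and the unicode/bytes tests here approximate xarray's `coding.strings` helpers with `dtype.kind`): `|S1` fails the current check but passes once the bytes clause is added, while `|U1` passes either way.

```python
# Standalone sketch of the dtype check, with the proposed bytes clause
# behind a flag so both behaviours can be compared.
import numpy as np

def dtype_ok(dtype, allow_bytes):
    dtype = np.dtype(dtype)
    return (
        np.issubdtype(dtype, np.number)
        or np.issubdtype(dtype, np.datetime64)
        or np.issubdtype(dtype, np.bool_)
        or dtype.kind == 'U'                     # fixed-size unicode string
        or (allow_bytes and dtype.kind == 'S')   # proposed: fixed-size byte string
        or dtype == object
    )

print(dtype_ok('|S1', allow_bytes=False))  # current check rejects |S1
print(dtype_ok('|S1', allow_bytes=True))   # accepted with the proposed clause
print(dtype_ok('|U1', allow_bytes=False))  # |U1 is accepted either way
```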
**Environment**:
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: 0021cdab91f7466f4be0fb32dae92bf3f8290e19
python: 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.8.0-50-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.7.3
xarray: 0.15.0
pandas: 0.25.3
numpy: 1.17.4
scipy: 1.3.3
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.7.1
h5py: 2.10.0
Nio: None
zarr: 2.4.0+ds
cftime: 1.1.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.3
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.8.1+dfsg
distributed: None
matplotlib: 3.1.2
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 45.2.0
pip: 20.0.2
conda: None
pytest: 4.6.9
IPython: 7.13.0
sphinx: 1.8.5
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5224/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
844712857,MDU6SXNzdWU4NDQ3MTI4NTc=,5093,"open_dataset uses cftime, not datetime64, when calendar attribute is ""Gregorian""",9010180,closed,0,,,2,2021-03-30T15:12:09Z,2021-04-20T14:17:42Z,2021-04-18T10:17:08Z,NONE,,,,"
**What happened**:
I used `xarray.open_dataset` to open a NetCDF file whose `time` coordinate had the `calendar` attribute set to `Gregorian`. All dates were within the Timestamp-valid range.
The resulting dataset represented the `time` co-ordinate as a `cftime._cftime.DatetimeGregorian`.
**What you expected to happen**:
I expected the dataset to represent the `time` co-ordinate as a `datetime64[ns]`, as documented [here](http://xarray.pydata.org/en/stable/generated/xarray.open_dataset.html) and [here](http://xarray.pydata.org/en/stable/weather-climate.html#non-standard-calendars-and-dates-outside-the-timestamp-valid-range).
**Minimal Complete Verifiable Example**:
```python
import xarray as xr
import numpy as np
import pandas as pd
def print_time_type(dataset):
    print(dataset.time.dtype, type(dataset.time[0].item()))

da = xr.DataArray(
    data=[32, 16, 8],
    dims=[""time""],
    coords=dict(
        time=pd.date_range(""2014-09-06"", periods=3),
        reference_time=pd.Timestamp(""2014-09-05""),
    ),
)
# Create dataset and confirm type of time
ds1 = xr.Dataset({""myvar"": da})
print_time_type(ds1) # prints ""datetime64[ns]""
# Manually set time attributes to ""Gregorian"" rather
# than default ""proleptic_gregorian"".
ds1.time.encoding[""calendar""] = ""Gregorian""
ds1.reference_time.encoding[""calendar""] = ""Gregorian""
ds1.to_netcdf(""test-capitalized.nc"")
ds2 = xr.open_dataset(""test-capitalized.nc"")
print_time_type(ds2)
# prints ""object ""
# Workaround: add ""Gregorian"" to list of standard calendars.
xr.coding.times._STANDARD_CALENDARS.add(""Gregorian"")
ds3 = xr.open_dataset(""test-capitalized.nc"")
print_time_type(ds3) # prints ""datetime64[ns]""
```
**Anything else we need to know?**:
The [documentation for the `use_cftime` parameter of `open_dataset`](http://xarray.pydata.org/en/stable/generated/xarray.open_dataset.html) says:
> If None (default), attempt to decode times to `np.datetime64[ns]` objects; if this is not possible, decode times to `cftime.datetime` objects.
In practice, we are getting some `cftime.datetime`s even for times which are interpretable and representable as `np.datetime64[ns]`s. In particular, we have some NetCDF files in which the `time` variable has a `calendar` attribute with a value of `Gregorian` (with a capital ‘G’). CF conventions [allow this](http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#_attributes):
> When this standard defines string attributes that may take various prescribed values, the possible values are generally given in lower case. However, applications programs should not be sensitive to case in these attributes.
However, xarray regards `Gregorian` as a non-standard calendar and falls back to `cftime.datetime`. If (as in the example) `Gregorian` is added to `xr.coding.times._STANDARD_CALENDARS`, the times are read as `np.datetime64[ns]`s.
Suggested fix: in [`xarray.coding.times._decode_datetime_with_pandas`](https://github.com/pydata/xarray/blob/45b4436bd5a82e7020357cf681b13067a8dd59e9/xarray/coding/times.py#L169), change ‘`if calendar not in _STANDARD_CALENDARS:`’ to ‘`if calendar.lower() not in _STANDARD_CALENDARS:`’.
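The effect of the suggested one-line fix can be illustrated standalone (the function names below are made up, and the set mirrors xarray's `_STANDARD_CALENDARS` rather than importing it):

```python
# Sketch of the case-sensitivity issue; this set approximates
# xarray.coding.times._STANDARD_CALENDARS.
_STANDARD_CALENDARS = {'standard', 'gregorian', 'proleptic_gregorian'}

def uses_cftime_fallback(calendar):
    # Current behaviour: case-sensitive membership test, so
    # 'Gregorian' is treated as a non-standard calendar.
    return calendar not in _STANDARD_CALENDARS

def uses_cftime_fallback_fixed(calendar):
    # Suggested fix: normalize case before testing, per the CF
    # convention that these attributes are case-insensitive.
    return calendar.lower() not in _STANDARD_CALENDARS

print(uses_cftime_fallback('Gregorian'))        # True: falls back to cftime
print(uses_cftime_fallback_fixed('Gregorian'))  # False: decoded with pandas
```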
**Environment**:
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.2 | packaged by conda-forge | (default, Feb 21 2021, 05:02:46)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.8.0-48-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.17.1.dev39+g45b4436b
pandas: 1.2.3
numpy: 1.20.2
scipy: None
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.0.1
conda: None
pytest: None
IPython: None
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5093/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue