id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
894125618,MDU6SXNzdWU4OTQxMjU2MTg=,5329,"xarray 0.18.0 raises ValueError, not FileNotFoundError, when opening a non-existent file",9010180,open,0,,,8,2021-05-18T08:35:20Z,2022-09-21T18:19:57Z,,NONE,,,," **What happened**: In a Python environment with xarray 0.18.0 and python-netcdf4 installed, I called `xarray.open_dataset(""nonexistent"")`. (The file ""nonexistent"" does not exist.) xarray threw a `ValueError: cannot guess the engine, try passing one explicitly`. **What you expected to happen**: I expected a `FileNotFoundError` error to be thrown, as in xarray 0.17.0. **Minimal Complete Verifiable Example**:
```python
import xarray as xr
xr.open_dataset(""nonexistent"")
```
**Anything else we need to know?**: This is presumably related to Issue #5295, but is not fixed by PR #5296: ValueError is also thrown with the currently latest commit in master (9165c266). This change in behaviour produced a hard-to-diagnose bug deep in xcube, where we were catching the FileNotFound exception to deal gracefully with a missing file, but the new ValueError was of course not caught -- and the error message did not make the cause obvious. Catching ValueError is a workaround, but not a great solution since it may also be thrown for files which do exist but don't have a recognizable data format. I suspect that other codebases may be similarly affected. xarray 0.17.0 *was* capable of throwing a ValueError for a non-existent file, but only in the (rare?) case that neither netCDF4-python nor scipy was installed. **Environment**:
Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.9.4 | packaged by conda-forge | (default, May 10 2021, 22:13:33) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.8.0-53-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: en_GB.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.8.0 xarray: 0.18.0 pandas: 1.2.4 numpy: 1.20.2 scipy: None netCDF4: 1.5.6 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.4.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None pint: None setuptools: 49.6.0.post20210108 pip: 21.1.1 conda: None pytest: None IPython: None sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5329/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 868976909,MDU6SXNzdWU4Njg5NzY5MDk=,5224,xarray can't append to Zarrs with byte-string variables,9010180,open,0,,,3,2021-04-27T15:30:44Z,2021-04-28T17:15:43Z,,NONE,,,," **What happened**: I tried to use xarray to append a Dataset to a Zarr containing a `|S1` ([char string](https://numpy.org/doc/stable/reference/arrays.dtypes.html#specifying-and-constructing-data-types)) datatype, and received this error: > ValueError: Invalid dtype for data variable: array(b'', dtype='|S1') dtype must be a subtype of number, datetime, bool, a fixed sized string, a fixed size unicode string or an object **What you expected to happen**: I expected the Dataset to be appended to the Zarr. **Minimal Complete Verifiable Example**: Note: this is not quite ""minimal"", since it also performs the append using the Zarr library directly and using a `|U1` (Unicode) datatype in order to demonstrate that these variations work. ```python import numpy as np import xarray as xr import zarr def test_append(data_type, zarr_path): print(f""Creating {data_type} Zarr..."") ds = xr.Dataset({""x"": np.array("""", dtype=data_type)}) ds.to_zarr(zarr_path, mode=""w"") print(f""Appending to {data_type} Zarr with Zarr library..."") zarr_to_append = zarr.open(zarr_path, mode=""a"") zarr_to_append.x.append(np.array("""", dtype=data_type)) print(f""Appending to {data_type} Zarr with xarray..."") ds_to_append = xr.Dataset({""x"": np.array("""", dtype=data_type)}) ds_to_append.to_zarr(zarr_path, mode=""a"") test_append(""|U1"", ""test-u.zarr"") test_append(""|S1"", ""test-s.zarr"") ``` **Anything else we need to know?**: I came across this problem when converting some NetCDFs from [this dataset](https://land.copernicus.eu/global/products/lwq) to a Zarr, appending them along the time axis. 
The latest data format version (1.4) includes a dimensionless variable `crs` with type `char`, which xarray reads as an `|S1`, causing the error described above when I attempt to append. Replacing `crs` with a `|U1`-typed variable works around the problem, but is undesirable since we need to reproduce the NetCDFs as closely as possible. The example above shows that the Zarr format and library themselves don't seem to have a problem with appending byte string variables. The obvious fix would be to loosen the type check in `xarray.backends.api._validate_datatypes_for_zarr_append`:
```python
if (
    not np.issubdtype(var.dtype, np.number)
    and not np.issubdtype(var.dtype, np.datetime64)
    and not np.issubdtype(var.dtype, np.bool_)
    and not coding.strings.is_unicode_dtype(var.dtype)
    and not coding.strings.is_bytes_dtype(var.dtype)  # <- this line added to avoid ""Invalid dtype"" error
    and not var.dtype == object
):
```
This change makes the example above work, but I don't know if it would result in any unintended side-effects. **Environment**:
Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: 0021cdab91f7466f4be0fb32dae92bf3f8290e19 python: 3.8.5 (default, Jan 27 2021, 15:41:15) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.8.0-50-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: en_GB.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.7.3 xarray: 0.15.0 pandas: 0.25.3 numpy: 1.17.4 scipy: 1.3.3 netCDF4: 1.5.3 pydap: None h5netcdf: 0.7.1 h5py: 2.10.0 Nio: None zarr: 2.4.0+ds cftime: 1.1.0 nc_time_axis: None PseudoNetCDF: None rasterio: 1.1.3 cfgrib: None iris: None bottleneck: 1.2.1 dask: 2.8.1+dfsg distributed: None matplotlib: 3.1.2 cartopy: None seaborn: None numbagg: None pint: None setuptools: 45.2.0 pip: 20.0.2 conda: None pytest: 4.6.9 IPython: 7.13.0 sphinx: 1.8.5
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5224/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue