html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/5879#issuecomment-1085150420,https://api.github.com/repos/pydata/xarray/issues/5879,1085150420,IC_kwDOAMm_X85ArhTU,3309802,2022-03-31T21:41:32Z,2022-03-31T21:41:32Z,NONE,"Yeah, I guess I expected `OpenFile` to, well, act like an open file. So maybe this is more of an fsspec interface issue?
I'll open a separate issue for improving the UX of this in xarray though. I think this would be rather confusing for new users.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1031275532
https://github.com/pydata/xarray/pull/5879#issuecomment-1085077801,https://api.github.com/repos/pydata/xarray/issues/5879,1085077801,IC_kwDOAMm_X85ArPkp,3309802,2022-03-31T20:34:51Z,2022-03-31T20:34:51Z,NONE,"> ""s3://noaa-nwm-retrospective-2-1-zarr-pds/lakeout.zarr"" is a directory, right? You cannot open that as a file
Yeah correct. I oversimplified this from the problem I actually cared about, since of course zarr is not a single file that can be `fsspec.open`'d in the first place, and the zarr engine is doing some magic there when passed the plain string.
Here's a more illustrative example:
```python
In [1]: import xarray as xr
In [2]: import fsspec
In [3]: import os
In [4]: url = ""s3://noaa-nwm-retrospective-2-1-pds/model_output/1979/197902010100.CHRTOUT_DOMAIN1.comp"" # a netCDF file in s3
In [5]: f = fsspec.open(url)
In [6]: f
Out[6]: <OpenFile ...>
In [7]: isinstance(f, os.PathLike)
Out[7]: True
In [8]: s3f = f.open()
In [9]: s3f
Out[9]: <File-like object S3FileSystem, ...>
In [10]: isinstance(s3f, os.PathLike)
Out[10]: False
In [11]: ds = xr.open_dataset(s3f, engine='h5netcdf')
In [12]: ds
Out[12]:
<xarray.Dataset>
Dimensions: (time: 1, reference_time: 1, feature_id: 2776738)
Coordinates:
* time (time) datetime64[ns] 1979-02-01T01:00:00
* reference_time (reference_time) datetime64[ns] 1979-02-01
* feature_id (feature_id) int32 101 179 181 ... 1180001803 1180001804
latitude (feature_id) float32 ...
longitude (feature_id) float32 ...
Data variables:
crs |S1 ...
order (feature_id) int32 ...
elevation (feature_id) float32 ...
streamflow (feature_id) float64 ...
q_lateral (feature_id) float64 ...
velocity (feature_id) float64 ...
qSfcLatRunoff (feature_id) float64 ...
qBucket (feature_id) float64 ...
qBtmVertRunoff (feature_id) float64 ...
Attributes: (12/18)
TITLE: OUTPUT FROM WRF-Hydro v5.2.0-beta2
featureType: timeSeries
proj4: +proj=lcc +units=m +a=6370000.0 +b=6370000.0 ...
model_initialization_time: 1979-02-01_00:00:00
station_dimension: feature_id
model_output_valid_time: 1979-02-01_01:00:00
... ...
model_configuration: retrospective
dev_OVRTSWCRT: 1
dev_NOAH_TIMESTEP: 3600
dev_channel_only: 0
dev_channelBucket_only: 0
dev: dev_ prefix indicates development/internal me...
In [13]: ds = xr.open_dataset(f, engine='h5netcdf')
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-...> in <module>
----> 1 ds = xr.open_dataset(f, engine='h5netcdf')
~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
493
494 overwrite_encoded_chunks = kwargs.pop(""overwrite_encoded_chunks"", None)
--> 495 backend_ds = backend.open_dataset(
496 filename_or_obj,
497 drop_variables=drop_variables,
~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, format, group, lock, invalid_netcdf, phony_dims, decode_vlen_strings)
384 ):
385
--> 386 filename_or_obj = _normalize_path(filename_or_obj)
387 store = H5NetCDFStore.open(
388 filename_or_obj,
~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/common.py in _normalize_path(path)
21 def _normalize_path(path):
22 if isinstance(path, os.PathLike):
---> 23 path = os.fspath(path)
24
25 if isinstance(path, str) and not is_remote_uri(path):
~/dev/dask-playground/env/lib/python3.9/site-packages/fsspec/core.py in __fspath__(self)
96 def __fspath__(self):
97 # may raise if cannot be resolved to local file
---> 98 return self.open().__fspath__()
99
100 def __enter__(self):
AttributeError: 'S3File' object has no attribute '__fspath__'
```
Because the plain `fsspec.OpenFile` object has an `__fspath__` attribute (even though calling it raises an error), `xarray.backends.common._normalize_path` tries `os.fspath` on it and fails.
Because the `s3fs.S3File` object does _not_ have an `__fspath__` attribute, `_normalize_path` doesn't try to call `os.fspath` on it, so the file-like object can be passed all the way down into h5netcdf, which handles it correctly.
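That failure path can be reproduced offline with a minimal sketch (both classes below are hypothetical stand-ins, and `normalize_path_sketch` is a simplified replica of xarray's helper, not the real code):

```python
import os

def normalize_path_sketch(path):
    # Simplified replica of xarray.backends.common._normalize_path
    if isinstance(path, os.PathLike):
        path = os.fspath(path)
    return path

class LazyOpenFile:
    # Hypothetical stand-in for fsspec.core.OpenFile: defining
    # __fspath__ makes the class count as os.PathLike, but calling
    # it raises for remote files, mimicking fsspec's behavior.
    def __fspath__(self):
        raise AttributeError('cannot be resolved to a local file')

class RemoteFile:
    # Hypothetical stand-in for s3fs.S3File: no __fspath__, so
    # normalize_path_sketch passes it through untouched.
    def read(self, n=-1):
        return b''

print(isinstance(LazyOpenFile(), os.PathLike))  # True
print(isinstance(RemoteFile(), os.PathLike))    # False
try:
    normalize_path_sketch(LazyOpenFile())
except AttributeError as e:
    print(e)  # raised from the stand-in's __fspath__
```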
Note though that if I downgrade xarray to 0.19.0 (the last version before this PR was merged), I still can't use the plain `fsspec.OpenFile` object successfully. It's no longer xarray's fault (the object gets passed all the way into h5netcdf), but h5netcdf also tries to call `fspath` on the `OpenFile`, which fails in the same way.
```python
In [1]: import xarray as xr
In [2]: import fsspec
In [3]: xr.__version__
Out[3]: '0.19.0'
In [4]: url = ""s3://noaa-nwm-retrospective-2-1-pds/model_output/1979/197902010100.CHRTOUT_DOMAIN1.comp"" # a netCDF file in s3
In [5]: f = fsspec.open(url)
In [6]: xr.open_dataset(f.open(), engine=""h5netcdf"")
Out[6]:
<xarray.Dataset>
Dimensions: (time: 1, reference_time: 1, feature_id: 2776738)
Coordinates:
* time (time) datetime64[ns] 1979-02-01T01:00:00
* reference_time (reference_time) datetime64[ns] 1979-02-01
* feature_id (feature_id) int32 101 179 181 ... 1180001803 1180001804
latitude (feature_id) float32 ...
longitude (feature_id) float32 ...
Data variables:
crs |S1 ...
order (feature_id) int32 ...
elevation (feature_id) float32 ...
streamflow (feature_id) float64 ...
q_lateral (feature_id) float64 ...
velocity (feature_id) float64 ...
qSfcLatRunoff (feature_id) float64 ...
qBucket (feature_id) float64 ...
qBtmVertRunoff (feature_id) float64 ...
Attributes: (12/18)
TITLE: OUTPUT FROM WRF-Hydro v5.2.0-beta2
featureType: timeSeries
proj4: +proj=lcc +units=m +a=6370000.0 +b=6370000.0 ...
model_initialization_time: 1979-02-01_00:00:00
station_dimension: feature_id
model_output_valid_time: 1979-02-01_01:00:00
... ...
model_configuration: retrospective
dev_OVRTSWCRT: 1
dev_NOAH_TIMESTEP: 3600
dev_channel_only: 0
dev_channelBucket_only: 0
dev: dev_ prefix indicates development/internal me...
In [7]: xr.open_dataset(f, engine=""h5netcdf"")
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
198 try:
--> 199 file = self._cache[self._key]
200 except KeyError:
~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/lru_cache.py in __getitem__(self, key)
52 with self._lock:
---> 53 value = self._cache[key]
54 self._cache.move_to_end(key)
KeyError: [<class 'h5netcdf.core.File'>, (<OpenFile ...>,), 'r', (('decode_vlen_strings', True), ('invalid_netcdf', None))]
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
<ipython-input-...> in <module>
----> 1 xr.open_dataset(f, engine=""h5netcdf"")
~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
495
496 overwrite_encoded_chunks = kwargs.pop(""overwrite_encoded_chunks"", None)
--> 497 backend_ds = backend.open_dataset(
498 filename_or_obj,
499 drop_variables=drop_variables,
~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, format, group, lock, invalid_netcdf, phony_dims, decode_vlen_strings)
372
373 filename_or_obj = _normalize_path(filename_or_obj)
--> 374 store = H5NetCDFStore.open(
375 filename_or_obj,
376 format=format,
~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in open(cls, filename, mode, format, group, lock, autoclose, invalid_netcdf, phony_dims, decode_vlen_strings)
176
177 manager = CachingFileManager(h5netcdf.File, filename, mode=mode, kwargs=kwargs)
--> 178 return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
179
180 def _acquire(self, needs_lock=True):
~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in __init__(self, manager, group, mode, lock, autoclose)
121 # todo: utilizing find_root_and_group seems a bit clunky
122 # making filename available on h5netcdf.Group seems better
--> 123 self._filename = find_root_and_group(self.ds)[0].filename
124 self.is_remote = is_remote_uri(self._filename)
125 self.lock = ensure_lock(lock)
~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in ds(self)
187 @property
188 def ds(self):
--> 189 return self._acquire()
190
191 def open_store_variable(self, name, var):
~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in _acquire(self, needs_lock)
179
180 def _acquire(self, needs_lock=True):
--> 181 with self._manager.acquire_context(needs_lock) as root:
182 ds = _nc4_require_group(
183 root, self._group, self._mode, create_group=_h5netcdf_create_group
~/.pyenv/versions/3.9.1/lib/python3.9/contextlib.py in __enter__(self)
115 del self.args, self.kwds, self.func
116 try:
--> 117 return next(self.gen)
118 except StopIteration:
119 raise RuntimeError(""generator didn't yield"") from None
~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/file_manager.py in acquire_context(self, needs_lock)
185 def acquire_context(self, needs_lock=True):
186 """"""Context manager for acquiring a file.""""""
--> 187 file, cached = self._acquire_with_cache_info(needs_lock)
188 try:
189 yield file
~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
203 kwargs = kwargs.copy()
204 kwargs[""mode""] = self._mode
--> 205 file = self._opener(*self._args, **kwargs)
206 if self._mode == ""w"":
207 # ensure file doesn't get overriden when opened again
~/dev/dask-playground/env/lib/python3.9/site-packages/h5netcdf/core.py in __init__(self, path, mode, invalid_netcdf, phony_dims, **kwargs)
978 self._preexisting_file = mode in {""r"", ""r+"", ""a""}
979 self._h5py = h5py
--> 980 self._h5file = self._h5py.File(
981 path, mode, track_order=track_order, **kwargs
982 )
~/dev/dask-playground/env/lib/python3.9/site-packages/h5py/_hl/files.py in __init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, fs_strategy, fs_persist, fs_threshold, fs_page_size, page_buf_size, min_meta_keep, min_raw_keep, locking, **kwds)
484 name = repr(name).encode('ASCII', 'replace')
485 else:
--> 486 name = filename_encode(name)
487
488 if track_order is None:
~/dev/dask-playground/env/lib/python3.9/site-packages/h5py/_hl/compat.py in filename_encode(filename)
17 filenames in h5py for more information.
18 """"""
---> 19 filename = fspath(filename)
20 if sys.platform == ""win32"":
21 if isinstance(filename, str):
~/dev/dask-playground/env/lib/python3.9/site-packages/fsspec/core.py in __fspath__(self)
96 def __fspath__(self):
97 # may raise if cannot be resolved to local file
---> 98 return self.open().__fspath__()
99
100 def __enter__(self):
AttributeError: 'S3File' object has no attribute '__fspath__'
```
The problem is that `OpenFile` doesn't have a `read` or `seek` method, so h5py doesn't think it's a proper file-like object and tries to `fspath` it here: https://github.com/h5py/h5py/blob/master/h5py/_hl/files.py#L509
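A rough sketch of that gatekeeping logic (a simplified assumption about what h5py looks for, not its actual code; both classes are hypothetical stand-ins):

```python
def is_file_like_sketch(obj):
    # Roughly what h5py checks before treating an object as a
    # file-like target (simplified assumption, not the real check)
    return hasattr(obj, 'read') and hasattr(obj, 'seek')

class OpenFileStandIn:
    # Hypothetical fsspec.OpenFile stand-in: path-like but not file-like
    def __fspath__(self):
        raise AttributeError('cannot be resolved to a local file')

class S3FileStandIn:
    # Hypothetical opened-file stand-in: file-like, so it would be
    # used directly instead of being passed to fspath
    def read(self, n=-1):
        return b''
    def seek(self, pos, whence=0):
        return pos

print(is_file_like_sketch(OpenFileStandIn()))  # False -> falls back to fspath, which raises
print(is_file_like_sketch(S3FileStandIn()))    # True  -> used as a file object
```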
So I may just be misunderstanding what an `fsspec.OpenFile` object is supposed to be (it's not actually a file-like object until you `.open()` it?). But I expect users would be similarly confused by this distinction.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1031275532
https://github.com/pydata/xarray/pull/5879#issuecomment-1085030197,https://api.github.com/repos/pydata/xarray/issues/5879,1085030197,IC_kwDOAMm_X85ArD81,3309802,2022-03-31T19:46:40Z,2022-03-31T19:50:07Z,NONE,"@martindurant exactly, `os.PathLike` just uses duck-typing, which fsspec matches.
This generally means you can't pass fsspec-opened s3fs/gcsfs objects into `xr.open_dataset` (from what I've tried so far). I don't know whether this actually should work, but regardless, the error would be very confusing to a new user.
```python
In [32]: xr.open_dataset(""s3://noaa-nwm-retrospective-2-1-zarr-pds/lakeout.zarr"", engine=""zarr"")
Out[32]:
<xarray.Dataset>
Dimensions: (feature_id: 5783, time: 367439)
Coordinates:
* feature_id (feature_id) int32 491 531 747 ... 947070204 1021092845
latitude (feature_id) float32 ...
longitude (feature_id) float32 ...
* time (time) datetime64[ns] 1979-02-01T01:00:00 ... 2020-12-31T...
Data variables:
crs |S1 ...
inflow (time, feature_id) float64 ...
outflow (time, feature_id) float64 ...
water_sfc_elev (time, feature_id) float32 ...
Attributes:
Conventions: CF-1.6
TITLE: OUTPUT FROM WRF-Hydro v5.2.0-beta2
code_version: v5.2.0-beta2
featureType: timeSeries
model_configuration: retrospective
model_output_type: reservoir
proj4: +proj=lcc +units=m +a=6370000.0 +b=6370000....
reservoir_assimilated_value: Assimilation not performed
reservoir_type: 1 = level pool everywhere
station_dimension: lake_id
In [33]: xr.open_dataset(fsspec.open(""s3://noaa-nwm-retrospective-2-1-zarr-pds/lakeout.zarr""), engine=""zarr"")
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-...> in <module>
----> 1 xr.open_dataset(fsspec.open(""s3://noaa-nwm-retrospective-2-1-zarr-pds/lakeout.zarr""), engine=""zarr"")
~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
493
494 overwrite_encoded_chunks = kwargs.pop(""overwrite_encoded_chunks"", None)
--> 495 backend_ds = backend.open_dataset(
496 filename_or_obj,
497 drop_variables=drop_variables,
~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/zarr.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel)
797 ):
798
--> 799 filename_or_obj = _normalize_path(filename_or_obj)
800 store = ZarrStore.open_group(
801 filename_or_obj,
~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/common.py in _normalize_path(path)
21 def _normalize_path(path):
22 if isinstance(path, os.PathLike):
---> 23 path = os.fspath(path)
24
25 if isinstance(path, str) and not is_remote_uri(path):
~/dev/dask-playground/env/lib/python3.9/site-packages/fsspec/core.py in __fspath__(self)
96 def __fspath__(self):
97 # may raise if cannot be resolved to local file
---> 98 return self.open().__fspath__()
99
100 def __enter__(self):
~/dev/dask-playground/env/lib/python3.9/site-packages/fsspec/core.py in open(self)
138 been deleted; but a with-context is better style.
139 """"""
--> 140 out = self.__enter__()
141 closer = out.close
142 fobjects = self.fobjects.copy()[:-1]
~/dev/dask-playground/env/lib/python3.9/site-packages/fsspec/core.py in __enter__(self)
101 mode = self.mode.replace(""t"", """").replace(""b"", """") + ""b""
102
--> 103 f = self.fs.open(self.path, mode=mode)
104
105 self.fobjects = [f]
~/dev/dask-playground/env/lib/python3.9/site-packages/fsspec/spec.py in open(self, path, mode, block_size, cache_options, compression, **kwargs)
1007 else:
1008 ac = kwargs.pop(""autocommit"", not self._intrans)
-> 1009 f = self._open(
1010 path,
1011 mode=mode,
~/dev/dask-playground/env/lib/python3.9/site-packages/s3fs/core.py in _open(self, path, mode, block_size, acl, version_id, fill_cache, cache_type, autocommit, requester_pays, **kwargs)
532 cache_type = self.default_cache_type
533
--> 534 return S3File(
535 self,
536 path,
~/dev/dask-playground/env/lib/python3.9/site-packages/s3fs/core.py in __init__(self, s3, path, mode, block_size, acl, version_id, fill_cache, s3_additional_kwargs, autocommit, cache_type, requester_pays)
1824
1825 if ""r"" in mode:
-> 1826 self.req_kw[""IfMatch""] = self.details[""ETag""]
1827
1828 def _call_s3(self, method, *kwarglist, **kwargs):
KeyError: 'ETag'
```
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1031275532
https://github.com/pydata/xarray/pull/5879#issuecomment-1085012805,https://api.github.com/repos/pydata/xarray/issues/5879,1085012805,IC_kwDOAMm_X85Aq_tF,3309802,2022-03-31T19:25:28Z,2022-03-31T19:25:28Z,NONE,"Note that `isinstance(fsspec.OpenFile(...), os.PathLike)` due to the magic of ABCs. Are we sure that we want to be calling `os.fspath` on fsspec files? In many cases (like an S3File, GCSFile, etc.) this will fail with a confusing error like `'S3File' object has no attribute '__fspath__'`.
cc @martindurant ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1031275532