html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/5879#issuecomment-1085150420,https://api.github.com/repos/pydata/xarray/issues/5879,1085150420,IC_kwDOAMm_X85ArhTU,3309802,2022-03-31T21:41:32Z,2022-03-31T21:41:32Z,NONE,"Yeah, I guess I expected `OpenFile` to, well, act like an open file. So maybe this is more of an fsspec interface issue? I'll open a separate issue for improving the UX of this in xarray though. I think this would be rather confusing for new users.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1031275532
https://github.com/pydata/xarray/pull/5879#issuecomment-1085077801,https://api.github.com/repos/pydata/xarray/issues/5879,1085077801,IC_kwDOAMm_X85ArPkp,3309802,2022-03-31T20:34:51Z,2022-03-31T20:34:51Z,NONE,"> ""s3://noaa-nwm-retrospective-2-1-zarr-pds/lakeout.zarr"" is a directory, right? You cannot open that as a file

Yeah, correct. I oversimplified this from the problem I actually cared about, since of course zarr is not a single file that can be `fsspec.open`'d in the first place, and the zarr engine is doing some magic there when passed the plain string.
Here's a more illustrative example:

```python
In [1]: import xarray as xr

In [2]: import fsspec

In [3]: import os

In [4]: url = ""s3://noaa-nwm-retrospective-2-1-pds/model_output/1979/197902010100.CHRTOUT_DOMAIN1.comp""  # a netCDF file in s3

In [5]: f = fsspec.open(url)

In [6]: f
Out[6]:

In [7]: isinstance(f, os.PathLike)
Out[7]: True

In [8]: s3f = f.open()

In [9]: s3f
Out[9]:

In [10]: isinstance(s3f, os.PathLike)
Out[10]: False

In [11]: ds = xr.open_dataset(s3f, engine='h5netcdf')

In [12]: ds
Out[12]:
<xarray.Dataset>
Dimensions:         (time: 1, reference_time: 1, feature_id: 2776738)
Coordinates:
  * time            (time) datetime64[ns] 1979-02-01T01:00:00
  * reference_time  (reference_time) datetime64[ns] 1979-02-01
  * feature_id      (feature_id) int32 101 179 181 ... 1180001803 1180001804
    latitude        (feature_id) float32 ...
    longitude       (feature_id) float32 ...
Data variables:
    crs             |S1 ...
    order           (feature_id) int32 ...
    elevation       (feature_id) float32 ...
    streamflow      (feature_id) float64 ...
    q_lateral       (feature_id) float64 ...
    velocity        (feature_id) float64 ...
    qSfcLatRunoff   (feature_id) float64 ...
    qBucket         (feature_id) float64 ...
    qBtmVertRunoff  (feature_id) float64 ...
Attributes: (12/18)
    TITLE:                      OUTPUT FROM WRF-Hydro v5.2.0-beta2
    featureType:                timeSeries
    proj4:                      +proj=lcc +units=m +a=6370000.0 +b=6370000.0 ...
    model_initialization_time:  1979-02-01_00:00:00
    station_dimension:          feature_id
    model_output_valid_time:    1979-02-01_01:00:00
    ...                         ...
    model_configuration:        retrospective
    dev_OVRTSWCRT:              1
    dev_NOAH_TIMESTEP:          3600
    dev_channel_only:           0
    dev_channelBucket_only:     0
    dev:                        dev_ prefix indicates development/internal me...
In [13]: ds = xr.open_dataset(f, engine='h5netcdf')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-...> in <module>
----> 1 ds = xr.open_dataset(f, engine='h5netcdf')

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    493
    494     overwrite_encoded_chunks = kwargs.pop(""overwrite_encoded_chunks"", None)
--> 495     backend_ds = backend.open_dataset(
    496         filename_or_obj,
    497         drop_variables=drop_variables,

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, format, group, lock, invalid_netcdf, phony_dims, decode_vlen_strings)
    384     ):
    385
--> 386         filename_or_obj = _normalize_path(filename_or_obj)
    387         store = H5NetCDFStore.open(
    388             filename_or_obj,

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/common.py in _normalize_path(path)
     21 def _normalize_path(path):
     22     if isinstance(path, os.PathLike):
---> 23         path = os.fspath(path)
     24
     25     if isinstance(path, str) and not is_remote_uri(path):

~/dev/dask-playground/env/lib/python3.9/site-packages/fsspec/core.py in __fspath__(self)
     96     def __fspath__(self):
     97         # may raise if cannot be resolved to local file
---> 98         return self.open().__fspath__()
     99
    100     def __enter__(self):

AttributeError: 'S3File' object has no attribute '__fspath__'
```

Because the plain `fsspec.OpenFile` object has an `__fspath__` attribute (but calling it raises an error), it causes `xarray.backends.common._normalize_path` to fail.
Because the `s3fs.S3File` object does _not_ have an `__fspath__` attribute, `_normalize_path` doesn't try to call `os.fspath` on it, so the file-like object gets passed all the way down into h5netcdf, which can handle it.
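This `os.PathLike` duck-typing can be demonstrated without fsspec or s3fs at all; `LazyFile` and `PlainFileLike` below are made-up stand-ins for `OpenFile` and `S3File`:

```python
import os

class LazyFile:
    # stand-in for fsspec.OpenFile: defines __fspath__, but calling it fails
    def __fspath__(self):
        raise AttributeError('cannot resolve to a local path')

class PlainFileLike:
    # stand-in for s3fs.S3File: file-like, no __fspath__ at all
    def read(self, size=-1):
        return b''

# os.PathLike's __subclasshook__ only checks that __fspath__ is *defined*,
# not that it works, so only LazyFile passes the isinstance check...
print(isinstance(LazyFile(), os.PathLike))       # True
print(isinstance(PlainFileLike(), os.PathLike))  # False

# ...which means a _normalize_path-style branch calls os.fspath() on
# LazyFile and blows up, while PlainFileLike is passed through untouched.
```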
Note though that if I downgrade xarray to 0.19.0 (the last version before this PR was merged), I still can't use the plain `fsspec.OpenFile` object successfully. It's not xarray's fault anymore (the object gets passed all the way into h5netcdf), but h5netcdf also tries to call `fspath` on the `OpenFile`, which fails in the same way.

```python
In [1]: import xarray as xr

In [2]: import fsspec

In [3]: xr.__version__
Out[3]: '0.19.0'

In [4]: url = ""s3://noaa-nwm-retrospective-2-1-pds/model_output/1979/197902010100.CHRTOUT_DOMAIN1.comp""  # a netCDF file in s3

In [5]: f = fsspec.open(url)

In [6]: xr.open_dataset(f.open(), engine=""h5netcdf"")
Out[6]:
<xarray.Dataset>
Dimensions:         (time: 1, reference_time: 1, feature_id: 2776738)
Coordinates:
  * time            (time) datetime64[ns] 1979-02-01T01:00:00
  * reference_time  (reference_time) datetime64[ns] 1979-02-01
  * feature_id      (feature_id) int32 101 179 181 ... 1180001803 1180001804
    latitude        (feature_id) float32 ...
    longitude       (feature_id) float32 ...
Data variables:
    crs             |S1 ...
    order           (feature_id) int32 ...
    elevation       (feature_id) float32 ...
    streamflow      (feature_id) float64 ...
    q_lateral       (feature_id) float64 ...
    velocity        (feature_id) float64 ...
    qSfcLatRunoff   (feature_id) float64 ...
    qBucket         (feature_id) float64 ...
    qBtmVertRunoff  (feature_id) float64 ...
Attributes: (12/18)
    TITLE:                      OUTPUT FROM WRF-Hydro v5.2.0-beta2
    featureType:                timeSeries
    proj4:                      +proj=lcc +units=m +a=6370000.0 +b=6370000.0 ...
    model_initialization_time:  1979-02-01_00:00:00
    station_dimension:          feature_id
    model_output_valid_time:    1979-02-01_01:00:00
    ...                         ...
    model_configuration:        retrospective
    dev_OVRTSWCRT:              1
    dev_NOAH_TIMESTEP:          3600
    dev_channel_only:           0
    dev_channelBucket_only:     0
    dev:                        dev_ prefix indicates development/internal me...
In [7]: xr.open_dataset(f, engine=""h5netcdf"")
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
    198         try:
--> 199             file = self._cache[self._key]
    200         except KeyError:

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/lru_cache.py in __getitem__(self, key)
     52         with self._lock:
---> 53             value = self._cache[key]
     54             self._cache.move_to_end(key)

KeyError: [, (,), 'r', (('decode_vlen_strings', True), ('invalid_netcdf', None))]

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-7-...> in <module>
----> 1 xr.open_dataset(f, engine=""h5netcdf"")

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    495
    496     overwrite_encoded_chunks = kwargs.pop(""overwrite_encoded_chunks"", None)
--> 497     backend_ds = backend.open_dataset(
    498         filename_or_obj,
    499         drop_variables=drop_variables,

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, format, group, lock, invalid_netcdf, phony_dims, decode_vlen_strings)
    372
    373         filename_or_obj = _normalize_path(filename_or_obj)
--> 374         store = H5NetCDFStore.open(
    375             filename_or_obj,
    376             format=format,

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in open(cls, filename, mode, format, group, lock, autoclose, invalid_netcdf, phony_dims, decode_vlen_strings)
    176
    177         manager = CachingFileManager(h5netcdf.File, filename, mode=mode, kwargs=kwargs)
--> 178         return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
    179
    180     def _acquire(self, needs_lock=True):

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in __init__(self, manager, group, mode, lock, autoclose)
    121         # todo: utilizing find_root_and_group seems a bit clunky
    122         # making filename available on h5netcdf.Group seems better
--> 123         self._filename = find_root_and_group(self.ds)[0].filename
    124         self.is_remote = is_remote_uri(self._filename)
    125         self.lock = ensure_lock(lock)

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in ds(self)
    187     @property
    188     def ds(self):
--> 189         return self._acquire()
    190
    191     def open_store_variable(self, name, var):

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in _acquire(self, needs_lock)
    179
    180     def _acquire(self, needs_lock=True):
--> 181         with self._manager.acquire_context(needs_lock) as root:
    182             ds = _nc4_require_group(
    183                 root, self._group, self._mode, create_group=_h5netcdf_create_group

~/.pyenv/versions/3.9.1/lib/python3.9/contextlib.py in __enter__(self)
    115         del self.args, self.kwds, self.func
    116         try:
--> 117             return next(self.gen)
    118         except StopIteration:
    119             raise RuntimeError(""generator didn't yield"") from None

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/file_manager.py in acquire_context(self, needs_lock)
    185     def acquire_context(self, needs_lock=True):
    186         """"""Context manager for acquiring a file.""""""
--> 187         file, cached = self._acquire_with_cache_info(needs_lock)
    188         try:
    189             yield file

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
    203             kwargs = kwargs.copy()
    204             kwargs[""mode""] = self._mode
--> 205             file = self._opener(*self._args, **kwargs)
    206             if self._mode == ""w"":
    207                 # ensure file doesn't get overriden when opened again

~/dev/dask-playground/env/lib/python3.9/site-packages/h5netcdf/core.py in __init__(self, path, mode, invalid_netcdf, phony_dims, **kwargs)
    978         self._preexisting_file = mode in {""r"", ""r+"", ""a""}
    979         self._h5py = h5py
--> 980         self._h5file = self._h5py.File(
    981             path, mode, track_order=track_order, **kwargs
    982         )

~/dev/dask-playground/env/lib/python3.9/site-packages/h5py/_hl/files.py in __init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, fs_strategy, fs_persist, fs_threshold, fs_page_size, page_buf_size, min_meta_keep, min_raw_keep, locking, **kwds)
    484                 name = repr(name).encode('ASCII', 'replace')
    485             else:
--> 486                 name = filename_encode(name)
    487
    488         if track_order is None:

~/dev/dask-playground/env/lib/python3.9/site-packages/h5py/_hl/compat.py in filename_encode(filename)
     17     filenames in h5py for more information.
     18     """"""
---> 19     filename = fspath(filename)
     20     if sys.platform == ""win32"":
     21         if isinstance(filename, str):

~/dev/dask-playground/env/lib/python3.9/site-packages/fsspec/core.py in __fspath__(self)
     96     def __fspath__(self):
     97         # may raise if cannot be resolved to local file
---> 98         return self.open().__fspath__()
     99
    100     def __enter__(self):

AttributeError: 'S3File' object has no attribute '__fspath__'
```

The problem is that `OpenFile` doesn't have a `read` or `seek` method, so h5py doesn't think it's a proper file-like object and tries to `fspath` it here: https://github.com/h5py/h5py/blob/master/h5py/_hl/files.py#L509
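For clarity, here's a rough sketch of the dispatch being described (`normalize`, `FakeOpenFile`, and `FakeS3File` are made up for illustration; this is not h5py's actual code):

```python
import os

# An object with read/seek is treated as file-like; anything else is
# assumed to name a path and goes through os.fspath().
def normalize(obj):
    if hasattr(obj, 'read') and hasattr(obj, 'seek'):
        return obj             # file-like: hand it to the driver directly
    return os.fspath(obj)      # otherwise resolve it as a path

class FakeOpenFile:
    # stand-in for fsspec.OpenFile: no read/seek, and __fspath__ raises
    def __fspath__(self):
        raise AttributeError('cannot resolve to a local path')

class FakeS3File:
    # stand-in for an already-opened s3fs file: has read/seek
    def read(self, size=-1):
        return b''
    def seek(self, pos, whence=0):
        return pos

print(normalize(FakeS3File()) is not None)  # passes through as file-like
try:
    normalize(FakeOpenFile())
except AttributeError as e:
    print('fails:', e)
```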
So I may just be misunderstanding what an `fsspec.OpenFile` object is supposed to be (it's not actually a file-like object until you `.open()` it?). But I expect users would be similarly confused by this distinction.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1031275532
https://github.com/pydata/xarray/pull/5879#issuecomment-1085030197,https://api.github.com/repos/pydata/xarray/issues/5879,1085030197,IC_kwDOAMm_X85ArD81,3309802,2022-03-31T19:46:40Z,2022-03-31T19:50:07Z,NONE,"@martindurant exactly, `os.PathLike` just uses duck-typing, which fsspec matches. This generally means you can't pass s3fs/gcsfs files into `xr.open_dataset` (from what I've tried so far). (I don't know if you actually should be able to do this, but regardless, the error would be very confusing to a new user.)

```python
In [32]: xr.open_dataset(""s3://noaa-nwm-retrospective-2-1-zarr-pds/lakeout.zarr"", engine=""zarr"")
Out[32]:
<xarray.Dataset>
Dimensions:         (feature_id: 5783, time: 367439)
Coordinates:
  * feature_id      (feature_id) int32 491 531 747 ... 947070204 1021092845
    latitude        (feature_id) float32 ...
    longitude       (feature_id) float32 ...
  * time            (time) datetime64[ns] 1979-02-01T01:00:00 ... 2020-12-31T...
Data variables:
    crs             |S1 ...
    inflow          (time, feature_id) float64 ...
    outflow         (time, feature_id) float64 ...
    water_sfc_elev  (time, feature_id) float32 ...
Attributes:
    Conventions:                  CF-1.6
    TITLE:                        OUTPUT FROM WRF-Hydro v5.2.0-beta2
    code_version:                 v5.2.0-beta2
    featureType:                  timeSeries
    model_configuration:          retrospective
    model_output_type:            reservoir
    proj4:                        +proj=lcc +units=m +a=6370000.0 +b=6370000....
    reservoir_assimilated_value:  Assimilation not performed
    reservoir_type:               1 = level pool everywhere
    station_dimension:            lake_id

In [33]: xr.open_dataset(fsspec.open(""s3://noaa-nwm-retrospective-2-1-zarr-pds/lakeout.zarr""), engine=""zarr"")
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-33-...> in <module>
----> 1 xr.open_dataset(fsspec.open(""s3://noaa-nwm-retrospective-2-1-zarr-pds/lakeout.zarr""), engine=""zarr"")

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    493
    494     overwrite_encoded_chunks = kwargs.pop(""overwrite_encoded_chunks"", None)
--> 495     backend_ds = backend.open_dataset(
    496         filename_or_obj,
    497         drop_variables=drop_variables,

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/zarr.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel)
    797     ):
    798
--> 799         filename_or_obj = _normalize_path(filename_or_obj)
    800         store = ZarrStore.open_group(
    801             filename_or_obj,

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/common.py in _normalize_path(path)
     21 def _normalize_path(path):
     22     if isinstance(path, os.PathLike):
---> 23         path = os.fspath(path)
     24
     25     if isinstance(path, str) and not is_remote_uri(path):

~/dev/dask-playground/env/lib/python3.9/site-packages/fsspec/core.py in __fspath__(self)
     96     def __fspath__(self):
     97         # may raise if cannot be resolved to local file
---> 98         return self.open().__fspath__()
     99
    100     def __enter__(self):

~/dev/dask-playground/env/lib/python3.9/site-packages/fsspec/core.py in open(self)
    138         been deleted; but a with-context is better style.
    139         """"""
--> 140         out = self.__enter__()
    141         closer = out.close
    142         fobjects = self.fobjects.copy()[:-1]

~/dev/dask-playground/env/lib/python3.9/site-packages/fsspec/core.py in __enter__(self)
    101         mode = self.mode.replace(""t"", """").replace(""b"", """") + ""b""
    102
--> 103         f = self.fs.open(self.path, mode=mode)
    104
    105         self.fobjects = [f]

~/dev/dask-playground/env/lib/python3.9/site-packages/fsspec/spec.py in open(self, path, mode, block_size, cache_options, compression, **kwargs)
   1007         else:
   1008             ac = kwargs.pop(""autocommit"", not self._intrans)
-> 1009             f = self._open(
   1010                 path,
   1011                 mode=mode,

~/dev/dask-playground/env/lib/python3.9/site-packages/s3fs/core.py in _open(self, path, mode, block_size, acl, version_id, fill_cache, cache_type, autocommit, requester_pays, **kwargs)
    532             cache_type = self.default_cache_type
    533
--> 534         return S3File(
    535             self,
    536             path,

~/dev/dask-playground/env/lib/python3.9/site-packages/s3fs/core.py in __init__(self, s3, path, mode, block_size, acl, version_id, fill_cache, s3_additional_kwargs, autocommit, cache_type, requester_pays)
   1824
   1825         if ""r"" in mode:
-> 1826             self.req_kw[""IfMatch""] = self.details[""ETag""]
   1827
   1828     def _call_s3(self, method, *kwarglist, **kwargs):

KeyError: 'ETag'
```
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1031275532
https://github.com/pydata/xarray/pull/5879#issuecomment-1085012805,https://api.github.com/repos/pydata/xarray/issues/5879,1085012805,IC_kwDOAMm_X85Aq_tF,3309802,2022-03-31T19:25:28Z,2022-03-31T19:25:28Z,NONE,"Note that `isinstance(fsspec.OpenFile(...), os.PathLike)` is `True` due to the magic of ABCs. Are we sure that we want to be calling `os.fspath` on fsspec files? In many cases (like an S3File, GCSFile, etc.) this will fail with a confusing error like `'S3File' object has no attribute '__fspath__'`.
cc @martindurant ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1031275532