issue_comments: 1085077801


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/pull/5879#issuecomment-1085077801 https://api.github.com/repos/pydata/xarray/issues/5879 1085077801 IC_kwDOAMm_X85ArPkp 3309802 2022-03-31T20:34:51Z 2022-03-31T20:34:51Z NONE

"s3://noaa-nwm-retrospective-2-1-zarr-pds/lakeout.zarr" is a directory, right? You cannot open that as a file

Yeah correct. I oversimplified this from the problem I actually cared about, since of course zarr is not a single file that can be fsspec.open'd in the first place, and the zarr engine is doing some magic there when passed the plain string.

Here's a more illustrative example:

```python
In [1]: import xarray as xr

In [2]: import fsspec

In [3]: import os

In [4]: url = "s3://noaa-nwm-retrospective-2-1-pds/model_output/1979/197902010100.CHRTOUT_DOMAIN1.comp" # a netCDF file in s3

In [5]: f = fsspec.open(url)

In [6]: f
Out[6]: <OpenFile 'noaa-nwm-retrospective-2-1-pds/model_output/1979/197902010100.CHRTOUT_DOMAIN1.comp'>

In [7]: isinstance(f, os.PathLike)
Out[7]: True

In [8]: s3f = f.open()

In [9]: s3f
Out[9]: <File-like object S3FileSystem, noaa-nwm-retrospective-2-1-pds/model_output/1979/197902010100.CHRTOUT_DOMAIN1.comp>

In [10]: isinstance(s3f, os.PathLike)
Out[10]: False

In [11]: ds = xr.open_dataset(s3f, engine='h5netcdf')

In [12]: ds
Out[12]:
<xarray.Dataset>
Dimensions:         (time: 1, reference_time: 1, feature_id: 2776738)
Coordinates:
  * time            (time) datetime64[ns] 1979-02-01T01:00:00
  * reference_time  (reference_time) datetime64[ns] 1979-02-01
  * feature_id      (feature_id) int32 101 179 181 ... 1180001803 1180001804
    latitude        (feature_id) float32 ...
    longitude       (feature_id) float32 ...
Data variables:
    crs             |S1 ...
    order           (feature_id) int32 ...
    elevation       (feature_id) float32 ...
    streamflow      (feature_id) float64 ...
    q_lateral       (feature_id) float64 ...
    velocity        (feature_id) float64 ...
    qSfcLatRunoff   (feature_id) float64 ...
    qBucket         (feature_id) float64 ...
    qBtmVertRunoff  (feature_id) float64 ...
Attributes: (12/18)
    TITLE:                      OUTPUT FROM WRF-Hydro v5.2.0-beta2
    featureType:                timeSeries
    proj4:                      +proj=lcc +units=m +a=6370000.0 +b=6370000.0 ...
    model_initialization_time:  1979-02-01_00:00:00
    station_dimension:          feature_id
    model_output_valid_time:    1979-02-01_01:00:00
    ...                         ...
    model_configuration:        retrospective
    dev_OVRTSWCRT:              1
    dev_NOAH_TIMESTEP:          3600
    dev_channel_only:           0
    dev_channelBucket_only:     0
    dev:                        dev_ prefix indicates development/internal me...

In [13]: ds = xr.open_dataset(f, engine='h5netcdf')

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-de834ca911b4> in <module>
----> 1 ds = xr.open_dataset(f, engine='h5netcdf')

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    493
    494     overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 495     backend_ds = backend.open_dataset(
    496         filename_or_obj,
    497         drop_variables=drop_variables,

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, format, group, lock, invalid_netcdf, phony_dims, decode_vlen_strings)
    384     ):
    385
--> 386         filename_or_obj = _normalize_path(filename_or_obj)
    387         store = H5NetCDFStore.open(
    388             filename_or_obj,

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/common.py in _normalize_path(path)
     21 def _normalize_path(path):
     22     if isinstance(path, os.PathLike):
---> 23         path = os.fspath(path)
     24
     25     if isinstance(path, str) and not is_remote_uri(path):

~/dev/dask-playground/env/lib/python3.9/site-packages/fsspec/core.py in __fspath__(self)
     96     def __fspath__(self):
     97         # may raise if cannot be resolved to local file
---> 98         return self.open().__fspath__()
     99
    100     def __enter__(self):

AttributeError: 'S3File' object has no attribute '__fspath__'
```

Because the plain `fsspec.OpenFile` object has an `__fspath__` attribute (but calling it raises an error), `isinstance(f, os.PathLike)` is true, so `xarray.backends.common._normalize_path` calls `os.fspath` on it and fails.

Because the `s3fs.S3File` object does not have an `__fspath__` attribute, `_normalize_path` doesn't try to call `os.fspath` on it, so the file-like object is passed all the way down into h5netcdf, which is able to handle it.
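The difference between the two objects can be reproduced without S3 or fsspec. The following is only a minimal sketch: `LazyHandle` is a hypothetical stand-in for `fsspec.OpenFile`, and `io.BytesIO` stands in for `S3File`. It assumes nothing beyond the `__fspath__` delegation visible in the traceback above.

```python
import io
import os


class LazyHandle:
    """Hypothetical stand-in for fsspec.OpenFile: not file-like until .open()."""

    def __init__(self, payload: bytes):
        self._payload = payload

    def open(self):
        # Stand-in for the remote file object (S3File in the session above).
        return io.BytesIO(self._payload)

    def __fspath__(self):
        # Same delegation as in the traceback; BytesIO has no __fspath__,
        # so calling this raises AttributeError.
        return self.open().__fspath__()


handle = LazyHandle(b"not really netCDF")

# os.PathLike only checks that the class defines __fspath__ ...
is_pathlike = isinstance(handle, os.PathLike)

# ... so an isinstance guard like the one in _normalize_path goes on to
# call os.fspath, which propagates the error raised inside __fspath__.
try:
    os.fspath(handle)
    error = None
except AttributeError as exc:
    error = exc

print(is_pathlike, type(error).__name__)
print(isinstance(handle.open(), os.PathLike))
```

The opened object never enters the `os.PathLike` branch at all, which matches the behaviour of `s3f` in the session.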

Note though that if I downgrade xarray to 0.19.0 (the last version before this PR was merged), I still can't use the plain `fsspec.OpenFile` object successfully. It's not xarray's fault anymore, since the object gets passed all the way into h5netcdf, but h5netcdf (through h5py) also tries to call `fspath` on the `OpenFile`, which fails in the same way.

```python
In [1]: import xarray as xr

In [2]: import fsspec

In [3]: xr.__version__
Out[3]: '0.19.0'

In [4]: url = "s3://noaa-nwm-retrospective-2-1-pds/model_output/1979/197902010100.CHRTOUT_DOMAIN1.comp" # a netCDF file in s3

In [5]: f = fsspec.open(url)

In [6]: xr.open_dataset(f.open(), engine="h5netcdf")
Out[6]:
<xarray.Dataset>
Dimensions:         (time: 1, reference_time: 1, feature_id: 2776738)
Coordinates:
  * time            (time) datetime64[ns] 1979-02-01T01:00:00
  * reference_time  (reference_time) datetime64[ns] 1979-02-01
  * feature_id      (feature_id) int32 101 179 181 ... 1180001803 1180001804
    latitude        (feature_id) float32 ...
    longitude       (feature_id) float32 ...
Data variables:
    crs             |S1 ...
    order           (feature_id) int32 ...
    elevation       (feature_id) float32 ...
    streamflow      (feature_id) float64 ...
    q_lateral       (feature_id) float64 ...
    velocity        (feature_id) float64 ...
    qSfcLatRunoff   (feature_id) float64 ...
    qBucket         (feature_id) float64 ...
    qBtmVertRunoff  (feature_id) float64 ...
Attributes: (12/18)
    TITLE:                      OUTPUT FROM WRF-Hydro v5.2.0-beta2
    featureType:                timeSeries
    proj4:                      +proj=lcc +units=m +a=6370000.0 +b=6370000.0 ...
    model_initialization_time:  1979-02-01_00:00:00
    station_dimension:          feature_id
    model_output_valid_time:    1979-02-01_01:00:00
    ...                         ...
    model_configuration:        retrospective
    dev_OVRTSWCRT:              1
    dev_NOAH_TIMESTEP:          3600
    dev_channel_only:           0
    dev_channelBucket_only:     0
    dev:                        dev_ prefix indicates development/internal me...

In [7]: xr.open_dataset(f, engine="h5netcdf")
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
    198             try:
--> 199                 file = self._cache[self._key]
    200             except KeyError:

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/lru_cache.py in __getitem__(self, key)
     52         with self._lock:
---> 53             value = self._cache[key]
     54             self._cache.move_to_end(key)

KeyError: [<class 'h5netcdf.core.File'>, (<OpenFile 'noaa-nwm-retrospective-2-1-pds/model_output/1979/197902010100.CHRTOUT_DOMAIN1.comp'>,), 'r', (('decode_vlen_strings', True), ('invalid_netcdf', None))]

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-7-e6098b8ab402> in <module>
----> 1 xr.open_dataset(f, engine="h5netcdf")

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    495
    496     overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 497     backend_ds = backend.open_dataset(
    498         filename_or_obj,
    499         drop_variables=drop_variables,

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, format, group, lock, invalid_netcdf, phony_dims, decode_vlen_strings)
    372
    373         filename_or_obj = _normalize_path(filename_or_obj)
--> 374         store = H5NetCDFStore.open(
    375             filename_or_obj,
    376             format=format,

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in open(cls, filename, mode, format, group, lock, autoclose, invalid_netcdf, phony_dims, decode_vlen_strings)
    176
    177         manager = CachingFileManager(h5netcdf.File, filename, mode=mode, kwargs=kwargs)
--> 178         return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
    179
    180     def _acquire(self, needs_lock=True):

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in __init__(self, manager, group, mode, lock, autoclose)
    121         # todo: utilizing find_root_and_group seems a bit clunky
    122         #  making filename available on h5netcdf.Group seems better
--> 123         self._filename = find_root_and_group(self.ds)[0].filename
    124         self.is_remote = is_remote_uri(self._filename)
    125         self.lock = ensure_lock(lock)

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in ds(self)
    187     @property
    188     def ds(self):
--> 189         return self._acquire()
    190
    191     def open_store_variable(self, name, var):

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in _acquire(self, needs_lock)
    179
    180     def _acquire(self, needs_lock=True):
--> 181         with self._manager.acquire_context(needs_lock) as root:
    182             ds = _nc4_require_group(
    183                 root, self._group, self._mode, create_group=_h5netcdf_create_group

~/.pyenv/versions/3.9.1/lib/python3.9/contextlib.py in __enter__(self)
    115         del self.args, self.kwds, self.func
    116         try:
--> 117             return next(self.gen)
    118         except StopIteration:
    119             raise RuntimeError("generator didn't yield") from None

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/file_manager.py in acquire_context(self, needs_lock)
    185     def acquire_context(self, needs_lock=True):
    186         """Context manager for acquiring a file."""
--> 187         file, cached = self._acquire_with_cache_info(needs_lock)
    188         try:
    189             yield file

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
    203                     kwargs = kwargs.copy()
    204                     kwargs["mode"] = self._mode
--> 205                 file = self._opener(*self._args, **kwargs)
    206                 if self._mode == "w":
    207                     # ensure file doesn't get overriden when opened again

~/dev/dask-playground/env/lib/python3.9/site-packages/h5netcdf/core.py in __init__(self, path, mode, invalid_netcdf, phony_dims, **kwargs)
    978                 self._preexisting_file = mode in {"r", "r+", "a"}
    979                 self._h5py = h5py
--> 980                 self._h5file = self._h5py.File(
    981                     path, mode, track_order=track_order, **kwargs
    982                 )

~/dev/dask-playground/env/lib/python3.9/site-packages/h5py/_hl/files.py in __init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, fs_strategy, fs_persist, fs_threshold, fs_page_size, page_buf_size, min_meta_keep, min_raw_keep, locking, **kwds)
    484                 name = repr(name).encode('ASCII', 'replace')
    485             else:
--> 486                 name = filename_encode(name)
    487
    488             if track_order is None:

~/dev/dask-playground/env/lib/python3.9/site-packages/h5py/_hl/compat.py in filename_encode(filename)
     17     filenames in h5py for more information.
     18     """
---> 19     filename = fspath(filename)
     20     if sys.platform == "win32":
     21         if isinstance(filename, str):

~/dev/dask-playground/env/lib/python3.9/site-packages/fsspec/core.py in __fspath__(self)
     96     def __fspath__(self):
     97         # may raise if cannot be resolved to local file
---> 98         return self.open().__fspath__()
     99
    100     def __enter__(self):

AttributeError: 'S3File' object has no attribute '__fspath__'
```

The problem is that `OpenFile` doesn't have a `read` or `seek` method, so h5py doesn't think it's a proper file-like object and tries to `fspath` it here: https://github.com/h5py/h5py/blob/master/h5py/_hl/files.py#L509

So I may just be misunderstanding what an fsspec.OpenFile object is supposed to be (it's not actually a file-like object until you .open() it?). But I expect users would be similarly confused by this distinction.
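One way a caller could paper over the distinction today is to open lazy handles before handing them to a backend. This is a rough sketch only, not something xarray or fsspec provides: `as_file_like` and `LazyHandle` are hypothetical names, and the `read`/`seek` test simply mirrors the check attributed to h5py above.

```python
import io
import pathlib


class LazyHandle:
    """Hypothetical stand-in for fsspec.OpenFile: only .open() yields a file."""

    def __init__(self, payload: bytes):
        self._payload = payload

    def open(self):
        return io.BytesIO(self._payload)


def as_file_like(obj):
    """Return something a file-reading backend can consume (hypothetical helper)."""
    # Plain paths are left alone so the backend can open them itself.
    if isinstance(obj, (str, bytes, pathlib.PurePath)):
        return obj
    # Already file-like: has both read and seek.
    if hasattr(obj, "read") and hasattr(obj, "seek"):
        return obj
    # Lazy handles become file-like once opened.
    if hasattr(obj, "open"):
        return obj.open()
    raise TypeError(f"cannot interpret {type(obj).__name__} as a file")


opened = as_file_like(LazyHandle(b"abc"))
print(opened.read())
```

In the session above this would amount to `xr.open_dataset(as_file_like(f), engine="h5netcdf")`, which is the same as passing `f.open()`, the call that succeeded on both xarray versions. The caller remains responsible for closing the opened file.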

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  1031275532