id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1310058435,I_kwDOAMm_X85OFefD,6813,Opening fsspec s3 file twice results in invalid start byte,38170479,closed,0,,,9,2022-07-19T21:20:26Z,2022-12-01T16:18:24Z,2022-12-01T16:18:24Z,CONTRIBUTOR,,,,"### What happened?

When I open an fsspec s3 file twice, it results in an error, ""file-like object read/write pointer not at the start of the file"".

Here's a Dockerfile I used for the environment:

```
FROM condaforge/mambaforge:4.12.0-0
RUN mamba install -y --strict-channel-priority -c conda-forge python=3.10 dask h5netcdf xarray fsspec s3fs
```

Input1:

```
import fsspec
import xarray as xr

fs = fsspec.filesystem('s3', anon=True)
fp = 'noaa-goes16/ABI-L1b-RadF/2019/079/14/OR_ABI-L1b-RadF-M3C03_G16_s20190791400366_e20190791411133_c20190791411180.nc'
data = fs.open(fp)
xr.open_dataset(data, engine='h5netcdf', chunks={})
xr.open_dataset(data, engine='h5netcdf', chunks={})
```

Output1:

```
Traceback (most recent call last):
  File ""//example.py"", line 26, in <module>
    xr.open_dataset(data, engine='h5netcdf', chunks={})
  File ""/opt/conda/lib/python3.10/site-packages/xarray/backends/api.py"", line 531, in open_dataset
    backend_ds = backend.open_dataset(
  File ""/opt/conda/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py"", line 389, in open_dataset
    store = H5NetCDFStore.open(
  File ""/opt/conda/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py"", line 157, in open
    magic_number = read_magic_number_from_file(filename)
  File ""/opt/conda/lib/python3.10/site-packages/xarray/core/utils.py"", line 645, in read_magic_number_from_file
    raise ValueError(
ValueError: cannot guess the engine, file-like object read/write pointer not at the start of the file, please close and reopen, or use a context manager
```

----- INVALID EXAMPLE 2 -----

Input2:

```
import fsspec
import xarray as xr

fs = fsspec.filesystem('s3', anon=True)
fp = 'noaa-goes16/ABI-L1b-RadF/2019/079/14/OR_ABI-L1b-RadF-M3C03_G16_s20190791400366_e20190791411133_c20190791411180.nc'
data = fs.open(fp, mode='r')
xr.open_dataset(data, engine='h5netcdf', chunks={})
xr.open_dataset(data, engine='h5netcdf', chunks={})
```

Output2:

```
Traceback (most recent call last):
  File ""//example.py"", line 25, in <module>
    xr.open_dataset(data, engine='h5netcdf', chunks={})
  File ""/opt/conda/lib/python3.10/site-packages/xarray/backends/api.py"", line 531, in open_dataset
    backend_ds = backend.open_dataset(
  File ""/opt/conda/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py"", line 389, in open_dataset
    store = H5NetCDFStore.open(
  File ""/opt/conda/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py"", line 157, in open
    magic_number = read_magic_number_from_file(filename)
  File ""/opt/conda/lib/python3.10/site-packages/xarray/core/utils.py"", line 650, in read_magic_number_from_file
    magic_number = filename_or_obj.read(count)
  File ""/opt/conda/lib/python3.10/codecs.py"", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
```

----- INVALID EXAMPLE 2 -----

### What did you expect to happen?

I expect both calls to open_dataset to yield the same result and not error. The following runs without errors:

```
import fsspec
import xarray as xr

fs = fsspec.filesystem('s3', anon=True)
fp = 'noaa-goes16/ABI-L1b-RadF/2019/079/14/OR_ABI-L1b-RadF-M3C03_G16_s20190791400366_e20190791411133_c20190791411180.nc'
data = fs.open(fp)
xr.open_dataset(data, engine='h5netcdf', chunks={})
data = fs.open(fp)
xr.open_dataset(data, engine='h5netcdf', chunks={})
```

### Minimal Complete Verifiable Example

_No response_

### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.

### Relevant log output

_No response_

### Anything else we need to know?

I see the same error mentioned in other issues like https://github.com/pydata/xarray/issues/3991, but it was determined to be a problem with the input data.

### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-348.20.1.el8_5.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: None

xarray: 2022.6.0rc0
pandas: 1.4.3
numpy: 1.23.1
scipy: None
netCDF4: None
pydap: None
h5netcdf: 1.0.1
h5py: 3.7.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.7.0
distributed: 2022.7.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.2.0
pip: 22.0.4
conda: 4.13.0
pytest: None
IPython: None
sphinx: None
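
For reference, the two failure modes in the tracebacks above can be mimicked with a stdlib-only Python sketch. It does not use xarray, fsspec or S3. `read_magic_number` and `consume` are hypothetical stand-ins: the first mimics the pointer check whose message appears in Output1, and the second mimics one open_dataset call by a backend that leaves the pointer mid-file. An `io.BytesIO` stands in for the s3 file object. The last part illustrates why Input2 fails differently: the first byte of the HDF5 signature, 0x89, is not a valid UTF-8 start byte, so a file opened in text mode cannot be decoded.

```python
import io

# The 8-byte signature that HDF5 files (and thus netCDF4 files) normally begin with.
HDF5_SIGNATURE = b'\x89HDF\r\n\x1a\n'


def read_magic_number(fileobj, count=8):
    # Hypothetical stand-in for the engine-guessing check: refuse to read
    # unless the pointer is at byte 0, then rewind after peeking.
    if fileobj.tell() != 0:
        raise ValueError('file-like object read/write pointer not at the start of the file')
    magic = fileobj.read(count)
    fileobj.seek(0)
    return magic


def consume(fileobj):
    # Hypothetical stand-in for one open_dataset call: check the signature,
    # then read the whole file, leaving the pointer at the end.
    if read_magic_number(fileobj) != HDF5_SIGNATURE:
        raise ValueError('not an HDF5 file')
    fileobj.read()


data = io.BytesIO(HDF5_SIGNATURE + b'payload')
consume(data)        # first call succeeds
print(data.tell())   # 15: the pointer is left at the end of the file

try:
    consume(data)    # second call on the same object fails, as in Output1
except ValueError as err:
    print('second call:', err)

data.seek(0)         # rewinding restores the precondition
consume(data)
print('after seek(0): ok')

# Input2 used mode='r' (text mode), which decodes the bytes as UTF-8.
try:
    HDF5_SIGNATURE.decode('utf-8')
except UnicodeDecodeError as err:
    print(err)       # 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
```

If the same mechanism applies to the real objects, rewinding with `data.seek(0)` before the second `xr.open_dataset` call might be an alternative to reopening the file; this sketch does not verify that against xarray itself.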
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6813/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue