issues

1 row where state = "closed" and user = 38170479 sorted by updated_at descending

id: 1310058435
node_id: I_kwDOAMm_X85OFefD
number: 6813
title: Opening fsspec s3 file twice results in invalid start byte
user: wroberts4 (38170479)
state: closed
locked: 0
assignee: (empty)
milestone: (empty)
comments: 9
created_at: 2022-07-19T21:20:26Z
updated_at: 2022-12-01T16:18:24Z
closed_at: 2022-12-01T16:18:24Z
author_association: CONTRIBUTOR
active_lock_reason: (empty)
draft: (empty)
pull_request: (empty)

What happened?

When I open an fsspec S3 file once and pass the same file object to open_dataset twice, the second call fails with the error "file-like object read/write pointer not at the start of the file".

Here's a Dockerfile I used for the environment:

    FROM condaforge/mambaforge:4.12.0-0
    RUN mamba install -y --strict-channel-priority -c conda-forge python=3.10 dask h5netcdf xarray fsspec s3fs

Input1:

    import fsspec
    import xarray as xr

    fs = fsspec.filesystem('s3', anon=True)
    fp = 'noaa-goes16/ABI-L1b-RadF/2019/079/14/OR_ABI-L1b-RadF-M3C03_G16_s20190791400366_e20190791411133_c20190791411180.nc'
    data = fs.open(fp)
    xr.open_dataset(data, engine='h5netcdf', chunks={})
    xr.open_dataset(data, engine='h5netcdf', chunks={})

Output1:

    Traceback (most recent call last):
      File "//example.py", line 26, in <module>
        xr.open_dataset(data, engine='h5netcdf', chunks={})
      File "/opt/conda/lib/python3.10/site-packages/xarray/backends/api.py", line 531, in open_dataset
        backend_ds = backend.open_dataset(
      File "/opt/conda/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py", line 389, in open_dataset
        store = H5NetCDFStore.open(
      File "/opt/conda/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py", line 157, in open
        magic_number = read_magic_number_from_file(filename)
      File "/opt/conda/lib/python3.10/site-packages/xarray/core/utils.py", line 645, in read_magic_number_from_file
        raise ValueError(
    ValueError: cannot guess the engine, file-like object read/write pointer not at the start of the file, please close and reopen, or use a context manager
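The ValueError in Output1 suggests that the first open_dataset call leaves the read pointer of the shared file object somewhere past byte 0, so the second call's magic-number check refuses to proceed. The sketch below reproduces that behaviour without S3 or xarray, using an in-memory buffer. The functions read_magic_number and simulate_open are hypothetical stand-ins written for illustration, loosely modeled on the check named in the traceback; they are not xarray's code. Rewinding with seek(0) works in this toy model, but whether it is sufficient with the real h5netcdf backend is an assumption the report does not confirm.

```python
import io

# 8-byte HDF5 format signature (netCDF4 files are HDF5 files).
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"


def read_magic_number(fileobj, count=8):
    """Hypothetical stand-in for the check named in the traceback."""
    if fileobj.tell() != 0:
        raise ValueError(
            "file-like object read/write pointer not at the start of the file"
        )
    magic = fileobj.read(count)
    fileobj.seek(0)
    return magic


def simulate_open(fileobj):
    """Toy 'open': check the magic number, then read on, moving the pointer."""
    magic = read_magic_number(fileobj)
    fileobj.read(16)  # stands in for the backend reading further into the file
    return magic


data = io.BytesIO(HDF5_MAGIC + b"\x00" * 64)

first = simulate_open(data)  # succeeds, pointer is now past byte 0

try:
    simulate_open(data)  # same object, pointer not at the start
    second_error = None
except ValueError as exc:
    second_error = str(exc)

data.seek(0)  # rewind before reusing the same object
third = simulate_open(data)
```

In this model the first and third calls succeed and the second raises, which mirrors the sequence described in the report.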

----- INVALID EXAMPLE 2 -----

Input2:

    import fsspec
    import xarray as xr

    fs = fsspec.filesystem('s3', anon=True)
    fp = 'noaa-goes16/ABI-L1b-RadF/2019/079/14/OR_ABI-L1b-RadF-M3C03_G16_s20190791400366_e20190791411133_c20190791411180.nc'
    data = fs.open(fp, mode='r')
    xr.open_dataset(data, engine='h5netcdf', chunks={})
    xr.open_dataset(data, engine='h5netcdf', chunks={})

Output2:

    Traceback (most recent call last):
      File "//example.py", line 25, in <module>
        xr.open_dataset(data, engine='h5netcdf', chunks={})
      File "/opt/conda/lib/python3.10/site-packages/xarray/backends/api.py", line 531, in open_dataset
        backend_ds = backend.open_dataset(
      File "/opt/conda/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py", line 389, in open_dataset
        store = H5NetCDFStore.open(
      File "/opt/conda/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py", line 157, in open
        magic_number = read_magic_number_from_file(filename)
      File "/opt/conda/lib/python3.10/site-packages/xarray/core/utils.py", line 650, in read_magic_number_from_file
        magic_number = filename_or_obj.read(count)
      File "/opt/conda/lib/python3.10/codecs.py", line 322, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

----- INVALID EXAMPLE 2 -----
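The UnicodeDecodeError in Output2 is consistent with the file being opened in text mode (mode='r') rather than binary mode: the HDF5 signature begins with byte 0x89, which is not a valid first byte of a UTF-8 sequence. The sketch below shows the same failure on an in-memory buffer. Wrapping the buffer in io.TextIOWrapper is an assumption about what a text-mode open amounts to here, used only to mimic the effect.

```python
import io

# First 8 bytes of an HDF5 file, followed by padding.
raw = io.BytesIO(b"\x89HDF\r\n\x1a\n" + b"\x00" * 8)

# Text-mode view of binary data, mimicking an open with mode='r'.
text = io.TextIOWrapper(raw, encoding="utf-8")

try:
    text.read(8)  # decoding starts at byte 0x89 and fails immediately
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Reading the same bytes in binary mode involves no decoding.
binary_magic = io.BytesIO(b"\x89HDF\r\n\x1a\n").read(8)
```

This would explain why Example 2 is marked invalid: the failure comes from the open mode, not from reusing the file object.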

What did you expect to happen?

I expect both calls to open_dataset to yield the same result and not error. The following runs without errors:

    import fsspec
    import xarray as xr

    fs = fsspec.filesystem('s3', anon=True)
    fp = 'noaa-goes16/ABI-L1b-RadF/2019/079/14/OR_ABI-L1b-RadF-M3C03_G16_s20190791400366_e20190791411133_c20190791411180.nc'
    data = fs.open(fp)
    xr.open_dataset(data, engine='h5netcdf', chunks={})
    data = fs.open(fp)
    xr.open_dataset(data, engine='h5netcdf', chunks={})
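The working variant above gives each open_dataset call its own freshly opened file object. One way to package that pattern, in line with the error message's hint to "use a context manager", is sketched below. The helper fresh_handle and the in-memory opener are hypothetical and exist only for illustration; in the reported setup the opener would be something like lambda: fs.open(fp).

```python
import io
from contextlib import contextmanager


@contextmanager
def fresh_handle(opener):
    """Yield a newly opened file object and close it on exit (hypothetical helper)."""
    fileobj = opener()
    try:
        yield fileobj
    finally:
        fileobj.close()


# Placeholder payload and opener; a real opener might be `lambda: fs.open(fp)`.
payload = b"\x89HDF\r\n\x1a\n" + b"\x00" * 32
opener = lambda: io.BytesIO(payload)

start_positions = []
for _ in range(2):
    with fresh_handle(opener) as f:
        start_positions.append(f.tell())  # every consumer starts at byte 0
        f.read(10)  # reading here does not affect the next handle
```

One caveat with this design: when data is loaded lazily (as with chunks={}), the handle presumably has to stay open for as long as the dataset is in use, so closing it at the end of the with block may be too early.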

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

I see the same error mentioned in other issues, such as https://github.com/pydata/xarray/issues/3991, but there it was determined to be a problem with the input data.

Environment

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0]
    python-bits: 64
    OS: Linux
    OS-release: 4.18.0-348.20.1.el8_5.x86_64
    machine: x86_64
    processor: x86_64
    byteorder: little
    LC_ALL: C.UTF-8
    LANG: C.UTF-8
    LOCALE: ('en_US', 'UTF-8')
    libhdf5: 1.12.1
    libnetcdf: None
    xarray: 2022.6.0rc0
    pandas: 1.4.3
    numpy: 1.23.1
    scipy: None
    netCDF4: None
    pydap: None
    h5netcdf: 1.0.1
    h5py: 3.7.0
    Nio: None
    zarr: None
    cftime: None
    nc_time_axis: None
    PseudoNetCDF: None
    rasterio: None
    cfgrib: None
    iris: None
    bottleneck: None
    dask: 2022.7.0
    distributed: 2022.7.0
    matplotlib: None
    cartopy: None
    seaborn: None
    numbagg: None
    fsspec: 2022.5.0
    cupy: None
    pint: None
    sparse: None
    flox: None
    numpy_groupies: None
    setuptools: 63.2.0
    pip: 22.0.4
    conda: 4.13.0
    pytest: None
    IPython: None
    sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6813/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: xarray (13221727)
type: issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
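The row filter shown at the top of the page (state = "closed" and user = 38170479, sorted by updated_at descending) can be expressed as a query against this schema. The sketch below uses Python's sqlite3 module with an in-memory database and a reduced copy of the issues table, which is an assumption made to keep the example short: only the columns the query touches are included, foreign keys are dropped, and the single row is taken from the values shown on this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced version of the [issues] table: only the columns used below.
conn.execute(
    """
    CREATE TABLE [issues] (
       [id] INTEGER PRIMARY KEY,
       [number] INTEGER,
       [title] TEXT,
       [user] INTEGER,
       [state] TEXT,
       [updated_at] TEXT
    )
    """
)
conn.execute("CREATE INDEX [idx_issues_user] ON [issues] ([user])")

# The one row shown on this page.
conn.execute(
    "INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?)",
    (
        1310058435,
        6813,
        "Opening fsspec s3 file twice results in invalid start byte",
        38170479,
        "closed",
        "2022-12-01T16:18:24Z",
    ),
)

# Same filter and ordering as the page's row description.
rows = conn.execute(
    """
    SELECT [number], [title]
    FROM [issues]
    WHERE [state] = ? AND [user] = ?
    ORDER BY [updated_at] DESC
    """,
    ("closed", 38170479),
).fetchall()
```

Because the timestamps are stored as ISO 8601 text in a single format, ordering by the TEXT column also orders them chronologically.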
Powered by Datasette · Queries took 802.146ms · About: xarray-datasette