Issue #7146: Segfault writing large netcdf files to s3fs

  • Opened by d1mach (user 11075246) · repo: xarray · state: closed · 17 comments
  • Created 2022-10-08T16:56:31Z · updated 2024-04-28T20:11:59Z · closed 2024-04-28T20:11:59Z

What happened?

It seems netcdf4 does not currently work well with s3fs, the FUSE filesystem layer over S3-compatible storage, with either the default netcdf4 engine or with h5netcdf.

Here is an example:

```python
import numpy as np
import xarray as xr
from datetime import datetime, timedelta

NTIMES = 48
start = datetime(2022, 10, 6, 0, 0)
time_vals = [start + timedelta(minutes=20 * t) for t in range(NTIMES)]
times = xr.DataArray(
    data=[t.strftime('%Y%m%d%H%M%S').encode() for t in time_vals],
    dims=['Time'])
v1 = xr.DataArray(data=np.zeros((len(times), 201, 201)), dims=['Time', 'x', 'y'])
ds = xr.Dataset(data_vars=dict(times=times, v1=v1))
ds.to_netcdf(path='/my_s3_fs/test_netcdf.nc', format='NETCDF4', mode='w')
```

On my system this code crashes with NTIMES=48, but completes without an error with NTIMES=24.

The output with NTIMES=48 is

```
There are 1 HDF5 objects open!

Report: open objects on 72057594037927936
Segmentation fault (core dumped)
```

I have tried the other engine that handles NETCDF4 in xarray, engine='h5netcdf', and also got a segfault.

A quick workaround seems to be to write the NetCDF file to the local filesystem and then move the complete file to S3:

```python
import shutil

ds.to_netcdf(path='/tmp/test_netcdf.nc', format='NETCDF4', mode='w')
shutil.move('/tmp/test_netcdf.nc', '/my_s3_fs/test_netcdf.nc')
```

There are several pieces of software involved here: xarray (0.16.1), netcdf4 (1.5.4), HDF5 (1.10.6), and s3fs (1.79). If the bug is in the underlying libraries rather than in my code, it is most likely not an xarray bug, but since it fails with both NETCDF4 engines I decided to report it here.
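The move-after-write workaround can be wrapped in a small helper that stages any writer callable on the local filesystem and only then moves the finished file onto the FUSE mount. This is my own illustrative sketch, not part of xarray; `write_via_tmp` is a hypothetical name.

```python
import os
import shutil
import tempfile


def write_via_tmp(write_fn, dest_path):
    """Run write_fn against a local temporary path, then move the
    finished file to dest_path (e.g. on an s3fs FUSE mount)."""
    fd, tmp_path = tempfile.mkstemp(suffix=os.path.splitext(dest_path)[1])
    os.close(fd)
    try:
        write_fn(tmp_path)
        shutil.move(tmp_path, dest_path)
    finally:
        # Clean up the temp file if the move never happened.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)


# Usage with the dataset above would look like:
# write_via_tmp(lambda p: ds.to_netcdf(path=p, format='NETCDF4', mode='w'),
#               '/my_s3_fs/test_netcdf.nc')
```

This keeps all HDF5 I/O on a local disk, so the library never performs partial writes through the FUSE layer.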

What did you expect to happen?

With NTIMES=24 I am getting a file /my_s3_fs/test_netcdf.nc of about 7.8 MB. With NTIMES=36 I get an empty file. I would expect this code to run without a segfault and produce a non-empty file.
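The 7.8 MB figure is consistent with the uncompressed payload: v1 dominates, and for NTIMES=24 it is a float64 array on a 24 × 201 × 201 grid. This back-of-the-envelope check is my own arithmetic, not output from the netCDF library:

```python
# v1 holds float64 values (8 bytes each) on a Time x 201 x 201 grid;
# the byte-string times variable adds only a few hundred bytes.
ntimes = 24
nbytes = ntimes * 201 * 201 * 8
print(nbytes / 1e6)  # about 7.76, matching the ~7.8 MB file observed
```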

Minimal Complete Verifiable Example

```python
import numpy as np
import xarray as xr
from datetime import datetime, timedelta

NTIMES = 48
start = datetime(2022, 10, 6, 0, 0)
time_vals = [start + timedelta(minutes=20 * t) for t in range(NTIMES)]
times = xr.DataArray(
    data=[t.strftime('%Y%m%d%H%M%S').encode() for t in time_vals],
    dims=['Time'])
v1 = xr.DataArray(data=np.zeros((len(times), 201, 201)), dims=['Time', 'x', 'y'])
ds = xr.Dataset(data_vars=dict(times=times, v1=v1))
ds.to_netcdf(path='/my_s3_fs/test_netcdf.nc', format='NETCDF4', mode='w')
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```
There are 1 HDF5 objects open!

Report: open objects on 72057594037927936
Segmentation fault (core dumped)
```

Anything else we need to know?

No response

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.3 | packaged by conda-forge | (default, Jun 1 2020, 17:43:00) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-26-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.1
pandas: 1.1.3
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.4
pydap: None
h5netcdf: 1.0.2
h5py: 3.1.0
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.30.0
distributed: None
matplotlib: 3.3.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 50.3.0.post20201006
pip: 20.2.3
conda: 22.9.0
pytest: 6.1.1
IPython: 7.18.1
sphinx: None
```
