Issue #7146: Segfault writing large netcdf files to s3fs

  • Opened by d1mach (user 11075246) · repo: xarray · state: closed · 17 comments
  • Created 2022-10-08T16:56:31Z · updated 2024-04-28T20:11:59Z · closed 2024-04-28T20:11:59Z

What happened?

It seems netcdf4 does not currently work well with s3fs, the FUSE filesystem layer over S3-compatible storage, with either the default netcdf4 engine or with h5netcdf.

Here is an example:

```python
import numpy as np
import xarray as xr
from datetime import datetime, timedelta

NTIMES = 48
start = datetime(2022, 10, 6, 0, 0)
time_vals = [start + timedelta(minutes=20 * t) for t in range(NTIMES)]
times = xr.DataArray(
    data=[t.strftime('%Y%m%d%H%M%S').encode() for t in time_vals],
    dims=['Time'])
v1 = xr.DataArray(data=np.zeros((len(times), 201, 201)), dims=['Time', 'x', 'y'])
ds = xr.Dataset(data_vars=dict(times=times, v1=v1))
ds.to_netcdf(path='/my_s3_fs/test_netcdf.nc', format='NETCDF4', mode='w')
```

On my system this code crashes with NTIMES=48, but completes without an error with NTIMES=24.

The output with NTIMES=48 is

```
There are 1 HDF5 objects open!

Report: open objects on 72057594037927936
Segmentation fault (core dumped)
```

I have tried the other engine that handles NETCDF4 in xarray, engine='h5netcdf', and also got a segfault.

A quick workaround seems to be to write the NetCDF file to the local filesystem and then move the complete file to S3:

```python
import shutil

ds.to_netcdf(path='/tmp/test_netcdf.nc', format='NETCDF4', mode='w')
shutil.move('/tmp/test_netcdf.nc', '/my_s3_fs/test_netcdf.nc')
```

There are several pieces of software involved here: xarray (0.16.1), netcdf4 (1.5.4), HDF5 (1.10.6), and s3fs (1.79). If the bug is in the underlying libraries rather than in my code, it is most likely not an xarray bug, but since it fails with both NETCDF4 engines I decided to report it here.
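The move-after-write workaround can be wrapped in a small helper that stages any writer callable on the local filesystem and only then moves the finished file onto the FUSE mount. This is my own illustrative sketch, not part of xarray; `write_via_tmp` is a hypothetical name.

```python
import os
import shutil
import tempfile


def write_via_tmp(write_fn, dest_path):
    """Run write_fn against a local temporary path, then move the
    finished file to dest_path (e.g. on an s3fs FUSE mount)."""
    fd, tmp_path = tempfile.mkstemp(suffix=os.path.splitext(dest_path)[1])
    os.close(fd)
    try:
        write_fn(tmp_path)
        shutil.move(tmp_path, dest_path)
    finally:
        # Clean up the temp file if the move never happened.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)


# Usage with the dataset above would look like:
# write_via_tmp(lambda p: ds.to_netcdf(path=p, format='NETCDF4', mode='w'),
#               '/my_s3_fs/test_netcdf.nc')
```

This keeps all HDF5 I/O on a local disk, so the library never performs partial writes through the FUSE layer.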

What did you expect to happen?

With NTIMES=24 I am getting a file /my_s3_fs/test_netcdf.nc of about 7.8 MB. With NTIMES=36 I get an empty file. I would expect this code to run without a segfault and produce a non-empty file.
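The 7.8 MB figure is consistent with the uncompressed payload: v1 dominates, and for NTIMES=24 it is a float64 array on a 24 × 201 × 201 grid. This back-of-the-envelope check is my own arithmetic, not output from the netCDF library:

```python
# v1 holds float64 values (8 bytes each) on a Time x 201 x 201 grid;
# the byte-string times variable adds only a few hundred bytes.
ntimes = 24
nbytes = ntimes * 201 * 201 * 8
print(nbytes / 1e6)  # about 7.76, matching the ~7.8 MB file observed
```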

Minimal Complete Verifiable Example

```python
import numpy as np
import xarray as xr
from datetime import datetime, timedelta

NTIMES = 48
start = datetime(2022, 10, 6, 0, 0)
time_vals = [start + timedelta(minutes=20 * t) for t in range(NTIMES)]
times = xr.DataArray(
    data=[t.strftime('%Y%m%d%H%M%S').encode() for t in time_vals],
    dims=['Time'])
v1 = xr.DataArray(data=np.zeros((len(times), 201, 201)), dims=['Time', 'x', 'y'])
ds = xr.Dataset(data_vars=dict(times=times, v1=v1))
ds.to_netcdf(path='/my_s3_fs/test_netcdf.nc', format='NETCDF4', mode='w')
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```
There are 1 HDF5 objects open!

Report: open objects on 72057594037927936
Segmentation fault (core dumped)
```

Anything else we need to know?

No response

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.3 | packaged by conda-forge | (default, Jun 1 2020, 17:43:00) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-26-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.1
pandas: 1.1.3
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.4
pydap: None
h5netcdf: 1.0.2
h5py: 3.1.0
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.30.0
distributed: None
matplotlib: 3.3.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 50.3.0.post20201006
pip: 20.2.3
conda: 22.9.0
pytest: 6.1.1
IPython: 7.18.1
sphinx: None
```
