
issues: 1933712083

id: 1933712083
node_id: I_kwDOAMm_X85zQhrT
number: 8289
title: segfault with a particular netcdf4 file
user: 90008
state: open
locked: 0
comments: 11
created_at: 2023-10-09T20:07:17Z
updated_at: 2024-05-03T16:54:18Z
author_association: CONTRIBUTOR

What happened?

The following code segfaults on my machine (and on many other machines with a similar environment):

```
import xarray

filename = 'tiny.nc.txt'
engine = "netcdf4"

dataset = xarray.open_dataset(filename, engine=engine)

i = 0
for i in range(60):
    xarray.open_dataset(filename, engine=engine)
```

tiny.nc.txt mrc.nc.txt

What did you expect to happen?

Not to segfault.

Minimal Complete Verifiable Example

  1. Generate some netcdf4 with my application.
  2. Trim the netcdf4 file down (load it, and drop all the vars I can while still reproducing this bug)
  3. Try to read it.
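The trimming in step 2 can be automated with a simple greedy search. The sketch below is only an illustration, not the procedure actually used for the attached files: `minimize_variables` and the `still_crashes` callback are hypothetical names, and the callback is assumed to write a copy of the file containing only the given variables and report whether reading it still crashes.

```Python
from typing import Callable, Iterable, List


def minimize_variables(
    names: Iterable[str],
    still_crashes: Callable[[List[str]], bool],
) -> List[str]:
    """Greedily drop variables one at a time while the crash still reproduces.

    `still_crashes(kept)` is a hypothetical callback. In practice it might
    write a trimmed copy of the dataset (for example with
    `dataset.drop_vars(...)`) and try to read it in a separate process.
    """
    kept = list(names)
    for name in list(kept):
        candidate = [n for n in kept if n != name]
        # Keep the smaller set only if the bug is still visible without `name`.
        if still_crashes(candidate):
            kept = candidate
    return kept
```

Because the segfault is intermittent, the callback should probably retry the read several times before deciding that a candidate no longer reproduces the bug.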

```Python
import xarray
from tqdm import tqdm

filename = 'mrc.nc.txt'

engine = "h5netcdf"
dataset = xarray.open_dataset(filename, engine=engine)
for i in tqdm(range(60), desc=f"filename={filename}, engine={engine}"):
    xarray.open_dataset(filename, engine=engine)

engine = "netcdf4"
dataset = xarray.open_dataset(filename, engine=engine)
for i in tqdm(range(60), desc=f"filename={filename}, engine={engine}"):
    xarray.open_dataset(filename, engine=engine)

filename = 'tiny.nc.txt'

engine = "h5netcdf"
dataset = xarray.open_dataset(filename, engine=engine)
for i in tqdm(range(60), desc=f"filename={filename}, engine={engine}"):
    xarray.open_dataset(filename, engine=engine)

engine = "netcdf4"
dataset = xarray.open_dataset(filename, engine=engine)
for i in tqdm(range(60), desc=f"filename={filename}, engine={engine}"):
    xarray.open_dataset(filename, engine=engine)
```

Hand-crafting the file from start to finish does not seem to segfault:

```
import xarray
import numpy as np

engine = 'netcdf4'

dataset = xarray.Dataset()

coords = {}
coords['image_x'] = np.arange(1, dtype='int')
dataset = dataset.assign_coords(coords)

dataset['image'] = xarray.DataArray(
    np.zeros((1,), dtype='uint8'),
    dims=('image_x',)
)

# %%

dataset.to_netcdf('mrc.nc.txt')

# %%

dataset = xarray.open_dataset('mrc.nc.txt', engine=engine)

for i in range(10):
    xarray.open_dataset('mrc.nc.txt', engine=engine)
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

i=0 passes
i=1 mostly segfaults, but sometimes it can take more than 1 iteration
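Since the crash is intermittent and takes the whole interpreter down, it can help to run the reproducer in a child process and inspect its exit status. The following is a minimal sketch, assuming a POSIX system where a child killed by signal N reports return code -N; `run_isolated`, `died_from_segfault` and `REPRODUCER` are illustrative names, not part of xarray. The `-X faulthandler` option asks CPython to dump the Python traceback to stderr when a fatal signal arrives.

```Python
import signal
import subprocess
import sys


def run_isolated(code: str, timeout: float = 120.0) -> subprocess.CompletedProcess:
    """Run `code` in a fresh interpreter with faulthandler enabled."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def died_from_segfault(result: subprocess.CompletedProcess) -> bool:
    """Return True if the child was killed by SIGSEGV (POSIX convention)."""
    return result.returncode == -signal.SIGSEGV


# The loop from this report, printing the iteration index so the last
# line of stdout shows how far the child got before it died.
REPRODUCER = """
import xarray
filename = 'tiny.nc.txt'
for i in range(60):
    xarray.open_dataset(filename, engine='netcdf4')
    print(i, flush=True)
"""
```

Calling `run_isolated(REPRODUCER)` from the directory containing `tiny.nc.txt` and then checking `died_from_segfault(result)`, `result.stdout` and `result.stderr` should show whether the child crashed, at which iteration, and where in the Python call stack.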

Anything else we need to know?

At first I thought the problem was deep in HDF5, but I am less convinced now.

xref: https://github.com/HDFGroup/hdf5/issues/3649

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.12 | packaged by Ramona Optics | (main, Jun 27 2023, 02:59:09) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.5.1-060501-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2
xarray: 2023.9.1.dev25+g46643bb1.d20231009
pandas: 2.1.1
numpy: 1.24.4
scipy: 1.11.3
netCDF4: 1.6.4
pydap: None
h5netcdf: 1.2.0
h5py: 3.9.0
Nio: None
zarr: 2.16.1
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.3.0
distributed: 2023.3.0
matplotlib: 3.8.0
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.9.2
cupy: None
pint: 0.22
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.2.1
conda: 23.7.4
pytest: 7.4.2
mypy: None
IPython: 8.16.1
sphinx: 7.2.6
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8289/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue
