issues: 442617907
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
442617907 | MDU6SXNzdWU0NDI2MTc5MDc= | 2954 | Segmentation fault reading many groups from many files | 500246 | closed | 0 | 14 | 2019-05-10T09:12:34Z | 2019-07-12T16:48:26Z | 2019-07-12T16:48:26Z | CONTRIBUTOR | This is probably the wrong place to report it, but I haven't been able to reproduce this without using xarray. Repeatedly opening NetCDF4/HDF5 files and reading a group from them triggers a Segmentation Fault after about 130–150 openings. See details below.

Code Sample, a copy-pastable example if possible

```python
from itertools import count, product
import netCDF4
import glob
import xarray

files = sorted(glob.glob("/media/nas/x21308/2019_05_Testdata/MTG/FCI/FDHSI/uncompressed/20170410_RC70/BODY.nc"))

# get all groups
def get_groups(ds, pre=""):
    for g in ds.groups.keys():
        nm = pre + "/" + g
        yield from get_groups(ds[g], nm)
        yield nm
with netCDF4.Dataset(files[0]) as ds:
    groups = sorted(list(get_groups(ds)))
print("total groups", len(groups), "total files", len(files))
ds_all = []
ng = 20
nf = 20
print("using groups", ng, "using files", nf)
for (i, (g, f)) in zip(count(), product(groups[:ng], files[:nf])):
    print("attempting", i, "group", g, "from", f)
    ds = xarray.open_dataset(
            f, group=g, decode_cf=False)
    ds_all.append(ds)
```

Problem description

I have 70 NetCDF-4 files with 70 groups each. When I cycle through the files and read one group from each at a time, the next opening fails with a Segmentation Fault after about 130–150 openings. Reading one group from one file at a time would require a total of 70*70=4900 openings. If I limit this to 20 groups from 20 files, it requires 400 openings. In either case, it fails after about 130–150 openings. I'm using the Python xarray interface, but the error occurs in the HDF5 library. The message below includes the Python traceback:

```
HDF5-DIAG: Error detected in HDF5 (1.10.4) thread 140107218855616: [9/1985]
#000: H5D.c line 485 in H5Dget_create_plist(): Can't get creation plist
major: Dataset
minor: Can't get value
#001: H5Dint.c line 3159 in H5D__get_create_plist(): can't get dataset's creation property list
major: Dataset
minor: Can't get value
#002: H5Dint.c line 3296 in H5D_get_create_plist(): datatype conversion failed
major: Dataset

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/mwe9.py", line 24, in <module>
    f, group=g, decode_cf=False)
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/api.py", line 363, in open_dataset
    filename_or_obj, group=group, lock=lock, **backend_kwargs)
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 352, in open
    return cls(manager, lock=lock, autoclose=autoclose)
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 311, in __init__
    self.format = self.ds.data_model
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 356, in ds
    return self._manager.acquire().value
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 173, in acquire
    file = self._opener(*self._args, **kwargs)
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 244, in _open_netcdf4_group
    ds = nc4.Dataset(filename, mode=mode, **kwargs)
  File "netCDF4/_netCDF4.pyx", line 2291, in netCDF4._netCDF4.Dataset.__init__
  File "netCDF4/_netCDF4.pyx", line 1855, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -101] NetCDF: HDF error: b'/media/nas/x21308/2019_05_Testdata/MTG/FCI/FDHSI/uncompressed/20170410_RC70/W_XX-EUMETSAT-Darmstadt,IMG+SAT,MTI1+FCI-1C-RRAD-FDHSI-FD--CHK-BODY--L2P-NC4E_C_EUMT_20170410114417_GTT_DEV_20170410113908_20170410113917_N__C_0070_0065.nc'
```

More usually, however, it fails with a Segmentation Fault and no further information. The failure might happen in any file. The full output of my script might end with:
prior to the segmentation fault. When running with

```
Fatal Python error: Segmentation fault

Current thread 0x00007ff6ab89d6c0 (most recent call first):
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 244 in _open_netcdf4_group
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 173 in acquire
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 356 in ds
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 311 in __init__
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 352 in open
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/api.py", line 363 in open_dataset
  File "/tmp/mwe9.py", line 24 in <module>
Segmentation fault (core dumped)
```

Expected Output

I expect no segmentation fault.

Output of
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2954/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |
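
The opening counts quoted in the problem description (4900 for the full data set, 400 for the reduced run) follow from iterating over the Cartesian product of groups and files. The following is a minimal sketch of that arithmetic only. The group and file names are hypothetical placeholders, `count_openings` is an illustrative helper that does not appear in the report, and no files are opened.

```python
from itertools import product


def count_openings(n_groups, n_files):
    """Count the (group, file) pairs a product()-style loop would visit."""
    # Placeholder names (hypothetical) stand in for real group paths and files.
    groups = [f"/group{i}" for i in range(n_groups)]
    files = [f"file{j}.nc" for j in range(n_files)]
    # One open_dataset call would be made per (group, file) pair.
    return sum(1 for _ in product(groups, files))


print(count_openings(70, 70))  # 4900
print(count_openings(20, 20))  # 400
```

Because `product` varies its last argument fastest, a loop of this shape visits the same group in every file before moving on to the next group.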