pydata/xarray issue #7990: Random crashes in netcdf when dask client has multiple threads

State: closed (completed) · Opened 2023-07-16T01:00:55Z · Closed 2023-08-23T00:18:17Z · 1 comment

What happened?

The data files can be found here: https://noaadata.apps.nsidc.org/NOAA/G02202_V4/north/monthly/. The example code below crashes randomly: the file being processed when the crash occurs differs between runs. This happens only when threads_per_worker is > 1 in the Client() call; n_workers does not matter, at least I could not make it crash by varying it. The traceback points to HDF5.
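For context, this crash pattern (failing only with multiple threads per worker) is what one would expect if a non-thread-safe HDF5 build is entered from several threads at once. The stdlib-only sketch below illustrates that failure mode and why serializing every call with a single lock, which is roughly what xarray's default netCDF4 locking is meant to do, avoids it. `NotThreadSafe` is a hypothetical stand-in for the C library, not real xarray or HDF5 code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class NotThreadSafe:
    """Hypothetical stand-in for a C library built without thread safety:
    a re-entrant call from another thread corrupts internal state."""

    def __init__(self):
        self._busy = False

    def open_group(self):
        if self._busy:  # another thread is already inside the library
            raise OSError("NetCDF: HDF error (simulated)")
        self._busy = True
        for _ in range(10_000):  # pretend to do real work
            pass
        self._busy = False

lib = NotThreadSafe()
lock = threading.Lock()

def guarded_call(_):
    # One process-wide lock serializes all entries into the library,
    # so the _busy check above can never observe a concurrent caller.
    with lock:
        lib.open_group()

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(guarded_call, range(100)))

print("all calls completed without error")
```

Without the `with lock:` line, the same loop would intermittently raise the simulated OSError, mirroring the random per-run failures described above.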

What did you expect to happen?

No response

Minimal Complete Verifiable Example

```python
from pathlib import Path

import pandas as pd
from dask.distributed import Client

import xarray as xr

client = Client(n_workers=1, threads_per_worker=4)

DATADIR = Path("/mnt/sdc1/icec/NSIDC")
year = 2020

times = pd.date_range(f"{year}-01-01", f"{year}-12-01", freq="MS", name="time")
paths = [
    DATADIR / "monthly" / f"seaice_conc_monthly_nh_{t.strftime('%Y%m')}_f17_v04r00.nc"
    for t in times
]
for n in range(10):
    ds = xr.open_mfdataset(
        paths,
        combine="nested",
        concat_dim="tdim",
        parallel=True,
        engine="netcdf4",
    )
    del ds

HDF5-DIAG: Error detected in HDF5 (1.14.0) thread 0:
  #000: H5G.c line 442 in H5Gopen2(): unable to synchronously open group
    major: Symbol table
    minor: Unable to create file
  #001: H5G.c line 399 in H5G__open_api_common(): can't set object access arguments
    major: Symbol table
    minor: Can't set value
  #002: H5VLint.c line 2669 in H5VL_setup_acc_args(): invalid location identifier
    major: Invalid arguments to routine
    minor: Inappropriate type
  #003: H5VLint.c line 1787 in H5VL_vol_object(): invalid identifier type to function
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.14.0) thread 0:
  #000: H5G.c line 887 in H5Gclose(): not a group ID
    major: Invalid arguments to routine
    minor: Inappropriate type
2023-07-16 00:35:47,833 - distributed.worker - WARNING - Compute Failed
Key:       open_dataset-09a155bb-5079-406a-83c4-737933c409c7
Function:  execute_task
args:      ((<function apply at 0x7f0001edf520>, <function open_dataset at 0x7effe3e35c60>, ['/mnt/sdc1/icec/NSIDC/monthly/seaice_conc_monthly_nh_202001_f17_v04r00.nc'], (<class 'dict'>, [['engine', 'netcdf4'], ['chunks', (<class 'dict'>, [])]])))
kwargs:    {}
Exception: "OSError(-101, 'NetCDF: HDF error')"

2023-07-16 00:35:47,834 - distributed.worker - WARNING - Compute Failed
Key:       open_dataset-14e239f4-7e16-4891-a350-b55979d4a754
Function:  execute_task
args:      ((<function apply at 0x7f0001edf520>, <function open_dataset at 0x7effe3e35c60>, ['/mnt/sdc1/icec/NSIDC/monthly/seaice_conc_monthly_nh_202011_f17_v04r00.nc'], (<class 'dict'>, [['engine', 'netcdf4'], ['chunks', (<class 'dict'>, [])]])))
kwargs:    {}
Exception: "OSError(-101, 'NetCDF: HDF error')"

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[1], line 19
     14 paths = [
     15     DATADIR / "monthly" / f"seaice_conc_monthly_nh_{t.strftime('%Y%m')}_f17_v04r00.nc"
     16     for t in times
     17 ]
     18 for n in range(10):
---> 19     ds = xr.open_mfdataset(
     20         paths,
     21         combine="nested",
     22         concat_dim="tdim",
     23         parallel=True,
     24         engine="netcdf4",
     25     )
     26     del ds

File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/api.py:1050, in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, **kwargs)
   1045     datasets = [preprocess(ds) for ds in datasets]
   1047 if parallel:
   1048     # calling compute here will return the datasets/file_objs lists,
   1049     # the underlying datasets will still be stored as dask arrays
-> 1050     datasets, closers = dask.compute(datasets, closers)
   1052 # Combine all datasets, closing them in case of a ValueError
   1053 try:

File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/api.py:570, in open_dataset()
    558 decoders = _resolve_decoders_kwargs(
    559     decode_cf,
    560     open_backend_dataset_parameters=backend.open_dataset_parameters,
    (...)
    566     decode_coords=decode_coords,
    567 )
    569 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 570 backend_ds = backend.open_dataset(
    571     filename_or_obj,
    572     drop_variables=drop_variables,
    573     **decoders,
    574     **kwargs,
    575 )
    576 ds = _dataset_from_backend_dataset(
    577     backend_ds,
    578     filename_or_obj,
    (...)
    588     **kwargs,
    589 )
    590 return ds

File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:590, in open_dataset()
    569 def open_dataset(  # type: ignore[override]  # allow LSP violation, not supporting **kwargs
    570     self,
    571     filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
    (...)
    587     autoclose=False,
    588 ) -> Dataset:
    589     filename_or_obj = _normalize_path(filename_or_obj)
--> 590     store = NetCDF4DataStore.open(
    591         filename_or_obj,
    592         mode=mode,
    593         format=format,
    594         group=group,
    595         clobber=clobber,
    596         diskless=diskless,
    597         persist=persist,
    598         lock=lock,
    599         autoclose=autoclose,
    600     )
    602 store_entrypoint = StoreBackendEntrypoint()
    603 with close_on_error(store):

File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:391, in open()
    385 kwargs = dict(
    386     clobber=clobber, diskless=diskless, persist=persist, format=format
    387 )
    388 manager = CachingFileManager(
    389     netCDF4.Dataset, filename, mode=mode, kwargs=kwargs
    390 )
--> 391 return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)

File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:338, in __init__()
    336 self._group = group
    337 self._mode = mode
--> 338 self.format = self.ds.data_model
    339 self._filename = self.ds.filepath()
    340 self.is_remote = is_remote_uri(self._filename)

File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:400, in ds()
    398 @property
    399 def ds(self):
--> 400     return self._acquire()

File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:394, in _acquire()
    393 def _acquire(self, needs_lock=True):
--> 394     with self._manager.acquire_context(needs_lock) as root:
    395         ds = _nc4_require_group(root, self._group, self._mode)
    396     return ds

File ~/mambaforge/envs/icec/lib/python3.10/contextlib.py:135, in __enter__()
    133 del self.args, self.kwds, self.func
    134 try:
--> 135     return next(self.gen)
    136 except StopIteration:
    137     raise RuntimeError("generator didn't yield") from None

File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/file_manager.py:199, in acquire_context()
    196 @contextlib.contextmanager
    197 def acquire_context(self, needs_lock=True):
    198     """Context manager for acquiring a file."""
--> 199     file, cached = self._acquire_with_cache_info(needs_lock)
    200     try:
    201         yield file

File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/file_manager.py:217, in _acquire_with_cache_info()
    215     kwargs = kwargs.copy()
    216     kwargs["mode"] = self._mode
--> 217 file = self._opener(*self._args, **kwargs)
    218 if self._mode == "w":
    219     # ensure file doesn't get overridden when opened again
    220     self._mode = "a"

File src/netCDF4/_netCDF4.pyx:2464, in netCDF4._netCDF4.Dataset.__init__()

File src/netCDF4/_netCDF4.pyx:2027, in netCDF4._netCDF4._ensure_nc_success()

OSError: [Errno -101] NetCDF: HDF error: '/mnt/sdc1/icec/NSIDC/monthly/seaice_conc_monthly_nh_202011_f17_v04r00.nc'
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.1.38-1-MANJARO
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.0
libnetcdf: 4.9.2

xarray: 2023.6.0
pandas: 2.0.3
numpy: 1.24.4
scipy: 1.11.1
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: 3.9.0
Nio: None
zarr: 2.15.0
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.7.0
distributed: 2023.7.0
matplotlib: 3.7.1
cartopy: 0.21.1
seaborn: None
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: None
sparse: 0.14.0
flox: None
numpy_groupies: None
setuptools: 68.0.0
pip: 23.2
conda: None
pytest: None
mypy: None
IPython: 8.14.0
sphinx: None
```
