id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
548263148,MDU6SXNzdWU1NDgyNjMxNDg=,3684,open_mfdataset - different behavior with dask.distributed.LocalCluster,10137,open,0,,,3,2020-01-10T19:58:19Z,2023-09-05T10:56:23Z,,NONE,,,,"Big fan of Xarray! I'm not that familiar with submitting tickets like this, so my apologies if I'm breaking any rules. Also, if this belongs in the dask project, I can move it there.

dask 2.6.0
numpy 1.17.3
xarray 0.14.1
netCDF4 1.5.3

After initializing a dask `LocalCluster`, I am attempting to use `open_mfdataset` on .nc files that I generated with dask/xarray. I can compute successfully when I don't run the distributed cluster, but when I do, I get a variety of errors. I've put together an example that generates synthetic data. Running `tst.soundspeed.compute()` sometimes succeeds, and sometimes causes worker restarts that end in HDF errors and no result. I suspected something to do with serialization, since I've seen other tickets with similar issues, but I don't see how that applies to my test case.
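
On the serialization question: the tracebacks below suggest that no open file handle is sent to the workers. The failing frames are in xarray's `file_manager.py`, where the worker calls `self._opener(*self._args, **kwargs)`, and the error is raised inside `netCDF4.Dataset.__init__`, i.e. while the worker reopens the file from its path. The following is a simplified, stdlib-only sketch of that pickle-the-path, reopen-lazily pattern. `ReopeningFile` is a hypothetical class, not xarray's actual file manager, and the sketch assumes the real manager behaves roughly this way.

```python
import os
import pickle
import tempfile


class ReopeningFile:
    # Hypothetical stand-in for a file manager: it stores how to open a
    # file (path and mode) and opens the handle lazily in whichever
    # process uses it. The open handle itself is never pickled.

    def __init__(self, path, mode='rb'):
        self.path = path
        self.mode = mode
        self._handle = None

    def acquire(self):
        # Open on first use in the current process, then reuse the handle.
        if self._handle is None:
            self._handle = open(self.path, self.mode)
        return self._handle

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None

    def __getstate__(self):
        # Only the recipe for reopening the file crosses process boundaries.
        return {'path': self.path, 'mode': self.mode}

    def __setstate__(self, state):
        self.path = state['path']
        self.mode = state['mode']
        self._handle = None


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'example.bin')
        with open(path, 'wb') as f:
            f.write(b'soundspeed')
        original = ReopeningFile(path)
        original.acquire()  # a handle is now open in this process
        # Round trip through pickle, similar to sending a task to a worker
        clone = pickle.loads(pickle.dumps(original))
        print(clone.acquire().read())  # b'soundspeed', via a freshly opened handle
        clone.close()
        original.close()
```

Under this pattern, a problem with the file shows up when the worker opens it, not when the task is serialized. That is consistent with where both tracebacks below fail.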

Example code:

```python
import os

import numpy as np
import xarray as xr
from dask.distributed import Client

if __name__ == '__main__':
    # Guard the cluster setup: worker processes may re-import this module
    # when they start (as with the spawn start method used on Windows).
    cl = Client()  # starts a dask.distributed LocalCluster

    outpth = r'D:\dasktest\data_dir\EM2040\converted\test'
    os.makedirs(outpth, exist_ok=True)

    # Write 100 files, each holding 1000 consecutive time steps
    mint = 0
    maxt = 1000
    fils = []
    for i in range(100):
        times = np.arange(mint, maxt)
        beams = np.arange(250)
        sectors = ['40107_0_260000', '40107_1_320000', '40107_2_290000']
        soundspeed = np.random.randn(1000, 3, 250)
        ds = xr.Dataset({'soundspeed': (('time', 'sectors', 'beams'), soundspeed)},
                        {'time': times, 'sectors': sectors, 'beams': beams})
        fpath = os.path.join(outpth, 'test{}.nc'.format(i))
        ds.to_netcdf(fpath, mode='w')
        # Keep paths in write (time) order: os.listdir order is arbitrary,
        # and combine='nested' concatenates in list order.
        fils.append(fpath)
        mint = maxt
        maxt += 1000

    tst = xr.open_mfdataset(fils, concat_dim='time', combine='nested')
    tst.soundspeed.compute()
```

Running this example with fewer than 10 files dramatically reduces the number of errors I get. I've tried this on different machines in different domain environments, just to be sure. I really just want to make sure I'm not making a silly mistake somewhere. Appreciate the help.
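
A back-of-the-envelope sketch of how much work the synthetic example creates. It assumes one dask chunk per file and float64 data, which is what `np.random.randn` returns. The one-chunk-per-file assumption matches the `getter` calls in the logs below, each of which reads `slice(0, 1000)`, `slice(0, 3)`, `slice(0, 250)`, i.e. a whole file. The constant names are illustrative. With 100 files, one `compute()` schedules 100 file-reading tasks; with fewer than 10 files, there are fewer than 10.

```python
import math

# Assumptions: one dask chunk per file, float64 values (8 bytes each)
N_FILES = 100
FILE_SHAPE = (1000, 3, 250)  # (time, sectors, beams) per file
ITEMSIZE = 8


def chunk_nbytes(shape, itemsize):
    # In-memory size of one chunk, i.e. one whole file's soundspeed array
    return math.prod(shape) * itemsize


per_file = chunk_nbytes(FILE_SHAPE, ITEMSIZE)
total = per_file * N_FILES

print('read tasks:', N_FILES)
print('bytes per task:', per_file)  # 6000000
print('bytes returned by compute():', total)  # 600000000
```

So each task opens one file and loads about 6 MB, and the full result is about 600 MB.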

My last run on actual data:

```python
>>> ra.soundspeed.compute()
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Restarting worker
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None)))))), (slice(0, 1719, None), slice(0, 3, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')
Traceback (most recent call last):
  File """", line 1, in
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataarray.py"", line 837, in compute
    return new.load(**kwargs)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataarray.py"", line 811, in load
    ds = self._to_temp_dataset().load(**kwargs)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataset.py"", line 649, in load
    evaluated_data = da.compute(*lazy_data.values(), **kwargs)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\dask\base.py"", line 436, in compute
    results = schedule(dsk, keys, **kwargs)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py"", line 2545, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py"", line 1845, in gather
    asynchronous=asynchronous,
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py"", line 762, in sync
    self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\utils.py"", line 333, in sync
    raise exc.with_traceback(tb)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\utils.py"", line 317, in f
    result[0] = yield future
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\tornado\gen.py"", line 735, in run
    value = future.result()
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py"", line 1701, in _gather
    raise exception.with_traceback(traceback)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\dask\array\core.py"", line 106, in getter
    c = np.asarray(c)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py"", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py"", line 481, in __array__
    return np.asarray(self.array, dtype=dtype)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py"", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py"", line 643, in __array__
    return np.asarray(self.array, dtype=dtype)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py"", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py"", line 547, in __array__
    return np.asarray(array[self.key], dtype=None)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py"", line 72, in __getitem__
    key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py"", line 827, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py"", line 83, in _getitem
    original_array = self.get_array(needs_lock=False)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py"", line 62, in get_array
    ds = self.datastore._acquire(needs_lock)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py"", line 360, in _acquire
    with self._manager.acquire_context(needs_lock) as root:
  File ""C:\PydroXL_19\envs\dasktest\lib\contextlib.py"", line 81, in __enter__
    return next(self.gen)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\file_manager.py"", line 186, in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\file_manager.py"", line 204, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File ""netCDF4\_netCDF4.pyx"", line 2321, in netCDF4._netCDF4.Dataset.__init__
  File ""netCDF4\_netCDF4.pyx"", line 1885, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -101] NetCDF: HDF error: b'D:\\dasktest\\data_dir\\EM2040\\converted\\rangeangle_20.nc'
```

My last run on the synthetic data set generated above:

```python
>>> tst.soundspeed.compute()
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')
Traceback (most recent call last):
  File """", line 1, in
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataarray.py"", line 837, in compute
    return new.load(**kwargs)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataarray.py"", line 811, in load
    ds = self._to_temp_dataset().load(**kwargs)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataset.py"", line 649, in load
    evaluated_data = da.compute(*lazy_data.values(), **kwargs)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\dask\base.py"", line 436, in compute
    results = schedule(dsk, keys, **kwargs)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py"", line 2545, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py"", line 1845, in gather
    asynchronous=asynchronous,
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py"", line 762, in sync
    self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\utils.py"", line 333, in sync
    raise exc.with_traceback(tb)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\utils.py"", line 317, in f
    result[0] = yield future
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\tornado\gen.py"", line 735, in run
    value = future.result()
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py"", line 1701, in _gather
    raise exception.with_traceback(traceback)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\dask\array\core.py"", line 106, in getter
    c = np.asarray(c)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py"", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py"", line 481, in __array__
    return np.asarray(self.array, dtype=dtype)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py"", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py"", line 643, in __array__
    return np.asarray(self.array, dtype=dtype)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py"", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py"", line 547, in __array__
    return np.asarray(array[self.key], dtype=None)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py"", line 72, in __getitem__
    key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py"", line 827, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py"", line 83, in _getitem
    original_array = self.get_array(needs_lock=False)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py"", line 62, in get_array
    ds = self.datastore._acquire(needs_lock)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py"", line 360, in _acquire
    with self._manager.acquire_context(needs_lock) as root:
  File ""C:\PydroXL_19\envs\dasktest\lib\contextlib.py"", line 81, in __enter__
    return next(self.gen)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\file_manager.py"", line 186, in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\file_manager.py"", line 204, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File ""netCDF4\_netCDF4.pyx"", line 2321, in netCDF4._netCDF4.Dataset.__init__
  File ""netCDF4\_netCDF4.pyx"", line 1885, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -101] NetCDF: HDF error: b'D:\\dasktest\\data_dir\\EM2040\\converted\\test\\test4.nc'
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3684/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue