issues
1 row where state = "open" and user = 10137 sorted by updated_at descending
| id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at ▲ | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 548263148 | MDU6SXNzdWU1NDgyNjMxNDg= | 3684 | open_mfdataset - different behavior with dask.distributed.LocalCluster | ghost 10137 | open | 0 | 3 | 2020-01-10T19:58:19Z | 2023-09-05T10:56:23Z | NONE | Big fan of Xarray! I'm not that familiar with submitting tickets like this, so my apologies for any rule breaking. Also, if this belongs over in the dask project, I can move it there.

dask 2.6.0, numpy 1.17.3, xarray 0.14.1, netCDF4 1.5.3

I am attempting to use open_mfdataset on nc files I've generated through dask/xarray after initializing the dask LocalCluster. I've found that I am able to compute successfully when I don't run the distributed cluster. But if I do, I get a variety of issues.

I've got a synthetic data generating example here. Running soundspeed.compute() will sometimes succeed, and will sometimes cause worker restarts, resulting in HDF errors and no return. I was thinking it was something with serialization; I've seen other tickets with similar issues, but I don't see how it applies to my test case.

Example code:

```python
import numpy as np
import xarray as xr
import os
from dask.distributed import Client

# Start a local distributed cluster; the errors only appear when this is running.
cl = Client()
outpth = r'D:\dasktest\data_dir\EM2040\converted\test'

# Write 100 netCDF files, each holding 1000 consecutive time steps.
mint = 0
maxt = 1000
for i in range(100):
    times = np.arange(mint, maxt)
    beams = np.arange(250)
    sectors = ['40107_0_260000', '40107_1_320000', '40107_2_290000']
    soundspeed = np.random.randn(1000, 3, 250)
    ds = xr.Dataset({'soundspeed': (('time', 'sectors', 'beams'), soundspeed)},
                    {'time': times, 'sectors': sectors, 'beams': beams},)
    ds.to_netcdf(os.path.join(outpth, 'test{}.nc'.format(i)), mode='w')
    mint = maxt
    maxt += 1000

# Reopen all files lazily, concatenated along time, then load into memory.
fils = [os.path.join(outpth, x) for x in os.listdir(outpth) if os.path.splitext(x)[1] == '.nc']
tst = xr.open_mfdataset(fils, concat_dim='time', combine='nested')
tst.soundspeed.compute()
```

I've found that running this example with fewer than 10 files dramatically reduces the number of errors I'm getting.
I've tried this on different machines in different domain environments just to be sure. I really just want to make sure I'm not making a silly mistake somewhere. Appreciate the help.

My last run on actual data:

```python
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataarray.py", line 837, in compute
    return new.load(**kwargs)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataarray.py", line 811, in load
    ds = self._to_temp_dataset().load(**kwargs)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataset.py", line 649, in load
    evaluated_data = da.compute(*lazy_data.values(), **kwargs)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\dask\base.py", line 436, in compute
    results = schedule(dsk, keys, **kwargs)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py", line 2545, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py", line 1845, in gather
    asynchronous=asynchronous,
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py", line 762, in sync
    self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\utils.py", line 333, in sync
    raise exc.with_traceback(tb)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\utils.py", line 317, in f
    result[0] = yield future
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\tornado\gen.py", line 735, in run
    value = future.result()
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py", line 1701, in _gather
    raise exception.with_traceback(traceback)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\dask\array\core.py", line 106, in getter
    c = np.asarray(c)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py", line 481, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py", line 643, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py", line 547, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py", line 72, in __getitem__
    key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py", line 827, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py", line 83, in _getitem
    original_array = self.get_array(needs_lock=False)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py", line 62, in get_array
    ds = self.datastore._acquire(needs_lock)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py", line 360, in _acquire
    with self._manager.acquire_context(needs_lock) as root:
  File "C:\PydroXL_19\envs\dasktest\lib\contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\file_manager.py", line 186, in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\file_manager.py", line 204, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File "netCDF4\_netCDF4.pyx", line 2321, in netCDF4._netCDF4.Dataset.__init__
  File "netCDF4\_netCDF4.pyx", line 1885, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -101] NetCDF: HDF error: b'D:\dasktest\data_dir\EM2040\converted\rangeangle_20.nc'
```

My last run on the synthetic data set generated above:

```python
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB82D0>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB8240>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB81F8>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB81B0>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB8360>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB83A8>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB8510>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB8750>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB8990>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB8BD0>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB8E10>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FC9D090>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FC9D2D0>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FC9D510>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FC9D750>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FC9DC18>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FC9DBD0>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FC9DCA8>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FC9DD38>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None)))
kwargs: {}
Exception: OSError(-101, 'NetCDF: HDF error')

  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataarray.py", line 837, in compute
    return new.load(**kwargs)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataarray.py", line 811, in load
    ds = self._to_temp_dataset().load(**kwargs)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataset.py", line 649, in load
    evaluated_data = da.compute(*lazy_data.values(), **kwargs)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\dask\base.py", line 436, in compute
    results = schedule(dsk, keys, **kwargs)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py", line 2545, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py", line 1845, in gather
    asynchronous=asynchronous,
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py", line 762, in sync
    self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\utils.py", line 333, in sync
    raise exc.with_traceback(tb)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\utils.py", line 317, in f
    result[0] = yield future
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\tornado\gen.py", line 735, in run
    value = future.result()
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py", line 1701, in _gather
    raise exception.with_traceback(traceback)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\dask\array\core.py", line 106, in getter
    c = np.asarray(c)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py", line 481, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py", line 643, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py", line 547, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py", line 72, in __getitem__
    key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py", line 827, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py", line 83, in _getitem
    original_array = self.get_array(needs_lock=False)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py", line 62, in get_array
    ds = self.datastore._acquire(needs_lock)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py", line 360, in _acquire
    with self._manager.acquire_context(needs_lock) as root:
  File "C:\PydroXL_19\envs\dasktest\lib\contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\file_manager.py", line 186, in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\file_manager.py", line 204, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File "netCDF4\_netCDF4.pyx", line 2321, in netCDF4._netCDF4.Dataset.__init__
  File "netCDF4\_netCDF4.pyx", line 1885, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -101] NetCDF: HDF error: b'D:\dasktest\data_dir\EM2040\converted\test\test4.nc'
``` |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/3684/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue |
CREATE TABLE [issues] (
[id] INTEGER PRIMARY KEY,
[node_id] TEXT,
[number] INTEGER,
[title] TEXT,
[user] INTEGER REFERENCES [users]([id]),
[state] TEXT,
[locked] INTEGER,
[assignee] INTEGER REFERENCES [users]([id]),
[milestone] INTEGER REFERENCES [milestones]([id]),
[comments] INTEGER,
[created_at] TEXT,
[updated_at] TEXT,
[closed_at] TEXT,
[author_association] TEXT,
[active_lock_reason] TEXT,
[draft] INTEGER,
[pull_request] TEXT,
[body] TEXT,
[reactions] TEXT,
[performed_via_github_app] TEXT,
[state_reason] TEXT,
[repo] INTEGER REFERENCES [repos]([id]),
[type] TEXT
);
CREATE INDEX [idx_issues_repo]
ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
ON [issues] ([user]);
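
As a rough illustration of how the filter at the top of this page ("state = "open" and user = 10137 sorted by updated_at descending") maps onto the schema above, here is a minimal sketch using Python's standard `sqlite3` module. It assumes an in-memory database with a trimmed-down copy of the `[issues]` table (foreign-key targets omitted). The helper name `open_issues_for_user` and the second, closed row are hypothetical and exist only to show the filter doing something.

```python
import sqlite3

# In-memory database holding a reduced version of the [issues] table.
# Only the columns needed for the filter are kept; the REFERENCES clauses
# are dropped because the users/milestones/repos tables are not created here.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE [issues] (
       [id] INTEGER PRIMARY KEY,
       [number] INTEGER,
       [title] TEXT,
       [user] INTEGER,
       [state] TEXT,
       [updated_at] TEXT
    )
    """
)
conn.execute("CREATE INDEX [idx_issues_user] ON [issues] ([user])")

conn.executemany(
    "INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?)",
    [
        # The row shown on this page.
        (548263148, 3684,
         "open_mfdataset - different behavior with dask.distributed.LocalCluster",
         10137, "open", "2023-09-05T10:56:23Z"),
        # Hypothetical closed issue, added only so the state filter has an effect.
        (1, 1, "hypothetical closed issue", 10137, "closed", "2020-01-01T00:00:00Z"),
    ],
)


def open_issues_for_user(conn, user_id):
    """Return (number, title, updated_at) for a user's open issues, newest update first."""
    return conn.execute(
        "SELECT [number], [title], [updated_at] FROM [issues] "
        "WHERE [state] = ? AND [user] = ? "
        "ORDER BY [updated_at] DESC",
        ("open", user_id),
    ).fetchall()


rows = open_issues_for_user(conn, 10137)
print(rows)
```

Because `updated_at` is stored as ISO 8601 text in UTC, ordering the `TEXT` column lexicographically also orders it chronologically, which is why no date conversion is needed in this sketch.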