issues: 1605108888
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1605108888 | I_kwDOAMm_X85frASY | 7574 | xr.open_mfdataset doesn't work with fsspec and dask | 10137 | closed | 0 | | | 12 | 2023-03-01T14:45:56Z | 2023-09-08T00:33:41Z | 2023-09-08T00:33:41Z | NONE | | | |

body:

### What happened?

I was trying to read multiple netCDF files as byte streams (which requires the h5netcdf engine) with xr.open_mfdataset and parallel=True to leverage dask.delayed (parallel=False works, though), but it failed. The netCDF files were noaa-goes16 satellite images, but I can't tell whether that matters.

### What did you expect to happen?

It should have loaded all the netCDF files into an xarray.Dataset object.

### Minimal Complete Verifiable Example

```python
import fsspec
import xarray as xr

paths = [
    's3://noaa-goes16/ABI-L2-LSTC/2022/185/03/OR_ABI-L2-LSTC-M6_G16_s20221850301180_e20221850303553_c20221850305091.nc',
    's3://noaa-goes16/ABI-L2-LSTC/2022/185/02/OR_ABI-L2-LSTC-M6_G16_s20221850201180_e20221850203553_c20221850205142.nc',
]

fs = fsspec.filesystem('s3')

xr.open_mfdataset(
    [fs.open(path, mode="rb") for path in paths],
    engine="h5netcdf",
    combine="nested",
    concat_dim="t",
    parallel=True,
)
```

### MVCE confirmation
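For comparison, here is a minimal sketch of the serial variant that, according to the report, does work: the same call with parallel=False, so the fsspec file-like objects are consumed in the main process rather than inside dask.delayed tasks. The anon=True storage option is an assumption for anonymous access to the public bucket and is not part of the original snippet.

```python
import fsspec
import xarray as xr

paths = [
    's3://noaa-goes16/ABI-L2-LSTC/2022/185/03/OR_ABI-L2-LSTC-M6_G16_s20221850301180_e20221850303553_c20221850305091.nc',
    's3://noaa-goes16/ABI-L2-LSTC/2022/185/02/OR_ABI-L2-LSTC-M6_G16_s20221850201180_e20221850203553_c20221850205142.nc',
]

# anon=True is an assumption for the public noaa-goes16 bucket; pass real
# credentials instead if needed.
fs = fsspec.filesystem('s3', anon=True)

# Same call as the MVCE above, but with parallel=False: each file-like object
# is opened serially by the h5netcdf backend, which the reporter says succeeds.
ds = xr.open_mfdataset(
    [fs.open(path, mode="rb") for path in paths],
    engine="h5netcdf",
    combine="nested",
    concat_dim="t",
    parallel=False,
)
```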
### Relevant log output

```Python
KeyError                                  Traceback (most recent call last)
File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/xarray/backends/file_manager.py:210, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    209 try:
--> 210     file = self._cache[self._key]
    211 except KeyError:

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/xarray/backends/lru_cache.py:56, in LRUCache.__getitem__(self, key)
     55 with self._lock:
---> 56     value = self._cache[key]
     57     self._cache.move_to_end(key)

KeyError: [<class 'h5netcdf.core.File'>, ((b'\x89HDF\r\n', b'\x1a\n', b'\x02\x08\x08\x00\x00\x00 ... EXTREMELY STRING ... 00\x00\x00\x00\x00\x00\x0ef']

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
Cell In[9], line 11
      4 paths = [
      5     's3://noaa-goes16/ABI-L2-LSTC/2022/185/03/OR_ABI-L2-LSTC-M6_G16_s20221850301180_e20221850303553_c20221850305091.nc',
      6     's3://noaa-goes16/ABI-L2-LSTC/2022/185/02/OR_ABI-L2-LSTC-M6_G16_s20221850201180_e20221850203553_c20221850205142.nc'
      7 ]
      9 fs = fsspec.filesystem('s3')
---> 11 xr.open_mfdataset(
     12     [fs.open(path, mode="rb") for path in paths],
     13     engine="h5netcdf",
     14     combine="nested",
     15     concat_dim="t",
     16     parallel=True
     17 ).LST

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/xarray/backends/api.py:991, in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, **kwargs)
    986     datasets = [preprocess(ds) for ds in datasets]
    988 if parallel:
    989     # calling compute here will return the datasets/file_objs lists,
    990     # the underlying datasets will still be stored as dask arrays
--> 991     datasets, closers = dask.compute(datasets, closers)
    993 # Combine all datasets, closing them in case of a ValueError
    994 try:

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/base.py:599, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    596     keys.append(x.__dask_keys__())
    597     postcomputes.append(x.__dask_postcompute__())
--> 599 results = schedule(dsk, keys, **kwargs)
    600 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/threaded.py:89, in get(dsk, keys, cache, num_workers, pool, **kwargs)
     86 elif isinstance(pool, multiprocessing.pool.Pool):
     87     pool = MultiprocessingPoolExecutor(pool)
---> 89 results = get_async(
     90     pool.submit,
     91     pool._max_workers,
     92     dsk,
     93     keys,
     94     cache=cache,
     95     get_id=_thread_get_id,
     96     pack_exception=pack_exception,
     97     **kwargs,
     98 )
    100 # Cleanup pools associated to dead threads
    101 with pools_lock:

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/local.py:511, in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
    509     _execute_task(task, data)  # Re-execute locally
    510 else:
--> 511     raise_exception(exc, tb)
    512 res, worker_id = loads(res_info)
    513 state["cache"][key] = res

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/local.py:319, in reraise(exc, tb)
    317 if exc.__traceback__ is not tb:
    318     raise exc.with_traceback(tb)
--> 319 raise exc

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/local.py:224, in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
    222 try:
    223     task, data = loads(task_info)
--> 224     result = _execute_task(task, data)
    225     id = get_id()
    226     result = dumps((result, id))

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/core.py:119, in _execute_task(arg, cache, dsk)
    115     func, args = arg[0], arg[1:]
    116     # Note: Don't assign the subtask results to a variable. numpy detects
    117     # temporaries by their reference count and can execute certain
    118     # operations in-place.
--> 119     return func(*(_execute_task(a, cache) for a in args))
    120 elif not ishashable(arg):
    121     return arg

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/utils.py:73, in apply(func, args, kwargs)
     42 """Apply a function given its positional and keyword arguments.
     43
     44 Equivalent to

TypeError: expected str, bytes or os.PathLike object, not tuple
```

### Anything else we need to know?

No response

### Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0 | packaged by conda-forge | (main, Jan 15 2023, 05:44:48) [Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 21.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None
xarray: 2023.2.0
pandas: 1.5.3
numpy: 1.24.2
scipy: 1.10.1
netCDF4: None
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.6
cfgrib: None
iris: None
bottleneck: None
dask: 2023.2.1
distributed: 2023.2.1
matplotlib: 3.7.0
cartopy: 0.21.1
seaborn: 0.12.2
numbagg: None
fsspec: 2023.1.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.4.0
pip: 23.0.1
conda: None
pytest: 7.2.1
mypy: None
IPython: 8.10.0
sphinx: None
/Users/jo/miniconda3/envs/rxr/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
```
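A possible workaround sketch, not taken from the issue thread: cache the remote objects to local files first (here via fsspec's simplecache:: URL chaining and fsspec.open_local, both standard fsspec features), then hand plain paths to open_mfdataset so that parallel=True does not have to pass open file-like objects through dask tasks. The anon=True option is again an assumption for the public bucket.

```python
import fsspec
import xarray as xr

paths = [
    's3://noaa-goes16/ABI-L2-LSTC/2022/185/03/OR_ABI-L2-LSTC-M6_G16_s20221850301180_e20221850303553_c20221850305091.nc',
    's3://noaa-goes16/ABI-L2-LSTC/2022/185/02/OR_ABI-L2-LSTC-M6_G16_s20221850201180_e20221850203553_c20221850205142.nc',
]

# Download each object into fsspec's local cache and get back ordinary
# filesystem paths (anon=True is an assumption for the public bucket).
local_paths = fsspec.open_local(
    [f"simplecache::{p}" for p in paths],
    s3={"anon": True},
)

# With plain string paths, open_mfdataset treats the inputs as regular local
# files, so parallel=True no longer needs to hand open file objects to the
# dask workers.
ds = xr.open_mfdataset(
    local_paths,
    engine="h5netcdf",
    combine="nested",
    concat_dim="t",
    parallel=True,
)
```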
reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7574/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | completed | 13221727 | issue |