
issue_comments: 1460903268


html_url: https://github.com/pydata/xarray/issues/7574#issuecomment-1460903268
issue_url: https://api.github.com/repos/pydata/xarray/issues/7574
id: 1460903268
node_id: IC_kwDOAMm_X85XE51k
user: 127195910
created_at: 2023-03-08T21:30:53Z
updated_at: 2023-03-15T16:53:58Z
author_association: NONE

What happened?

I was trying to read multiple netCDF files as byte streams (which requires the h5netcdf engine) with xr.open_mfdataset, using parallel=True to leverage dask.delayed, but it failed (parallel=False works, though).

The netCDF files were NOAA GOES-16 satellite images, but I can't tell whether that matters.
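
For context, my reading of the `api.py` frame in the traceback below is that `parallel=True` wraps each per-file open in `dask.delayed` and computes them all in one batch. A rough sketch of that mechanism (the function name `open_all_parallel` and the `file_objs` parameter are my own stand-ins, not xarray's API):

```python
import dask
import xarray as xr

# Rough sketch (my reading of the api.py frame in the traceback below) of what
# open_mfdataset does when parallel=True: each open_dataset call is wrapped in
# dask.delayed, and all opens are computed in one batch.
def open_all_parallel(file_objs):
    open_ = dask.delayed(xr.open_dataset)
    delayed = [open_(f, engine="h5netcdf") for f in file_objs]
    (datasets,) = dask.compute(delayed)
    return datasets
```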

What did you expect to happen?

It should have loaded all the netCDF files into an xarray.Dataset object.

Minimal Complete Verifiable Example

```python
import fsspec
import xarray as xr

paths = [
    's3://noaa-goes16/ABI-L2-LSTC/2022/185/03/OR_ABI-L2-LSTC-M6_G16_s20221850301180_e20221850303553_c20221850305091.nc',
    's3://noaa-goes16/ABI-L2-LSTC/2022/185/02/OR_ABI-L2-LSTC-M6_G16_s20221850201180_e20221850203553_c20221850205142.nc',
]

fs = fsspec.filesystem('s3')

xr.open_mfdataset(
    [fs.open(path, mode="rb") for path in paths],
    engine="h5netcdf",
    combine="nested",
    concat_dim="t",
    parallel=True,
)
```
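
For reference, the variant that reportedly works is identical except for `parallel=False`. A minimal sketch (`anon=True` is my assumption for anonymous access to the public NOAA bucket, not part of the original report):

```python
import fsspec
import xarray as xr

paths = [
    's3://noaa-goes16/ABI-L2-LSTC/2022/185/03/OR_ABI-L2-LSTC-M6_G16_s20221850301180_e20221850303553_c20221850305091.nc',
    's3://noaa-goes16/ABI-L2-LSTC/2022/185/02/OR_ABI-L2-LSTC-M6_G16_s20221850201180_e20221850203553_c20221850205142.nc',
]

fs = fsspec.filesystem('s3', anon=True)  # anon=True assumed for the public bucket

# Same call as above, but on the serial code path that reportedly succeeds.
ds = xr.open_mfdataset(
    [fs.open(path, mode="rb") for path in paths],
    engine="h5netcdf",
    combine="nested",
    concat_dim="t",
    parallel=False,
)
```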

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.

  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.

  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.

  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```python
KeyError                                  Traceback (most recent call last)
File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/xarray/backends/file_manager.py:210, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    209 try:
--> 210     file = self._cache[self._key]
    211 except KeyError:

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/xarray/backends/lru_cache.py:56, in LRUCache.__getitem__(self, key)
     55 with self._lock:
---> 56     value = self._cache[key]
     57     self._cache.move_to_end(key)

KeyError: [<class 'h5netcdf.core.File'>, ((b'\x89HDF\r\n', b'\x1a\n', b'\x02\x08\x08\x00\x00\x00 ... EXTREMELY STRING ... 00\x00\x00\x00\x00\x00\x0ef']

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
Cell In[9], line 11
      4 paths = [
      5     's3://noaa-goes16/ABI-L2-LSTC/2022/185/03/OR_ABI-L2-LSTC-M6_G16_s20221850301180_e20221850303553_c20221850305091.nc',
      6     's3://noaa-goes16/ABI-L2-LSTC/2022/185/02/OR_ABI-L2-LSTC-M6_G16_s20221850201180_e20221850203553_c20221850205142.nc'
      7     ]
      9 fs = fsspec.filesystem('s3')
---> 11 xr.open_mfdataset(
     12     [fs.open(path, mode="rb") for path in paths],
     13     engine="h5netcdf",
     14     combine="nested",
     15     concat_dim="t",
     16     parallel=True
     17 ).LST

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/xarray/backends/api.py:991, in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, **kwargs)
    986     datasets = [preprocess(ds) for ds in datasets]
    988 if parallel:
    989     # calling compute here will return the datasets/file_objs lists,
    990     # the underlying datasets will still be stored as dask arrays
--> 991     datasets, closers = dask.compute(datasets, closers)
    993 # Combine all datasets, closing them in case of a ValueError
    994 try:

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/base.py:599, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    596     keys.append(x.__dask_keys__())
    597     postcomputes.append(x.__dask_postcompute__())
--> 599 results = schedule(dsk, keys, **kwargs)
    600 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/threaded.py:89, in get(dsk, keys, cache, num_workers, pool, **kwargs)
     86 elif isinstance(pool, multiprocessing.pool.Pool):
     87     pool = MultiprocessingPoolExecutor(pool)
---> 89 results = get_async(
     90     pool.submit,
     91     pool._max_workers,
     92     dsk,
     93     keys,
     94     cache=cache,
     95     get_id=_thread_get_id,
     96     pack_exception=pack_exception,
     97     **kwargs,
     98 )
    100 # Cleanup pools associated to dead threads
    101 with pools_lock:

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/local.py:511, in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
    509     _execute_task(task, data)  # Re-execute locally
    510 else:
--> 511     raise_exception(exc, tb)
    512 res, worker_id = loads(res_info)
    513 state["cache"][key] = res

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/local.py:319, in reraise(exc, tb)
    317 if exc.__traceback__ is not tb:
    318     raise exc.with_traceback(tb)
--> 319 raise exc

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/local.py:224, in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
    222 try:
    223     task, data = loads(task_info)
--> 224     result = _execute_task(task, data)
    225     id = get_id()
    226     result = dumps((result, id))

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/core.py:119, in _execute_task(arg, cache, dsk)
    115     func, args = arg[0], arg[1:]
    116     # Note: Don't assign the subtask results to a variable. numpy detects
    117     # temporaries by their reference count and can execute certain
    118     # operations in-place.
--> 119     return func(*(_execute_task(a, cache) for a in args))
    120 elif not ishashable(arg):
    121     return arg

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/utils.py:73, in apply(func, args, kwargs)
     42 """Apply a function given its positional and keyword arguments.
     43
     44 Equivalent to ``func(*args, **kwargs)``
(...)
...
---> 19 filename = fspath(filename)
     20 if sys.platform == "win32":
     21     if isinstance(filename, str):

TypeError: expected str, bytes or os.PathLike object, not tuple
```
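
The last frame suggests the failure mechanism: after the initial cache miss (the first KeyError), the file is re-acquired and a path-handling helper calls `os.fspath` on what appears to be the tuple-shaped cache key rather than a filename. A minimal sketch of just that final step:

```python
import os

# os.fspath accepts only str, bytes, or os.PathLike; handing it a tuple
# (a stand-in here for the file-manager cache key) raises the same TypeError
# that ends the traceback above.
try:
    os.fspath(("not", "a", "path"))
except TypeError as err:
    print(err)  # expected str, bytes or os.PathLike object, not tuple
```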

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0 | packaged by conda-forge | (main, Jan 15 2023, 05:44:48) [Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 21.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None

xarray: 2023.2.0
pandas: 1.5.3
numpy: 1.24.2
scipy: 1.10.1
netCDF4: None
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.6
cfgrib: None
iris: None
bottleneck: None
dask: 2023.2.1
distributed: 2023.2.1
matplotlib: 3.7.0
cartopy: 0.21.1
seaborn: 0.12.2
numbagg: None
fsspec: 2023.1.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.4.0
pip: 23.0.1
conda: None
pytest: 7.2.1
mypy: None
IPython: 8.10.0
sphinx: None

/Users/jo/miniconda3/envs/rxr/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
