issues

17 rows where user = 10137 sorted by updated_at descending

14 issues and 3 pull requests · 16 closed, 1 open · all in repo xarray
#7574 · issue · xr.open_mfdataset doesn't work with fsspec and dask · ghost (user 10137) · state: closed · 12 comments · created 2023-03-01 · closed 2023-09-08

What happened?

I was trying to read multiple netCDF files as byte streams (which requires the h5netcdf engine) with xr.open_mfdataset and parallel=True, to leverage dask.delayed (parallel=False works, though), but it failed.

The netCDF files were NOAA GOES-16 satellite images, but I can't tell if that matters.

What did you expect to happen?

It should have loaded all the netCDF files into an xarray.Dataset object.

Minimal Complete Verifiable Example

```python
import fsspec
import xarray as xr

paths = [
    's3://noaa-goes16/ABI-L2-LSTC/2022/185/03/OR_ABI-L2-LSTC-M6_G16_s20221850301180_e20221850303553_c20221850305091.nc',
    's3://noaa-goes16/ABI-L2-LSTC/2022/185/02/OR_ABI-L2-LSTC-M6_G16_s20221850201180_e20221850203553_c20221850205142.nc',
]

fs = fsspec.filesystem('s3')

xr.open_mfdataset(
    [fs.open(path, mode="rb") for path in paths],
    engine="h5netcdf",
    combine="nested",
    concat_dim="t",
    parallel=True,
)
```

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/xarray/backends/file_manager.py:210, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    209 try:
--> 210     file = self._cache[self._key]
    211 except KeyError:

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/xarray/backends/lru_cache.py:56, in LRUCache.__getitem__(self, key)
     55 with self._lock:
---> 56     value = self._cache[key]
     57     self._cache.move_to_end(key)

KeyError: [<class 'h5netcdf.core.File'>, ((b'\x89HDF\r\n', b'\x1a\n', b'\x02\x08\x08\x00\x00\x00 ... EXTREMELY STRING ... 00\x00\x00\x00\x00\x00\x0ef']

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
Cell In[9], line 11
      4 paths = [
      5     's3://noaa-goes16/ABI-L2-LSTC/2022/185/03/OR_ABI-L2-LSTC-M6_G16_s20221850301180_e20221850303553_c20221850305091.nc',
      6     's3://noaa-goes16/ABI-L2-LSTC/2022/185/02/OR_ABI-L2-LSTC-M6_G16_s20221850201180_e20221850203553_c20221850205142.nc'
      7 ]
      9 fs = fsspec.filesystem('s3')
---> 11 xr.open_mfdataset(
     12     [fs.open(path, mode="rb") for path in paths],
     13     engine="h5netcdf",
     14     combine="nested",
     15     concat_dim="t",
     16     parallel=True
     17 ).LST

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/xarray/backends/api.py:991, in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, **kwargs)
    986     datasets = [preprocess(ds) for ds in datasets]
    988 if parallel:
    989     # calling compute here will return the datasets/file_objs lists,
    990     # the underlying datasets will still be stored as dask arrays
--> 991     datasets, closers = dask.compute(datasets, closers)
    993 # Combine all datasets, closing them in case of a ValueError
    994 try:

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/base.py:599, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    596     keys.append(x.__dask_keys__())
    597     postcomputes.append(x.__dask_postcompute__())
--> 599 results = schedule(dsk, keys, **kwargs)
    600 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/threaded.py:89, in get(dsk, keys, cache, num_workers, pool, **kwargs)
     86 elif isinstance(pool, multiprocessing.pool.Pool):
     87     pool = MultiprocessingPoolExecutor(pool)
---> 89 results = get_async(
     90     pool.submit,
     91     pool._max_workers,
     92     dsk,
     93     keys,
     94     cache=cache,
     95     get_id=_thread_get_id,
     96     pack_exception=pack_exception,
     97     **kwargs,
     98 )
    100 # Cleanup pools associated to dead threads
    101 with pools_lock:

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/local.py:511, in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
    509     _execute_task(task, data)  # Re-execute locally
    510 else:
--> 511     raise_exception(exc, tb)
    512 res, worker_id = loads(res_info)
    513 state["cache"][key] = res

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/local.py:319, in reraise(exc, tb)
    317 if exc.__traceback__ is not tb:
    318     raise exc.with_traceback(tb)
--> 319 raise exc

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/local.py:224, in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
    222 try:
    223     task, data = loads(task_info)
--> 224     result = _execute_task(task, data)
    225     id = get_id()
    226     result = dumps((result, id))

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/core.py:119, in _execute_task(arg, cache, dsk)
    115     func, args = arg[0], arg[1:]
    116     # Note: Don't assign the subtask results to a variable. numpy detects
    117     # temporaries by their reference count and can execute certain
    118     # operations in-place.
--> 119     return func(*(_execute_task(a, cache) for a in args))
    120 elif not ishashable(arg):
    121     return arg

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/utils.py:73, in apply(func, args, kwargs)
     42 """Apply a function given its positional and keyword arguments.
     43
     44 Equivalent to func(*args, **kwargs)
(...)

---> 19 filename = fspath(filename)
     20 if sys.platform == "win32":
     21     if isinstance(filename, str):

TypeError: expected str, bytes or os.PathLike object, not tuple
```
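
A hedged workaround sketch, not a confirmed fix: since the failure appears when open file objects travel through the dask graph, one option is to open the filesystem and each file inside its own delayed task instead. The anon=True access to the public noaa-goes16 bucket is an assumption, and s3fs must be installed:

```python
import dask
import fsspec
import xarray as xr

paths = [
    's3://noaa-goes16/ABI-L2-LSTC/2022/185/03/OR_ABI-L2-LSTC-M6_G16_s20221850301180_e20221850303553_c20221850305091.nc',
    's3://noaa-goes16/ABI-L2-LSTC/2022/185/02/OR_ABI-L2-LSTC-M6_G16_s20221850201180_e20221850203553_c20221850205142.nc',
]

@dask.delayed
def open_one(path):
    # Open the filesystem and the file inside the task itself, so no open
    # file handle has to survive a round trip through the dask graph.
    fs = fsspec.filesystem('s3', anon=True)  # anon=True is an assumption
    return xr.open_dataset(fs.open(path, mode='rb'), engine='h5netcdf')

datasets = dask.compute(*[open_one(p) for p in paths])
combined = xr.concat(datasets, dim='t')
```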

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0 | packaged by conda-forge | (main, Jan 15 2023, 05:44:48) [Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 21.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None

xarray: 2023.2.0
pandas: 1.5.3
numpy: 1.24.2
scipy: 1.10.1
netCDF4: None
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.6
cfgrib: None
iris: None
bottleneck: None
dask: 2023.2.1
distributed: 2023.2.1
matplotlib: 3.7.0
cartopy: 0.21.1
seaborn: 0.12.2
numbagg: None
fsspec: 2023.1.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.4.0
pip: 23.0.1
conda: None
pytest: 7.2.1
mypy: None
IPython: 8.10.0
sphinx: None

/Users/jo/miniconda3/envs/rxr/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
#3684 · issue · open_mfdataset - different behavior with dask.distributed.LocalCluster · ghost (user 10137) · state: open · 3 comments · created 2020-01-10 · updated 2023-09-05

Big fan of xarray! I'm not that familiar with submitting tickets like this, so my apologies for any rule-breaking. Also, if this belongs over in the dask project, I can move it there.

dask 2.6.0, numpy 1.17.3, xarray 0.14.1, netCDF4 1.5.3

I am attempting to use open_mfdataset on .nc files I've generated through dask/xarray after initializing the dask LocalCluster. I've found that I am able to compute successfully when I don't run the distributed cluster, but if I do, I get a variety of issues. I've got a synthetic-data-generating example here: running soundspeed.compute() will sometimes succeed and will sometimes cause worker restarts, resulting in HDF errors and no return.

I was thinking it was something with serialization; I've seen other tickets with similar issues, but I don't see how it applies to my test case.

Example code:

```python
import numpy as np
import xarray as xr
import os
from dask.distributed import Client

cl = Client()
outpth = r'D:\dasktest\data_dir\EM2040\converted\test'
mint = 0
maxt = 1000

for i in range(100):
    times = np.arange(mint, maxt)
    beams = np.arange(250)
    sectors = ['40107_0_260000', '40107_1_320000', '40107_2_290000']
    soundspeed = np.random.randn(1000, 3, 250)
    ds = xr.Dataset(
        {'soundspeed': (('time', 'sectors', 'beams'), soundspeed)},
        {'time': times, 'sectors': sectors, 'beams': beams},
    )
    ds.to_netcdf(os.path.join(outpth, 'test{}.nc'.format(i)), mode='w')
    mint = maxt
    maxt += 1000

fils = [os.path.join(outpth, x) for x in os.listdir(outpth)
        if os.path.splitext(x)[1] == '.nc']
tst = xr.open_mfdataset(fils, concat_dim='time', combine='nested')
tst.soundspeed.compute()
```

I've found that running this example with fewer than 10 files reduces the number of errors I'm getting dramatically. I've tried this on different machines in different domain environments just to be sure. I really just want to make sure I'm not making a silly mistake somewhere. I appreciate the help.
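
A debugging sketch rather than a fix (it reuses `tst` from the example above): forcing dask's single-threaded scheduler for one compute() call rules concurrency in or out as the trigger. If the HDF errors vanish, concurrent access to the netCDF files is implicated.

```python
import dask

# Run the same computation without any worker threads or processes.
with dask.config.set(scheduler='synchronous'):
    tst.soundspeed.compute()
```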

My last run on actual data:

```python

ra.soundspeed.compute()
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Restarting worker
distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001F83F1E2360>, key=BasicIndexer((slice(None, None, None), slice(None, None, None)))))), (slice(0, 1719, None), slice(0, 3, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataarray.py", line 837, in compute
    return new.load(**kwargs)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataarray.py", line 811, in load
    ds = self._to_temp_dataset().load(**kwargs)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataset.py", line 649, in load
    evaluated_data = da.compute(*lazy_data.values(), **kwargs)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\dask\base.py", line 436, in compute
    results = schedule(dsk, keys, **kwargs)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py", line 2545, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py", line 1845, in gather
    asynchronous=asynchronous,
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py", line 762, in sync
    self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\utils.py", line 333, in sync
    raise exc.with_traceback(tb)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\utils.py", line 317, in f
    result[0] = yield future
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\tornado\gen.py", line 735, in run
    value = future.result()
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py", line 1701, in gather
    raise exception.with_traceback(traceback)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\dask\array\core.py", line 106, in getter
    c = np.asarray(c)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py", line 481, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py", line 643, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py", line 547, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py", line 72, in __getitem__
    key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py", line 827, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py", line 83, in _getitem
    original_array = self.get_array(needs_lock=False)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py", line 62, in get_array
    ds = self.datastore.acquire(needs_lock)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py", line 360, in _acquire
    with self._manager.acquire_context(needs_lock) as root:
  File "C:\PydroXL_19\envs\dasktest\lib\contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\file_manager.py", line 186, in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\file_manager.py", line 204, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File "netCDF4\_netCDF4.pyx", line 2321, in netCDF4._netCDF4.Dataset.__init__
  File "netCDF4\_netCDF4.pyx", line 1885, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -101] NetCDF: HDF error: b'D:\dasktest\data_dir\EM2040\converted\rangeangle_20.nc'
```

My last run on the synthetic data set generated above:

```python

tst.soundspeed.compute()
distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FC8AA20>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB82D0>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB8240>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB81F8>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB81B0>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB8360>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB83A8>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB8510>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB8750>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB8990>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB8BD0>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FCB8E10>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FC9D090>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FC9D2D0>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FC9D510>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FC9D750>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FC9DC18>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FC9DBD0>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FC9DCA8>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error')

distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x000001BB5FC9DD38>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataarray.py", line 837, in compute
    return new.load(**kwargs)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataarray.py", line 811, in load
    ds = self._to_temp_dataset().load(**kwargs)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataset.py", line 649, in load
    evaluated_data = da.compute(*lazy_data.values(), **kwargs)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\dask\base.py", line 436, in compute
    results = schedule(dsk, keys, **kwargs)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py", line 2545, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py", line 1845, in gather
    asynchronous=asynchronous,
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py", line 762, in sync
    self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\utils.py", line 333, in sync
    raise exc.with_traceback(tb)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\utils.py", line 317, in f
    result[0] = yield future
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\tornado\gen.py", line 735, in run
    value = future.result()
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py", line 1701, in gather
    raise exception.with_traceback(traceback)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\dask\array\core.py", line 106, in getter
    c = np.asarray(c)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py", line 481, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py", line 643, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py", line 547, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py", line 72, in __getitem__
    key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py", line 827, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py", line 83, in _getitem
    original_array = self.get_array(needs_lock=False)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py", line 62, in get_array
    ds = self.datastore.acquire(needs_lock)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py", line 360, in _acquire
    with self._manager.acquire_context(needs_lock) as root:
  File "C:\PydroXL_19\envs\dasktest\lib\contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\file_manager.py", line 186, in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
  File "C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\file_manager.py", line 204, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File "netCDF4\_netCDF4.pyx", line 2321, in netCDF4._netCDF4.Dataset.__init__
  File "netCDF4\_netCDF4.pyx", line 1885, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -101] NetCDF: HDF error: b'D:\dasktest\data_dir\EM2040\converted\test\test4.nc'
```

#6625 · issue · Why am I getting 'Passing method to Float64Index.get_loc is deprecated' error when using the .sel method to extract some data, and how do I solve it? · ghost (user 10137) · state: closed · 5 comments · created 2022-05-21 · closed 2022-07-09

What is your issue?

```python
climateModels['CSIRO-QCCCE-CSIRO-Mk3-6']['RCP 45'][2]['tasmax'].sel(lon=74, lat=31, time='2041-06-16', method='nearest').data[0]
```

```
\anaconda3\lib\site-packages\xarray\core\indexes.py:234: FutureWarning: Passing method to Float64Index.get_loc is deprecated and will raise in a future version. Use index.get_indexer([item], method=...) instead.
```

I don't know much about how to solve this issue; can anyone help me out, please?
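
A hedged stopgap sketch: the message is a FutureWarning from pandas, not an error, so one option while the installed xarray/pandas pair still triggers it is to silence it. The DataArray here is a synthetic stand-in for the model data selected above, and .item() replaces the original .data[0] to pull out the scalar.

```python
import warnings

import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for one of the model DataArrays above.
da = xr.DataArray(
    np.random.rand(2, 2, 2),
    coords={"lon": [73.5, 74.5], "lat": [30.5, 31.5],
            "time": pd.date_range("2041-06-15", periods=2)},
    dims=("lon", "lat", "time"),
)

with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    value = da.sel(lon=74, lat=31, time="2041-06-16", method="nearest").item()
```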

#6766 · issue · xr.open_dataset(url) gives NetCDF4 (lru_cache.py) error "oc_open: Could not read url" · ghost (user 10137) · state: closed · 8 comments · created 2022-07-08 · closed 2022-07-11

What is your issue?

This code I use was working about a year ago but today gives me an error:

```python
import xarray as xr

url = 'http://psl.noaa.gov/thredds/dodsC/Datasets/NARR/monolevel/uwnd.10m.2000.nc'
ds = xr.open_dataset(url)
```

The traceback includes the following:

```
File "C:\Users\Codiga_D\AppData\Local\Continuum\miniconda3\envs\EQ\lib\site-packages\xarray\backends\lru_cache.py", line 53, in __getitem__
    value = self._cache[key]
```

and

```
OSError: [Errno -68] NetCDF: I/O failure: b'http://psl.noaa.gov/thredds/dodsC/Datasets/NARR/monolevel/uwnd.10m.2000.nc'

Note:Caching=1
Error:curl error: SSL connect error
curl error details:
Warning:oc_open: Could not read url
```

I have confirmed that the file I am trying to read is on the server and that the server is not requiring a password (nothing I am aware of about the server has changed since my code used to work successfully).

I am on Windows using a conda virtual env (no pip). My xarray is 0.20.2 and my netCDF4 is 1.6.0; these are almost certainly more recent than the ones I was using when my code last succeeded, but I didn't record which version(s) used to work.

It was suggested that I pin netcdf4 to 1.5.8, so I tried this but got the same error.

Recently I had to update security certificates locally here, and this could be related, but I'm not sure.

Any suggestions for how I should troubleshoot this?

Also, should I post an issue at https://github.com/Unidata/netcdf4-python instead of, and/or in addition to, this one?

I found these issues, which seem possibly related, but don't seem to be resolved well yet: https://github.com/Unidata/netcdf4-python/issues/755 https://github.com/pydata/xarray/issues/4925

(I also opened 'discussion' #6742 but so far there has been little response there.)
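
A hedged troubleshooting sketch (the requests dependency is my addition, not part of the report): fetching the dataset's .dds document directly separates "server reachable at this URL" from "netCDF-C/curl SSL configuration" problems.

```python
import requests  # assumes requests is installed

# An OPeNDAP server exposes a plain-text DDS description at url + '.dds'.
url = 'http://psl.noaa.gov/thredds/dodsC/Datasets/NARR/monolevel/uwnd.10m.2000.nc'
resp = requests.get(url + '.dds', timeout=30)
print(resp.status_code)
print(resp.text[:300])
```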

#1682 · pull request · Add option "engine" · ghost (user 10137) · state: closed · 11 comments · created 2017-11-02 · closed 2022-04-15

Implements a new xarray option, "engine", for setting the default backend data read/write engine. Inspired by this Stack Overflow answer.

This PR is not ready for merge yet but I wanted to verify if the code changes are on the right track.

The default value of the engine option is None. If the option is set, the _get_default_engine() function returns its value without going through the chain of import statements.
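
Purely for illustration, a sketch of how the proposed option might be used if it were exposed through xr.set_options like other xarray options; this PR was never merged, so this is NOT a real xarray API:

```python
import xarray as xr

# Hypothetical: set a process-wide default engine once...
xr.set_options(engine="h5netcdf")

# ...so later reads would not need an explicit engine= argument.
ds = xr.open_dataset("example.nc")  # would default to h5netcdf
```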

  • [ ] Closes #xxxx
  • [ ] Tests added / passed
  • [ ] Passes git diff upstream/master **/*py | flake8 --diff
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API
#5434 · issue · xarray.open_rasterio · ghost (user 10137) · state: closed · 2 comments · created 2021-06-03 · closed 2022-04-09

Could you please promote xarray.open_rasterio from experimental to stable, ideally with faster reading of GeoTIFF files (if possible)? For the original array indexing capabilities, I would rather stick with xarray than rioxarray. With much respect, thank you.
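
For context, a hedged note rather than a change to the request: open_rasterio was later deprecated in favor of rioxarray, whose reader returns a DataArray with the usual xarray indexing intact. A minimal equivalent, assuming rioxarray is installed and "example.tif" is a placeholder file name:

```python
import rioxarray  # assumes rioxarray is installed

da = rioxarray.open_rasterio("example.tif")  # "example.tif" is a placeholder
print(da.dims)  # typically ("band", "y", "x"), with normal xarray indexing
```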

#5131 · pull request · Remove trailing space from DatasetGroupBy repr · ghost (user 10137) · state: closed · 1 comment · created 2021-04-08 · closed 2021-04-08

Remove trailing whitespace from DatasetGroupBy representation because flake8 reports it as a violation when present in doctests.

Fix #5130

#5130 · issue · Trailing whitespace in DatasetGroupBy text representation · ghost (user 10137) · state: closed · 1 comment · created 2021-04-08 · closed 2021-04-08

When displaying a DatasetGroupBy in an interactive Python session, the first line of output contains a trailing whitespace. The first example in the documentation demonstrates this:

```pycon
>>> import xarray as xr, numpy as np
>>> ds = xr.Dataset(
...     {"foo": (("x", "y"), np.random.rand(4, 3))},
...     coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
... )
>>> ds.groupby("letters")
DatasetGroupBy, grouped over 'letters' 
2 groups with labels 'a', 'b'.
```

There is a trailing whitespace in the first line of output, which is "DatasetGroupBy, grouped over 'letters' ". This can be seen more clearly by converting the object to a string (note the whitespace before \n):

```pycon
>>> str(ds.groupby("letters"))
"DatasetGroupBy, grouped over 'letters' \n2 groups with labels 'a', 'b'."
```

While this isn't a problem in itself, it causes an issue for us because we use flake8 in continuous integration to verify that our code is correctly formatted, and we also have doctests that rely on the DatasetGroupBy textual representation. Flake8 reports a violation on the trailing whitespace in our docstrings. If we remove the trailing whitespace, our doctests fail because the expected output doesn't match the actual output. So we have conflicting constraints coming from our tools, both of which seem reasonable: flake8 forbids trailing whitespace because, among other reasons, it leads to noisy git diffs, while doctest wants the expected output to be exactly the same as the actual output and considers a trailing whitespace a significant difference.

We could configure flake8 to ignore this particular violation for the files in which we have these doctests, but that may let other trailing whitespace creep into our code, which we don't want. Unfortunately it's not possible to just add # noqa comments to get flake8 to ignore the violation only for specific lines, because that creates a difference between expected and actual output from doctest's point of view. Flake8 doesn't allow disabling checks for blocks of code either.
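
To make the conflict concrete, here is a hypothetical docstring of the kind described above (not actual xarray code); the trailing space after 'letters' in the expected output is exactly what flake8 flags as W291 and what doctest requires verbatim:

```python
def demo(ds):
    """Hypothetical doctest reproducing the conflict (note the trailing space).

    >>> ds.groupby("letters")
    DatasetGroupBy, grouped over 'letters' 
    2 groups with labels 'a', 'b'.
    """
    return ds.groupby("letters")
```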

Is there a reason for having this trailing whitespace in the DatasetGroupBy representation? Would it be OK to remove it? If so, please let me know and I can make a pull request.

#2139 · issue · From pandas to xarray without blowing up memory · ghost (user 10137) · state: closed · 15 comments · created 2018-05-16 · closed 2019-08-27

I have a billion rows of data, but really it's just two categorical variables, time, lat, lon and some data variables.

Thinking it would somehow help me get the data into xarray, I created a five-level pandas MultiIndex out of the data, but thus far this has not been successful: xarray tries to create a product of the index levels, and that's just not going to work.

Trying to write a NetCDF file has presented its own issues, and I'm left wondering if there isn't a much simpler way to go about this?
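
For readers landing here later, a minimal sketch of the direction xarray eventually took, assuming a version new enough to have the sparse option on from_dataframe (it did not exist when this issue was filed) and the sparse package installed:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Tiny stand-in for the billion-row frame described above.
idx = pd.MultiIndex.from_product(
    [pd.date_range("2000-01-01", periods=3), [10.0, 20.0], [100.0, 110.0]],
    names=["time", "lat", "lon"],
)
df = pd.DataFrame({"value": np.arange(12.0)}, index=idx)

# sparse=True keeps from_dataframe from materializing the full dense
# product of every index level combination.
ds = xr.Dataset.from_dataframe(df, sparse=True)
```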

#3007 · issue · NaN values for variables when converting from a pandas dataframe to xarray.DataSet · ghost (user 10137) · state: closed · 5 comments · created 2019-06-10 · closed 2020-03-23

Code Sample, a copy-pastable example if possible

```python
                                         wind_surface       hurs  ...       bui       fwi
lat       lon       time
34.511383 16.467664 1971-01-10 12:00:00     29.658546  70.481293  ...  8.134300  7.409146
34.515558 16.723973 1971-01-10 12:00:00     30.896049  71.356644  ...  8.874528  8.399877
34.517359 16.852138 1971-01-10 12:00:00     31.514799  71.708603  ...  8.789351  8.763743
34.518970 16.980310 1971-01-10 12:00:00     32.105423  72.023773  ...  8.962551  9.125644
34.520391 17.108487 1971-01-10 12:00:00     32.724174  72.106110  ...  8.725038  9.249104

[5 rows x 10 columns]

In [81]: df.to_xarray()
Out[81]:
<xarray.Dataset>
Dimensions:       (lat: 5, lon: 5, time: 1)
Coordinates:
  * lat           (lat) float64 34.51 34.52 34.52 34.52 34.52
  * lon           (lon) float64 16.47 16.72 16.85 16.98 17.11
  * time          (time) object '1971-01-10 12:00:00'
Data variables:
    wind_surface  (lat, lon, time) float64 29.658546 nan nan ... nan 32.724174
    hurs          (lat, lon, time) float64 70.48129 nan nan ... nan nan 72.10611
    precip        (lat, lon, time) float64 0.0 nan nan nan ... nan nan nan 0.0
    tmax          (lat, lon, time) float64 16.060822 nan nan ... nan 16.185822
    ffmc          (lat, lon, time) float64 83.58528 nan nan ... nan nan 84.05673
    isi           (lat, lon, time) float64 7.7641253 nan nan ... nan nan 9.64494
    dmc           (lat, lon, time) float64 6.797345 nan nan ... nan nan 7.90833
    dc            (lat, lon, time) float64 25.314878 nan nan ... nan 24.324644
    bui           (lat, lon, time) float64 8.1343 nan nan ... nan nan 8.725038
    fwi           (lat, lon, time) float64 7.409146 nan nan ... nan 9.2491045
```

Problem description

Hi, I get those NaN values for variables when I try to convert a pandas.DataFrame with a MultiIndex to xarray. The same happens if I build an xarray.Dataset and then unstack the MultiIndex, as shown below:

```python
ds = xr.Dataset(df)
ds.unstack('dim_0')
```

```
<xarray.Dataset>
Dimensions:       (lat: 5, lon: 5, time: 1)
Coordinates:
  * lat           (lat) float64 34.51 34.52 34.52 34.52 34.52
  * lon           (lon) float64 16.47 16.72 16.85 16.98 17.11
  * time          (time) object '1971-01-10 12:00:00'
Data variables:
    wind_surface  (lat, lon, time) float32 29.658546 nan nan ... nan 32.724174
    hurs          (lat, lon, time) float32 70.48129 nan nan ... nan nan 72.10611
    precip        (lat, lon, time) float32 0.0 nan nan nan ... nan nan nan 0.0
```

Maybe it's not an issue. I don't know. I'm lost. Any help is welcome.

Regards
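
A minimal sketch of why the NaNs appear, under the assumption that each (lat, lon) pair occurs only once in the index: to_xarray() builds the dense lat x lon product, so every combination that was never observed is filled with NaN.

```python
import pandas as pd

# Each (lat, lon) pair occurs exactly once...
df = pd.DataFrame(
    {"lat": [34.51, 34.52], "lon": [16.47, 16.72], "v": [1.0, 2.0]}
).set_index(["lat", "lon"])

# ...so the 2x2 grid built by to_xarray() has NaN at the two
# unobserved combinations (34.51, 16.72) and (34.52, 16.47).
print(df.to_xarray())
```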

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 (default, May 9 2019, 11:55:04) [GCC 8.3.0]
python-bits: 64
OS: Linux
OS-release: 5.0.0-16-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.3
scipy: 1.3.0
netCDF4: 1.5.2
pydap: installed
h5netcdf: 0.7.3
h5py: 2.9.0
Nio: None
zarr: 2.3.1
cftime: 1.0.1
nc_time_axis: 1.1.0
PseudonetCDF: None
rasterio: 1.0.23
cfgrib: None
iris: 2.3.0dev0
bottleneck: 1.2.1
dask: 1.2.2
distributed: None
matplotlib: 3.1.0
cartopy: 0.17.1.dev168+
seaborn: 0.9.0
setuptools: 40.8.0
pip: 19.1.1
conda: None
pytest: None
IPython: 7.5.0
sphinx: 2.0.1
#1683 · pull request · Add h5netcdf to the engine import hierarchy · ghost (user 10137) · state: closed · 2 comments · created 2017-11-02 · closed 2018-02-12

h5netcdf is now part of the import statements in the _get_default_engine() function. The order is: netcdf4, scipy.io.netcdf, h5netcdf.

  • [ ] Closes #xxxx
  • [ ] Tests added / passed
  • [ ] Passes git diff upstream/master **/*py | flake8 --diff
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API
#1860 · issue · IndexError when accessing a data variable through a PydapDataStore · ghost (user 10137) · state: closed · 4 comments · created 2018-01-26 · closed 2018-01-27

Code Sample, a copy-pastable example if possible

```python
import xarray as xa
from pydap.cas.urs import setup_session

url = 'https://goldsmr4.gesdisc.eosdis.nasa.gov/dods/M2T1NXFLX'
session = setup_session(username='****', password='****', check_url=url)
store = xa.backends.PydapDataStore.open(url, session=session)
ds = xa.open_dataset(store)
tlml = ds['tlml']
print(tlml[0, 0, 0])
```

Problem description

I was trying to connect to NASA MERRA-2 data through OPeNDAP, following the documentation here: http://xarray.pydata.org/en/stable/io.html#OPeNDAP. Opening the dataset works fine, but trying to access a data variable throws a strange IndexError. Traceback below:

```
Traceback (most recent call last):
  File "C:\Anaconda3\envs\jaws\lib\site-packages\IPython\core\formatters.py", line 702, in __call__
    printer.pretty(obj)
  File "C:\Anaconda3\envs\jaws\lib\site-packages\IPython\lib\pretty.py", line 395, in pretty
    return _default_pprint(obj, self, cycle)
  File "C:\Anaconda3\envs\jaws\lib\site-packages\IPython\lib\pretty.py", line 510, in _default_pprint
    _repr_pprint(obj, p, cycle)
  File "C:\Anaconda3\envs\jaws\lib\site-packages\IPython\lib\pretty.py", line 701, in _repr_pprint
    output = repr(obj)
  File "c:\src\xarray\xarray\core\common.py", line 100, in __repr__
    return formatting.array_repr(self)
  File "c:\src\xarray\xarray\core\formatting.py", line 393, in array_repr
    summary.append(short_array_repr(arr.values))
  File "c:\src\xarray\xarray\core\dataarray.py", line 411, in values
    return self.variable.values
  File "c:\src\xarray\xarray\core\variable.py", line 392, in values
    return _as_array_or_item(self._data)
  File "c:\src\xarray\xarray\core\variable.py", line 216, in _as_array_or_item
    data = np.asarray(data)
  File "C:\Anaconda3\envs\jaws\lib\site-packages\numpy\core\numeric.py", line 492, in asarray
    return array(a, dtype, copy=False, order=order)
  File "c:\src\xarray\xarray\core\indexing.py", line 572, in __array__
    self._ensure_cached()
  File "c:\src\xarray\xarray\core\indexing.py", line 569, in _ensure_cached
    self.array = NumpyIndexingAdapter(np.asarray(self.array))
  File "C:\Anaconda3\envs\jaws\lib\site-packages\numpy\core\numeric.py", line 492, in asarray
    return array(a, dtype, copy=False, order=order)
  File "c:\src\xarray\xarray\core\indexing.py", line 553, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "C:\Anaconda3\envs\jaws\lib\site-packages\numpy\core\numeric.py", line 492, in asarray
    return array(a, dtype, copy=False, order=order)
  File "c:\src\xarray\xarray\core\indexing.py", line 520, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "c:\src\xarray\xarray\conventions.py", line 134, in __getitem__
    return np.asarray(self.array[key], dtype=self.dtype)
  File "c:\src\xarray\xarray\coding\variables.py", line 71, in __getitem__
    return self.func(self.array[key])
  File "c:\src\xarray\xarray\coding\variables.py", line 140, in _apply_mask
    data = np.asarray(data, dtype=dtype)
  File "C:\Anaconda3\envs\jaws\lib\site-packages\numpy\core\numeric.py", line 492, in asarray
    return array(a, dtype, copy=False, order=order)
  File "c:\src\xarray\xarray\core\indexing.py", line 520, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "c:\src\xarray\xarray\backends\pydap_.py", line 33, in __getitem__
    result = robust_getitem(array, key, catch=ValueError)
  File "c:\src\xarray\xarray\backends\common.py", line 67, in robust_getitem
    return array[key]
  File "C:\src\pydap\src\pydap\model.py", line 320, in __getitem__
    out.data = self._get_data_index(index)
  File "C:\src\pydap\src\pydap\model.py", line 350, in _get_data_index
    return self._data[index]
  File "C:\src\pydap\src\pydap\handlers\dap.py", line 149, in __getitem__
    return dataset[self.id].data
  File "C:\src\pydap\src\pydap\model.py", line 426, in __getitem__
    return self._getitem_string(key)
  File "C:\src\pydap\src\pydap\model.py", line 410, in _getitem_string
    return self[splitted[0]]['.'.join(splitted[1:])]
  File "C:\src\pydap\src\pydap\model.py", line 320, in __getitem__
    out.data = self._get_data_index(index)
  File "C:\src\pydap\src\pydap\model.py", line 350, in _get_data_index
    return self._data[index]
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
```

Expected Output

Expecting to see the value of variable 'tlml' at these coordinates.

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

xarray: 0.10.0+dev44.g0a0593d
pandas: 0.22.0
numpy: 1.14.0
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: 0.5.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.16.1
distributed: 1.20.2
matplotlib: None
cartopy: None
seaborn: None
setuptools: 38.4.0
pip: 9.0.1
conda: None
pytest: None
IPython: 6.2.1
sphinx: 1.6.6

#1857 · issue · AttributeError: '<class 'pydap.model.GridType'>' object has no attribute 'shape' · ghost (user 10137) · state: closed · 6 comments · created 2018-01-25 · closed 2018-01-26

Code Sample, a copy-pastable example if possible

```python
import xarray as xa
from pydap.cas.urs import setup_session

url = 'https://goldsmr4.gesdisc.eosdis.nasa.gov/dods/M2T1NXFLX'
session = setup_session(username='****', password='****', check_url=url)
store = xa.backends.PydapDataStore.open(url, session=session)
ds = xa.open_dataset(store)
```

Problem description

I was trying to connect to NASA MERRA-2 data through OPeNDAP, following the documentation here: http://xarray.pydata.org/en/stable/io.html#OPeNDAP. I was able to get past a previous bug (#1775) by installing the latest master version of xarray.

Expected Output

Expecting the collection (M2T1NXFLX) content to show up as an xarray dataset.

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

xarray: 0.10.0+dev44.g0a0593d
pandas: 0.22.0
numpy: 1.14.0
scipy: 1.0.0
netCDF4: None
h5netcdf: None
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
setuptools: 38.4.0
pip: 9.0.1
conda: None
pytest: None
IPython: None
sphinx: None
#1486 · issue · boolean indexing · ghost (user 10137) · state: closed · 2 comments · created 2017-07-24 · closed 2017-09-07

I am trying to figure out how boolean indexing works in xarray. I have a couple of data arrays:

  • X_day[latitude, longitude, time]
  • X_night[latitude, longitude, time]
  • Rule[latitude, longitude]

I want to merge X_day and X_night into a new X based on Rule. First I make a copy of X_day to be X:

    X = X_day

Then I tried:

    X[Rule==True, :] = X_night

and this:

    X.values[Rule==True, :] = X_night.values

and also this:

    X.where(Rule==True) = X_night

None of these assignments worked. Please help.
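
A hedged sketch of the merge being asked for, using xr.where instead of boolean assignment; the dimension sizes and names here are stand-ins:

```python
import numpy as np
import xarray as xr

# Stand-in shapes; the real arrays are [latitude, longitude, time].
rule = xr.DataArray(
    np.array([[True, False], [False, True]]), dims=("latitude", "longitude")
)
x_day = xr.DataArray(np.zeros((2, 2, 3)), dims=("latitude", "longitude", "time"))
x_night = xr.DataArray(np.ones((2, 2, 3)), dims=("latitude", "longitude", "time"))

# Take X_night where Rule is True and X_day elsewhere; the 2-D condition
# broadcasts across the time dimension automatically.
x = xr.where(rule, x_night, x_day)
```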

#1484 · issue · Matrix cross product in xarray · ghost (user 10137) · state: closed · 4 comments · created 2017-07-21 · closed 2017-07-24

Hi, I am new to xarray and need some advice on one task. I need to do a cross product calculation using variables from a netCDF file and write the output to a new netCDF file. I feel this could be done using netcdf-python and pandas, but I hope to use xarray to simplify the task. My code will be something like this:

```
ds = xr.open_dataset(NC_FILE)
var1 = ds['VAR1']
var2 = ds['VAR2']
var3 = ds['VAR3']
var4 = ds['VAR4']
```

var1-4 above will have dimensions [latitude, longitude]. I will use var1-4 to generate a matrix of dimensions [Nlat, Nlon, M], something like: [var1, var1-var2, var1-var3, (var1-var2)*np.cos(var4)] (here, M=4).

My question here is, how do I build this matrix in xarray Dataset? Since this matrix will be eventually used to cross product with another matrix (pd.DataFrame) of dimensions [M, K], is it better to convert var1-4 to pd.DataFrame first?

The following code will be something like this:

```
matrix = [var1, var1-var2, var1-var3, (var1-var2)*np.cos(var4)]  # Nlat x Nlon x M
factor = pd.read_csv(somefile)  # M x K
result = pd.DataFrame.dot(matrix, factor)  # Nlat x Nlon x K
result2 = xr.Dataset(result)
result2.to_netcdf(outfile)
```

Can someone show me the correct code to build the Nlat x Nlon x M matrix? Can the cross product be done in xr.Dataset to avoid conversion to and from pd.DataFrame?

Thank you, Xin
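
A hedged sketch of one way to do this entirely in xarray, with random stand-in data: xr.concat stacks the four expressions along a new "M" dimension, and a multiply-plus-sum contracts over M (xr.dot is an alternative), so no round trip through pd.DataFrame is needed.

```python
import numpy as np
import xarray as xr

# Random stand-ins for the four variables read from the netCDF file.
rng = np.random.default_rng(0)
dims = ("latitude", "longitude")
var1, var2, var3, var4 = (
    xr.DataArray(rng.random((2, 3)), dims=dims) for _ in range(4)
)

# Stack the four expressions along a new "M" dimension: (M, Nlat, Nlon).
matrix = xr.concat(
    [var1, var1 - var2, var1 - var3, (var1 - var2) * np.cos(var4)], dim="M"
)

# A (M, K) factor matrix, e.g. loaded from CSV in the real code.
factor = xr.DataArray(rng.random((4, 5)), dims=("M", "K"))

# Contract over M; the result has dims (latitude, longitude, K).
result = (matrix * factor).sum(dim="M")
result.to_dataset(name="result").to_netcdf("outfile.nc")
```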

#505 · issue · Resampling drops datavars with unsigned integer datatypes · ghost (user 10137) · state: closed · 1 comment · created 2015-07-31 · closed 2015-07-31

If a variable has an unsigned integer type (uint16, uint32, etc.), resampling time will drop that variable. Does not occur with signed integer types (int16, etc.).

```
import numpy as np
import pandas as pd
import xray

numbers = np.arange(1, 6).astype('uint32')
ds = xray.Dataset(
    {'numbers': ('time', numbers)},
    coords={'time': pd.date_range('2000-01-01', periods=5)})
resampled = ds.resample('24H', dim='time')
assert 'numbers' not in resampled
```

#448 · issue · asarray Compatibility · ghost (user 10137) · state: closed · 3 comments · created 2015-06-29 · closed 2015-06-30

To "numpify" a function, usually asarray is used:

```python
def db2w(arr):
    return 10 ** (np.asarray(arr) / 20.0)
```

Now you could replace the divide with np.divide, but it seems much simpler to use np.asarray. Unfortunately, if you use any function that has been "vectorized" this way, it will only return the values of the DataArray as an ndarray. This strips the object of any xray metadata and severely limits the usefulness of the class. It requires that any function that wants to work seamlessly with a DataArray explicitly check that it's an instance of DataArray.

This seems counter-intuitive to the numpy framework where any function, once properly vectorized, can work with python scalars (int, float) or list types (tuple, list) as well as the actual ndarray class. It would be awesome if code that worked for these cases could just work for DataArrays.
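
A hedged sketch of the duck-typing alternative the paragraph above points toward: skip np.asarray entirely and rely on the arithmetic operators, which Python scalars, ndarrays, and DataArrays all support, so the metadata survives.

```python
import numpy as np
import xarray as xr

def db2w(arr):
    # No np.asarray: the arithmetic dispatches to whatever type arr is.
    return 10 ** (arr / 20.0)

da = xr.DataArray(np.array([0.0, 20.0]), dims="x")
print(db2w(da))    # still a DataArray; dims survive
print(db2w(20.0))  # a plain Python float still works
```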

