id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1605108888,I_kwDOAMm_X85frASY,7574,xr.open_mfdataset doesn't work with fsspec and dask,10137,closed,0,,,12,2023-03-01T14:45:56Z,2023-09-08T00:33:41Z,2023-09-08T00:33:41Z,NONE,,,,"### What happened?

I was trying to read multiple netCDF files as byte streams (which requires the h5netcdf engine) with xr.open_mfdataset and parallel=True, to leverage dask.delayed capabilities (parallel=False works, though), but it failed. The netCDF files were noaa-goes16 satellite images, but I can't tell if that matters.

### What did you expect to happen?

It should have loaded all the netCDF files into an xarray.Dataset object.

### Minimal Complete Verifiable Example

```python
import fsspec
import xarray as xr

paths = [
    's3://noaa-goes16/ABI-L2-LSTC/2022/185/03/OR_ABI-L2-LSTC-M6_G16_s20221850301180_e20221850303553_c20221850305091.nc',
    's3://noaa-goes16/ABI-L2-LSTC/2022/185/02/OR_ABI-L2-LSTC-M6_G16_s20221850201180_e20221850203553_c20221850205142.nc'
]

fs = fsspec.filesystem('s3')

xr.open_mfdataset(
    [fs.open(path, mode=""rb"") for path in paths],
    engine=""h5netcdf"",
    combine=""nested"",
    concat_dim=""t"",
    parallel=True
)
```

### MVCE confirmation

- [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [ ] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

### Relevant log output

```Python
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/xarray/backends/file_manager.py:210, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    209 try:
--> 210     file = self._cache[self._key]
    211 except KeyError:

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/xarray/backends/lru_cache.py:56, in LRUCache.__getitem__(self, key)
     55 with self._lock:
---> 56     value = self._cache[key]
     57     self._cache.move_to_end(key)

KeyError: [, ((b'\x89HDF\r\n', b'\x1a\n', b'\x02\x08\x08\x00\x00\x00 ... EXTREMELY LONG STRING ...
00\x00\x00\x00\x00\x00\x0ef']

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
Cell In[9], line 11
      4 paths = [
      5     's3://noaa-goes16/ABI-L2-LSTC/2022/185/03/OR_ABI-L2-LSTC-M6_G16_s20221850301180_e20221850303553_c20221850305091.nc',
      6     's3://noaa-goes16/ABI-L2-LSTC/2022/185/02/OR_ABI-L2-LSTC-M6_G16_s20221850201180_e20221850203553_c20221850205142.nc'
      7 ]
      9 fs = fsspec.filesystem('s3')
---> 11 xr.open_mfdataset(
     12     [fs.open(path, mode=""rb"") for path in paths],
     13     engine=""h5netcdf"",
     14     combine=""nested"",
     15     concat_dim=""t"",
     16     parallel=True
     17 ).LST

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/xarray/backends/api.py:991, in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, **kwargs)
    986     datasets = [preprocess(ds) for ds in datasets]
    988 if parallel:
    989     # calling compute here will return the datasets/file_objs lists,
    990     # the underlying datasets will still be stored as dask arrays
--> 991     datasets, closers = dask.compute(datasets, closers)
    993 # Combine all datasets, closing them in case of a ValueError
    994 try:

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/base.py:599, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    596     keys.append(x.__dask_keys__())
    597     postcomputes.append(x.__dask_postcompute__())
--> 599 results = schedule(dsk, keys, **kwargs)
    600 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/threaded.py:89, in get(dsk, keys, cache, num_workers, pool, **kwargs)
     86 elif isinstance(pool, multiprocessing.pool.Pool):
     87     pool = MultiprocessingPoolExecutor(pool)
---> 89 results = get_async(
     90     pool.submit,
     91     pool._max_workers,
     92     dsk,
     93     keys,
     94     cache=cache,
     95     get_id=_thread_get_id,
     96     pack_exception=pack_exception,
     97     **kwargs,
     98 )
    100 # Cleanup pools associated to dead threads
    101 with pools_lock:

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/local.py:511, in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
    509     _execute_task(task, data)  # Re-execute locally
    510 else:
--> 511     raise_exception(exc, tb)
    512 res, worker_id = loads(res_info)
    513 state[""cache""][key] = res

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/local.py:319, in reraise(exc, tb)
    317 if exc.__traceback__ is not tb:
    318     raise exc.with_traceback(tb)
--> 319 raise exc

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/local.py:224, in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
    222 try:
    223     task, data = loads(task_info)
--> 224     result = _execute_task(task, data)
    225     id = get_id()
    226     result = dumps((result, id))

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/core.py:119, in _execute_task(arg, cache, dsk)
    115     func, args = arg[0], arg[1:]
    116     # Note: Don't assign the subtask results to a variable. numpy detects
    117     # temporaries by their reference count and can execute certain
    118     # operations in-place.
--> 119     return func(*(_execute_task(a, cache) for a in args))
    120 elif not ishashable(arg):
    121     return arg

File ~/miniconda3/envs/rxr/lib/python3.11/site-packages/dask/utils.py:73, in apply(func, args, kwargs)
     42 """"""Apply a function given its positional and keyword arguments.
     43
     44 Equivalent to ``func(*args, **kwargs)``
(...)
...
---> 19 filename = fspath(filename)
     20 if sys.platform == ""win32"":
     21     if isinstance(filename, str):

TypeError: expected str, bytes or os.PathLike object, not tuple
```

### Anything else we need to know?

_No response_

### Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0 | packaged by conda-forge | (main, Jan 15 2023, 05:44:48) [Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 21.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None

xarray: 2023.2.0
pandas: 1.5.3
numpy: 1.24.2
scipy: 1.10.1
netCDF4: None
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.6
cfgrib: None
iris: None
bottleneck: None
dask: 2023.2.1
distributed: 2023.2.1
matplotlib: 3.7.0
cartopy: 0.21.1
seaborn: 0.12.2
numbagg: None
fsspec: 2023.1.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.4.0
pip: 23.0.1
conda: None
pytest: 7.2.1
mypy: None
IPython: 8.10.0
sphinx: None

/Users/jo/miniconda3/envs/rxr/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn(""Setuptools is replacing distutils."")
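
For what it's worth, a possible workaround until this is fixed. My assumption (not verified against xarray internals) is that the failure comes from shipping the open fsspec file objects through dask.delayed, so this sketch avoids parallel=True and lets dask parallelize only the actual array reads:

```python
# Hedged workaround sketch: open each remote file serially (metadata only),
# keep the variables lazy with chunks={}, then concatenate along 't'.
import fsspec
import xarray as xr

paths = [
    's3://noaa-goes16/ABI-L2-LSTC/2022/185/03/OR_ABI-L2-LSTC-M6_G16_s20221850301180_e20221850303553_c20221850305091.nc',
    's3://noaa-goes16/ABI-L2-LSTC/2022/185/02/OR_ABI-L2-LSTC-M6_G16_s20221850201180_e20221850203553_c20221850205142.nc',
]
fs = fsspec.filesystem('s3')
datasets = [
    xr.open_dataset(fs.open(path, mode='rb'), engine='h5netcdf', chunks={})
    for path in paths
]
combined = xr.concat(datasets, dim='t')
```

Only the per-file metadata reads run serially here; the data itself stays in dask arrays, so a later compute() is still parallel.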
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7574/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 548263148,MDU6SXNzdWU1NDgyNjMxNDg=,3684,open_mfdataset - different behavior with dask.distributed.LocalCluster,10137,open,0,,,3,2020-01-10T19:58:19Z,2023-09-05T10:56:23Z,,NONE,,,,"Big fan of Xarray! Not that familiar with submitting tickets like this, so my apologies for rule breaking. Also, if this belongs over in the dask project, I can move there. dask 2.6.0 numpy 1.17.3 xarray 0.14.1 netCDF4 1.5.3 I am attempting to use open_mfdataset on nc files I've generated through dask/xarray after initializing the dask LocalCluster. I've found that I am able to compute successfully when I don't run the distributed cluster. But if I do, I get a variety of issues. I've got a synthetic data generating example here. Running the soundspeed.compute() will sometimes succeed, and will sometimes cause worker restarts resulting in hdf errors and no return. I was thinking it was something with serialization, i've seen other tickets with similar issues, but I don't see how it applies to my test case. Example code: ```python import numpy as np import xarray as xr import os from dask.distributed import Client cl = Client() outpth = r'D:\dasktest\data_dir\EM2040\converted\test' mint = 0 maxt = 1000 for i in range(100): times = np.arange(mint, maxt) beams = np.arange(250) sectors=['40107_0_260000', '40107_1_320000', '40107_2_290000'] soundspeed = np.random.randn(1000,3,250) ds = xr.Dataset({'soundspeed': (('time','sectors','beams'), soundspeed)}, {'time': times, 'sectors': sectors, 'beams':beams},) ds.to_netcdf(os.path.join(outpth, 'test{}.nc'.format(i)), mode='w') mint = maxt maxt += 1000 fils = [os.path.join(outpth, x) for x in os.listdir(outpth) if os.path.splitext(x)[1] == '.nc'] tst = xr.open_mfdataset(fils, concat_dim='time', combine='nested') tst.soundspeed.compute() ``` I've found that running this example with <10 files reduces the number of errors I'm getting dramatically. I've tried this on different machines in different domain environments just to be sure. I really just want to make sure I'm not making a silly mistake somewhere. Appreciate the help. 
My last run on actual data: ```python >>> ra.soundspeed.compute() distributed.nanny - WARNING - Restarting worker distributed.nanny - WARNING - Restarting worker distributed.nanny - WARNING - Restarting worker distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None)))))), (slice(0, 1719, None), slice(0, 3, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error') Traceback (most recent call last): File """", line 1, in File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataarray.py"", line 837, in compute return new.load(**kwargs) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataarray.py"", line 811, in load ds = self._to_temp_dataset().load(**kwargs) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataset.py"", line 649, in load evaluated_data = da.compute(*lazy_data.values(), **kwargs) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\dask\base.py"", line 436, in compute results = schedule(dsk, keys, **kwargs) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py"", line 2545, in get results = self.gather(packed, asynchronous=asynchronous, direct=direct) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py"", line 1845, in gather asynchronous=asynchronous, File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py"", line 762, in sync self.loop, func, *args, callback_timeout=callback_timeout, **kwargs File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\utils.py"", line 333, in sync raise exc.with_traceback(tb) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\utils.py"", line 317, in f result[0] = yield future File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\tornado\gen.py"", line 735, in run value = future.result() File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py"", line 1701, in _gather raise exception.with_traceback(traceback) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\dask\array\core.py"", line 106, in getter c = np.asarray(c) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py"", line 85, in asarray return array(a, dtype, copy=False, order=order) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py"", line 481, in __array__ return np.asarray(self.array, dtype=dtype) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py"", line 85, in asarray return array(a, dtype, copy=False, order=order) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py"", line 643, in __array__ return np.asarray(self.array, dtype=dtype) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py"", line 85, in asarray return array(a, dtype, copy=False, order=order) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py"", line 547, in __array__ return np.asarray(array[self.key], dtype=None) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py"", line 72, in __getitem__ key, self.shape, indexing.IndexingSupport.OUTER, self._getitem File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py"", line 827, in explicit_indexing_adapter result = raw_indexing_method(raw_key.tuple) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py"", line 83, in _getitem 
original_array = self.get_array(needs_lock=False) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py"", line 62, in get_array ds = self.datastore._acquire(needs_lock) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py"", line 360, in _acquire with self._manager.acquire_context(needs_lock) as root: File ""C:\PydroXL_19\envs\dasktest\lib\contextlib.py"", line 81, in __enter__ return next(self.gen) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\file_manager.py"", line 186, in acquire_context file, cached = self._acquire_with_cache_info(needs_lock) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\file_manager.py"", line 204, in _acquire_with_cache_info file = self._opener(*self._args, **kwargs) File ""netCDF4\_netCDF4.pyx"", line 2321, in netCDF4._netCDF4.Dataset.__init__ File ""netCDF4\_netCDF4.pyx"", line 1885, in netCDF4._netCDF4._ensure_nc_success OSError: [Errno -101] NetCDF: HDF error: b'D:\\dasktest\\data_dir\\EM2040\\converted\\rangeangle_20.nc' ``` My last run on the synthetic data set generated above: ```python >>> tst.soundspeed.compute() distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error') distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error') distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error') distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error') distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error') distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error') distributed.worker - WARNING - Compute Failed Function: getter args: 
(ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error') distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error') distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error') distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error') distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error') distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error') distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error') distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error') distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error') distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, 
key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error') distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error') distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error') distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error') Traceback (most recent call last): File """", line 1, in distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(0, 1000, None), slice(0, 3, None), slice(0, 250, None))) kwargs: {} Exception: OSError(-101, 'NetCDF: HDF error') File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataarray.py"", line 837, in compute return new.load(**kwargs) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataarray.py"", line 811, in load ds = self._to_temp_dataset().load(**kwargs) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\dataset.py"", line 649, in load evaluated_data = da.compute(*lazy_data.values(), **kwargs) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\dask\base.py"", line 436, in compute results = schedule(dsk, keys, **kwargs) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py"", line 2545, in get results = self.gather(packed, asynchronous=asynchronous, direct=direct) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py"", line 1845, in gather asynchronous=asynchronous, File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py"", line 762, in sync self.loop, func, *args, callback_timeout=callback_timeout, **kwargs File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\utils.py"", line 333, in sync raise exc.with_traceback(tb) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\utils.py"", line 317, in f result[0] = yield future File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\tornado\gen.py"", line 735, in run value = future.result() File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\distributed\client.py"", line 1701, in _gather raise exception.with_traceback(traceback) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\dask\array\core.py"", line 106, in getter c = np.asarray(c) File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py"", line 85, in asarray return array(a, 
dtype, copy=False, order=order)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py"", line 481, in __array__
    return np.asarray(self.array, dtype=dtype)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py"", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py"", line 643, in __array__
    return np.asarray(self.array, dtype=dtype)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\numpy\core\_asarray.py"", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py"", line 547, in __array__
    return np.asarray(array[self.key], dtype=None)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py"", line 72, in __getitem__
    key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\core\indexing.py"", line 827, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py"", line 83, in _getitem
    original_array = self.get_array(needs_lock=False)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py"", line 62, in get_array
    ds = self.datastore._acquire(needs_lock)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\netCDF4_.py"", line 360, in _acquire
    with self._manager.acquire_context(needs_lock) as root:
  File ""C:\PydroXL_19\envs\dasktest\lib\contextlib.py"", line 81, in __enter__
    return next(self.gen)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\file_manager.py"", line 186, in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
  File ""C:\PydroXL_19\envs\dasktest\lib\site-packages\xarray\backends\file_manager.py"", line 204, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File ""netCDF4\_netCDF4.pyx"", line 2321, in netCDF4._netCDF4.Dataset.__init__
  File ""netCDF4\_netCDF4.pyx"", line 1885, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -101] NetCDF: HDF error: b'D:\\dasktest\\data_dir\\EM2040\\converted\\test\\test4.nc'
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3684/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1244030662,I_kwDOAMm_X85KJmbG,6625,"Why am I getting 'Passing method to Float64Index.get_loc is deprecated' error when using the .sel method to extract some data, and how do I solve it?",10137,closed,0,,,5,2022-05-21T16:40:53Z,2022-09-26T08:47:03Z,2022-07-09T00:41:53Z,NONE,,,,"### What is your issue?

`climateModels['CSIRO-QCCCE-CSIRO-Mk3-6']['RCP 45'][2]['tasmax'].sel(lon=74, lat=31, time='2041-06-16', method='nearest').data[0]`

`\anaconda3\lib\site-packages\xarray\core\indexes.py:234: FutureWarning: Passing method to Float64Index.get_loc is deprecated and will raise in a future version. Use index.get_indexer([item], method=...) instead.`

I don't know much about how to solve this issue. Can anyone help me out, please?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6625/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1299316581,I_kwDOAMm_X85Ncf9l,6766,"xr.open_dataset(url) gives NetCDF4 (lru_cache.py) error ""oc_open: Could not read url""",10137,closed,0,,,8,2022-07-08T18:15:18Z,2022-07-11T14:49:10Z,2022-07-11T14:49:09Z,NONE,,,,"### What is your issue?

This code I use was working about a year ago but today gives me an error:

```
import xarray as xr
url = 'http://psl.noaa.gov/thredds/dodsC/Datasets/NARR/monolevel/uwnd.10m.2000.nc'
ds = xr.open_dataset(url)
```

The Traceback includes the following:

```
File ""C:\Users\Codiga_D\AppData\Local\Continuum\miniconda3\envs\EQ\lib\site-packages\xarray\backends\lru_cache.py"", line 53, in __getitem__
    value = self._cache[key]
```

and

```
OSError: [Errno -68] NetCDF: I/O failure: b'http://psl.noaa.gov/thredds/dodsC/Datasets/NARR/monolevel/uwnd.10m.2000.nc'
Note:Caching=1 Error:curl error: SSL connect error
curl error details:
Warning:oc_open: Could not read url
```

I have confirmed that the file I am trying to read is on the server and that the server is not requiring a password (nothing I am aware of about the server has changed since my code used to work successfully). I am on Windows using a conda virtual env (no pip). My xarray is 0.20.2 and my netCDF4 is 1.6.0 -- these are almost certainly more recent than the ones I was using when my code used to succeed, but I didn't record which version(s) used to work. It was suggested that I pin netcdf4 to 1.5.8, so I tried this but got the same error. Recently I had to update security certificates locally here, and this could be related, but I'm not sure.

Any suggestions for how I should troubleshoot this? Also, should I post an issue at https://github.com/Unidata/netcdf4-python instead of, and/or in addition to, this one? I found these issues, which seem possibly related, but they don't seem to be resolved well yet:

https://github.com/Unidata/netcdf4-python/issues/755
https://github.com/pydata/xarray/issues/4925

(I also opened 'discussion' #6742 but so far there has been little response there.)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6766/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
270677100,MDExOlB1bGxSZXF1ZXN0MTUwMzA4NTg0,1682,Add option “engine”,10137,closed,0,,,11,2017-11-02T14:38:07Z,2022-04-15T02:01:28Z,2022-04-15T02:01:28Z,NONE,,0,pydata/xarray/pulls/1682,"Implements a new xarray option `engine` for setting the default backend data read/write engine. Inspired by [this](https://stackoverflow.com/a/47041952) Stack Overflow answer. This PR is not ready for merge yet, but I wanted to verify that the code changes are on the right track. The default `engine` option value is `None`. If this option is set, the `_get_default_engine()` function will return its value without going through the import statements chain.
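
For reviewers, a sketch of the intended usage (hypothetical, since this option is not merged; it assumes the option is exposed through `xr.set_options`, which is the part this PR is meant to wire up):

```python
# Hypothetical usage of the proposed default-engine option.
import xarray as xr

with xr.set_options(engine='h5netcdf'):
    ds = xr.open_dataset('example.nc')  # h5netcdf is used without an explicit engine=
```
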
- [ ] Closes #xxxx
- [ ] Tests added / passed
- [ ] Passes ``git diff upstream/master **/*py | flake8 --diff``
- [ ] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1682/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
910844095,MDU6SXNzdWU5MTA4NDQwOTU=,5434,xarray.open_rasterio,10137,closed,0,,,2,2021-06-03T20:51:38Z,2022-04-09T01:31:26Z,2022-04-09T01:31:26Z,NONE,,,,"Could you please change `xarray.open_rasterio` from `experimental` to `stable`, with faster reading of geotiff files if possible? For the original array indexing capabilities, I would like to stick with `xarray` rather than `rioxarray`. With much respect. Thank you.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5434/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
853260893,MDExOlB1bGxSZXF1ZXN0NjExMzc3OTQ0,5131,Remove trailing space from DatasetGroupBy repr,10137,closed,0,,,1,2021-04-08T09:19:30Z,2021-04-08T14:49:15Z,2021-04-08T14:49:15Z,NONE,,0,pydata/xarray/pulls/5131,"Remove trailing whitespace from the DatasetGroupBy representation because flake8 reports it as a violation when present in doctests. Fix #5130","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5131/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
853168658,MDU6SXNzdWU4NTMxNjg2NTg=,5130,Trailing whitespace in DatasetGroupBy text representation,10137,closed,0,,,1,2021-04-08T07:39:08Z,2021-04-08T14:49:14Z,2021-04-08T14:49:14Z,NONE,,,,"When displaying a DatasetGroupBy in an interactive Python session, the first line of output contains a trailing whitespace. The first example in the documentation demonstrates this:

```pycon
>>> import xarray as xr, numpy as np
>>> ds = xr.Dataset(
...     {""foo"": ((""x"", ""y""), np.random.rand(4, 3))},
...     coords={""x"": [10, 20, 30, 40], ""letters"": (""x"", list(""abba""))},
... )
>>> ds.groupby(""letters"")
DatasetGroupBy, grouped over 'letters' 
2 groups with labels 'a', 'b'.
```

There is a trailing whitespace in the first line of output, which is ""DatasetGroupBy, grouped over 'letters' "". This can be seen more clearly by converting the object to a string (note the whitespace before `\n`):

```pycon
>>> str(ds.groupby(""letters""))
""DatasetGroupBy, grouped over 'letters' \n2 groups with labels 'a', 'b'.""
```

While this isn't a problem in itself, it causes an issue for us because we use flake8 in continuous integration to verify that our code is correctly formatted, and we also have doctests that rely on the DatasetGroupBy textual representation. Flake8 reports a violation on the trailing whitespaces in our docstrings. If we remove the trailing whitespaces, our doctests fail because the expected output doesn't match the actual output. So we have conflicting constraints coming from our tools, both of which seem reasonable. Trailing whitespaces are forbidden by flake8 because, among other reasons, they lead to noisy git diffs. Doctests want the expected output to be exactly the same as the actual output and consider a trailing whitespace to be a significant difference.
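
To make the conflict concrete, here is a minimal doctest of the kind we have (a hypothetical module of ours, not xarray code; note the invisible trailing space after 'letters' on the expected-output line, which is exactly what flake8's W291 rule flags):

```python
# Hypothetical docstring from our codebase illustrating the W291/doctest clash.
def grouped(ds):
    """"""Group a dataset by its 'letters' coordinate.

    >>> grouped(ds)
    DatasetGroupBy, grouped over 'letters' 
    2 groups with labels 'a', 'b'.
    """"""
    return ds.groupby(""letters"")
```
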
We could configure flake8 to ignore this particular violation for the files in which we have these doctests, but this may cause other trailing whitespaces to creep into our code, which we don't want. Unfortunately, it's not possible to just add `# NoQA` comments to get flake8 to ignore the violation only for specific lines, because that creates a difference between expected and actual output from doctest's point of view. Flake8 doesn't allow disabling checks for blocks of code either.

Is there a reason for having this trailing whitespace in the DatasetGroupBy representation? Would it be OK to remove it? If so, please let me know and I can make a pull request.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5130/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
323703742,MDU6SXNzdWUzMjM3MDM3NDI=,2139,From pandas to xarray without blowing up memory,10137,closed,0,,,15,2018-05-16T16:51:09Z,2020-10-14T19:34:54Z,2019-08-27T08:54:26Z,NONE,,,,"I have a billion rows of data, but really it's just two categorical variables, time, lat, lon and some data variables. Thinking it would somehow help me get the data into xarray, I created a five-level pandas MultiIndex array out of the data, but thus far this has not been successful. xarray tries to create a product and that's just not going to work. Trying to write a NetCDF file has presented its own issues, and I'm left wondering if there isn't a much simpler way to go about this? ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2139/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
454073421,MDU6SXNzdWU0NTQwNzM0MjE=,3007,NaN values for variables when converting from a pandas dataframe to xarray.DataSet,10137,closed,0,,,5,2019-06-10T09:15:21Z,2020-03-23T13:15:16Z,2020-03-23T13:15:15Z,NONE,,,,"#### Code Sample, a copy-pastable example if possible

```python
                                         wind_surface       hurs  ...       bui       fwi
lat       lon       time                                          ...
34.511383 16.467664 1971-01-10 12:00:00     29.658546  70.481293  ...  8.134300  7.409146
34.515558 16.723973 1971-01-10 12:00:00     30.896049  71.356644  ...  8.874528  8.399877
34.517359 16.852138 1971-01-10 12:00:00     31.514799  71.708603  ...  8.789351  8.763743
34.518970 16.980310 1971-01-10 12:00:00     32.105423  72.023773  ...  8.962551  9.125644
34.520391 17.108487 1971-01-10 12:00:00     32.724174  72.106110  ...  8.725038  9.249104

[5 rows x 10 columns]

In [81]: df.to_xarray()
Out[81]:
<xarray.Dataset>
Dimensions:       (lat: 5, lon: 5, time: 1)
Coordinates:
  * lat           (lat) float64 34.51 34.52 34.52 34.52 34.52
  * lon           (lon) float64 16.47 16.72 16.85 16.98 17.11
  * time          (time) object '1971-01-10 12:00:00'
Data variables:
    wind_surface  (lat, lon, time) float64 29.658546 nan nan ... nan 32.724174
    hurs          (lat, lon, time) float64 70.48129 nan nan ... nan nan 72.10611
    precip        (lat, lon, time) float64 0.0 nan nan nan ... nan nan nan 0.0
    tmax          (lat, lon, time) float64 16.060822 nan nan ... nan 16.185822
    ffmc          (lat, lon, time) float64 83.58528 nan nan ... nan nan 84.05673
    isi           (lat, lon, time) float64 7.7641253 nan nan ... nan nan 9.64494
    dmc           (lat, lon, time) float64 6.797345 nan nan ... nan nan 7.90833
    dc            (lat, lon, time) float64 25.314878 nan nan ... nan 24.324644
    bui           (lat, lon, time) float64 8.1343 nan nan ... nan nan 8.725038
    fwi           (lat, lon, time) float64 7.409146 nan nan ... nan 9.2491045
```

#### Problem description

Hi, I get those NaN values for the variables when I try to convert from a pandas.DataFrame with a MultiIndex to an xarray.DataArray. The same happened if I try to build an xarray.Dataset and then unstack the multiindex, as shown below:

```python
ds = xr.Dataset(df)
ds.unstack('dim_0')

<xarray.Dataset>
Dimensions:       (lat: 5, lon: 5, time: 1)
Coordinates:
  * lat           (lat) float64 34.51 34.52 34.52 34.52 34.52
  * lon           (lon) float64 16.47 16.72 16.85 16.98 17.11
  * time          (time) object '1971-01-10 12:00:00'
Data variables:
    wind_surface  (lat, lon, time) float32 29.658546 nan nan ... nan 32.724174
    hurs          (lat, lon, time) float32 70.48129 nan nan ... nan nan 72.10611
    precip        (lat, lon, time) float32 0.0 nan nan nan ... nan nan nan 0.0
```

Maybe it's not an issue. I don't know. I'm lost. Any help is welcome. Regards

#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 (default, May 9 2019, 11:55:04) [GCC 8.3.0]
python-bits: 64
OS: Linux
OS-release: 5.0.0-16-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.3
scipy: 1.3.0
netCDF4: 1.5.2
pydap: installed
h5netcdf: 0.7.3
h5py: 2.9.0
Nio: None
zarr: 2.3.1
cftime: 1.0.1
nc_time_axis: 1.1.0
PseudonetCDF: None
rasterio: 1.0.23
cfgrib: None
iris: 2.3.0dev0
bottleneck: 1.2.1
dask: 1.2.2
distributed: None
matplotlib: 3.1.0
cartopy: 0.17.1.dev168+
seaborn: 0.9.0
setuptools: 40.8.0
pip: 19.1.1
conda: None
pytest: None
IPython: 7.5.0
sphinx: 2.0.1
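
For anyone landing here, my understanding (which may be incomplete) is that this is expected behaviour rather than a bug: to_xarray() expands the MultiIndex to the full cartesian product of its levels, so every (lat, lon, time) combination that is absent from the dataframe gets filled with NaN. A tiny demonstration with made-up values:

```python
# Minimal sketch of the suspected mechanism: to_xarray() builds the full
# cartesian product of the MultiIndex levels and fills the gaps with NaN.
import pandas as pd

df = pd.DataFrame(
    {'v': [1.0, 2.0]},
    index=pd.MultiIndex.from_tuples([(0.0, 10.0), (1.0, 11.0)], names=['lat', 'lon']),
)
print(df.to_xarray()['v'].values)  # [[ 1., nan], [nan,  2.]]
```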
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3007/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 270701183,MDExOlB1bGxSZXF1ZXN0MTUwMzI2NzMw,1683,Add h5netcdf to the engine import hierarchy,10137,closed,0,,,2,2017-11-02T15:39:35Z,2018-06-05T05:16:40Z,2018-02-12T16:06:44Z,NONE,,0,pydata/xarray/pulls/1683,"h5netcdf is now part of the import statements in the `_get_default_engine()` function. The order is: netcdf4, scipy.io.netcdf, h5netcdf. - [ ] Closes #xxxx - [ ] Tests added / passed - [ ] Passes ``git diff upstream/master **/*py | flake8 --diff`` - [ ] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1683/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 291926319,MDU6SXNzdWUyOTE5MjYzMTk=,1860,IndexError when accesing a data variable through a PydapDataStore,10137,closed,0,,,4,2018-01-26T14:58:14Z,2018-01-27T08:41:21Z,2018-01-27T08:41:21Z,NONE,,,,"#### Code Sample, a copy-pastable example if possible ```python import xarray as xa from pydap.cas.urs import setup_session url = 'https://goldsmr4.gesdisc.eosdis.nasa.gov/dods/M2T1NXFLX' session = setup_session(username='****', password='****', check_url=url) store = xa.backends.PydapDataStore.open(url, session=session) ds = xa.open_dataset(store) tlml = ds['tlml'] print(tlml[0,0,0]) ``` #### Problem description I was trying to connect to NASA MERRA-2 data through OPeNDAP, following the documentation here: http://xarray.pydata.org/en/stable/io.html#OPeNDAP. Opening the dataset works fine, but trying to access a data variable throws a strange `IndexError`. 
Traceback below: ``` Traceback (most recent call last): File ""C:\Anaconda3\envs\jaws\lib\site-packages\IPython\core\formatters.py"", line 702, in __call__ printer.pretty(obj) File ""C:\Anaconda3\envs\jaws\lib\site-packages\IPython\lib\pretty.py"", line 395, in pretty return _default_pprint(obj, self, cycle) File ""C:\Anaconda3\envs\jaws\lib\site-packages\IPython\lib\pretty.py"", line 510, in _default_pprint _repr_pprint(obj, p, cycle) File ""C:\Anaconda3\envs\jaws\lib\site-packages\IPython\lib\pretty.py"", line 701, in _repr_pprint output = repr(obj) File ""c:\src\xarray\xarray\core\common.py"", line 100, in __repr__ return formatting.array_repr(self) File ""c:\src\xarray\xarray\core\formatting.py"", line 393, in array_repr summary.append(short_array_repr(arr.values)) File ""c:\src\xarray\xarray\core\dataarray.py"", line 411, in values return self.variable.values File ""c:\src\xarray\xarray\core\variable.py"", line 392, in values return _as_array_or_item(self._data) File ""c:\src\xarray\xarray\core\variable.py"", line 216, in _as_array_or_item data = np.asarray(data) File ""C:\Anaconda3\envs\jaws\lib\site-packages\numpy\core\numeric.py"", line 492, in asarray return array(a, dtype, copy=False, order=order) File ""c:\src\xarray\xarray\core\indexing.py"", line 572, in __array__ self._ensure_cached() File ""c:\src\xarray\xarray\core\indexing.py"", line 569, in _ensure_cached self.array = NumpyIndexingAdapter(np.asarray(self.array)) File ""C:\Anaconda3\envs\jaws\lib\site-packages\numpy\core\numeric.py"", line 492, in asarray return array(a, dtype, copy=False, order=order) File ""c:\src\xarray\xarray\core\indexing.py"", line 553, in __array__ return np.asarray(self.array, dtype=dtype) File ""C:\Anaconda3\envs\jaws\lib\site-packages\numpy\core\numeric.py"", line 492, in asarray return array(a, dtype, copy=False, order=order) File ""c:\src\xarray\xarray\core\indexing.py"", line 520, in __array__ return np.asarray(array[self.key], dtype=None) File ""c:\src\xarray\xarray\conventions.py"", line 134, in __getitem__ return np.asarray(self.array[key], dtype=self.dtype) File ""c:\src\xarray\xarray\coding\variables.py"", line 71, in __getitem__ return self.func(self.array[key]) File ""c:\src\xarray\xarray\coding\variables.py"", line 140, in _apply_mask data = np.asarray(data, dtype=dtype) File ""C:\Anaconda3\envs\jaws\lib\site-packages\numpy\core\numeric.py"", line 492, in asarray return array(a, dtype, copy=False, order=order) File ""c:\src\xarray\xarray\core\indexing.py"", line 520, in __array__ return np.asarray(array[self.key], dtype=None) File ""c:\src\xarray\xarray\backends\pydap_.py"", line 33, in __getitem__ result = robust_getitem(array, key, catch=ValueError) File ""c:\src\xarray\xarray\backends\common.py"", line 67, in robust_getitem return array[key] File ""C:\src\pydap\src\pydap\model.py"", line 320, in __getitem__ out.data = self._get_data_index(index) File ""C:\src\pydap\src\pydap\model.py"", line 350, in _get_data_index return self._data[index] File ""C:\src\pydap\src\pydap\handlers\dap.py"", line 149, in __getitem__ return dataset[self.id].data File ""C:\src\pydap\src\pydap\model.py"", line 426, in __getitem__ return self._getitem_string(key) File ""C:\src\pydap\src\pydap\model.py"", line 410, in _getitem_string return self[splitted[0]]['.'.join(splitted[1:])] File ""C:\src\pydap\src\pydap\model.py"", line 320, in __getitem__ out.data = self._get_data_index(index) File ""C:\src\pydap\src\pydap\model.py"", line 350, in _get_data_index return self._data[index] IndexError: only integers, 
slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
```

#### Expected Output

Expecting to see the value of variable 'tlml' at these coordinates.

#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

xarray: 0.10.0+dev44.g0a0593d
pandas: 0.22.0
numpy: 1.14.0
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: 0.5.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.16.1
distributed: 1.20.2
matplotlib: None
cartopy: None
seaborn: None
setuptools: 38.4.0
pip: 9.0.1
conda: None
pytest: None
IPython: 6.2.1
sphinx: 1.6.6

","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1860/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 291524555,MDU6SXNzdWUyOTE1MjQ1NTU=,1857,AttributeError: '' object has no attribute 'shape',10137,closed,0,,,6,2018-01-25T10:42:20Z,2018-01-26T13:25:07Z,2018-01-26T13:25:07Z,NONE,,,,"#### Code Sample, a copy-pastable example if possible ```python import xarray as xa from pydap.cas.urs import setup_session url = 'https://goldsmr4.gesdisc.eosdis.nasa.gov/dods/M2T1NXFLX' session = setup_session(username='****', password='****', check_url=url) store = xa.backends.PydapDataStore.open(url, session=session) ds = xa.open_dataset(store) ``` #### Problem description I was trying to connect to NASA MERRA-2 data through OPeNDAP, following the documentation here: http://xarray.pydata.org/en/stable/io.html#OPeNDAP. I was able to get through a previous bug (#1775) by installing the latest master version of xarray. #### Expected Output Expecting the collection (M2T1NXFLX) content to show up as an xarray dataset. #### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

xarray: 0.10.0+dev44.g0a0593d
pandas: 0.22.0
numpy: 1.14.0
scipy: 1.0.0
netCDF4: None
h5netcdf: None
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
setuptools: 38.4.0
pip: 9.0.1
conda: None
pytest: None
IPython: None
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1857/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 245186903,MDU6SXNzdWUyNDUxODY5MDM=,1486,boolean indexing,10137,closed,0,,,2,2017-07-24T19:39:42Z,2017-09-07T08:06:49Z,2017-09-07T08:06:48Z,NONE,,,,"I am trying to figure out how boolean indexing works in xarray. I have a couple of data arrays below: ``` X_day[latitude,longitude,time] X_night[latitude,longitude,time] Rule[latitude,longitude] ``` I want to merge X_day and X_night as a new X based on Rule. First I make a copy of X_day to be X: `X = X_day ` Then I tried: `X[Rule==True, :] = X_night ` and this: `X.values[Rule==True, :] = X_night.values ` and also this: `X.where(Rule==True) = X_night ` None of the above assignment worked. Please help.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1486/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 244702576,MDU6SXNzdWUyNDQ3MDI1NzY=,1484,Matrix cross product in xarray,10137,closed,0,,,4,2017-07-21T15:21:37Z,2017-07-24T16:36:22Z,2017-07-24T16:36:22Z,NONE,,,,"Hi I am new to xarray and need some advice on one task. I need to do a cross product calculation using variables from a netcdf file and write the output to a new netcdf file. I feel this could be done using netcdf-python and pandas, but I hope to use xarray to simplify the task. My code will be something like this: ds = xr.open_dataset(NC_FILE) var1 = ds['VAR1'] var2 = ds['VAR2'] var3 = ds['VAR3'] var4 = ds['VAR4'] var1-4 above will have dimensions [latitude, longitude]. I will use var1-4 to generate a matrix of dimensions [Nlat, Nlon, M], something like: [var1, var1-var2, var1-var3, (var1-var2)*np.cos(var4)] (here, M=4). My question here is, how do I build this matrix in xarray Dataset? Since this matrix will be eventually used to cross product with another matrix (pd.DataFrame) of dimensions [M, K], is it better to convert var1-4 to pd.DataFrame first? Following code will be like this: matrix = [var1, var1-var2, var1-var3, (var1-var2)*np.cos(var4)] # Nlat x Nlon x M factor = pd.read_csv(somefile) # M x K result = pd.DataFrame.dot(matrix,factor) # Nlat x Nlon x K result2 = xr.Dataset(result) result2.to_netcdf(outfile) Can someone show me the correct code to build the Nlat x Nlon x M matrix? Can the cross product be done in xr.Dataset to avoid conversion to and from pd.DataFrame? Thank you, Xin ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1484/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 98442885,MDU6SXNzdWU5ODQ0Mjg4NQ==,505,Resampling drops datavars with unsigned integer datatypes,10137,closed,0,,,1,2015-07-31T18:04:51Z,2015-07-31T19:44:32Z,2015-07-31T19:44:32Z,NONE,,,,"If a variable has an unsigned integer type (uint16, uint32, etc.), resampling time will drop that variable. Does not occur with signed integer types (int16, etc.). 

```
import numpy as np
import pandas as pd
import xray

numbers = np.arange(1, 6).astype('uint32')
ds = xray.Dataset(
    {'numbers': ('time', numbers)},
    coords={'time': pd.date_range('2000-01-01', periods=5)})
resampled = ds.resample('24H', dim='time')
assert 'numbers' not in resampled
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/505/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
91676831,MDU6SXNzdWU5MTY3NjgzMQ==,448,asarray Compatibility,10137,closed,0,,,3,2015-06-29T02:45:25Z,2015-06-30T23:02:57Z,2015-06-30T23:02:57Z,NONE,,,,"To ""numpify"" a function, usually asarray is used:

def db2w(arr):
    return 10 ** (np.asarray(arr) / 20.0)

Now you could replace the divide with np.divide, but it seems much simpler to use np.asarray. Unfortunately, if you pass a DataArray to any function that has been ""vectorized"" this way, it will only return the values of the DataArray as an ndarray. This strips the object of any ""xray"" meta-data and severely limits the use of this class. It requires that any function that wants to work seamlessly with a DataArray explicitly check that it's an instance of DataArray. This seems counter-intuitive to the numpy framework, where any function, once properly vectorized, can work with python scalars (int, float) or list types (tuple, list) as well as the actual ndarray class. It would be awesome if code that worked for these cases could just work for DataArrays. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/448/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue