id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 1589771368,I_kwDOAMm_X85ewfxo,7541,standard deviation over one dimension of a chunked DataArray leads to NaN,11750960,closed,0,,,2,2023-02-17T18:07:06Z,2023-02-17T18:12:10Z,2023-02-17T18:12:10Z,CONTRIBUTOR,,,,"### What happened? When computing the standard deviation over one dimension of a chunked DataArray, one may get NaNs. ### What did you expect to happen? We should not have any NaNs. ### Minimal Complete Verifiable Example ```Python x = (np.random.randn(10,10) + 1j*np.random.randn(10,10)) da = xr.DataArray(x).chunk(dict(dim_0=3)) da.std(""dim_0"").compute() # NaN da.compute().std(""dim_0"") # no NaNs ``` ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [X] Complete example — the example is self-contained, including all data and the text of any traceback. - [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. ### Relevant log output _No response_ ### Anything else we need to know? _No response_ ### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.10.8 (main, Nov 24 2022, 08:09:04) [Clang 14.0.6 ] python-bits: 64 OS: Darwin OS-release: 22.3.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: fr_FR.UTF-8 LOCALE: ('fr_FR', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.8.1 xarray: 2022.12.0 pandas: 1.5.2 numpy: 1.23.5 scipy: 1.9.3 netCDF4: 1.6.2 pydap: None h5netcdf: None h5py: 3.7.0 Nio: None zarr: 2.13.3 cftime: 1.5.1.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.3.4 cfgrib: None iris: None bottleneck: 1.3.5 dask: 2022.02.1 distributed: 2022.2.1 matplotlib: 3.6.2 cartopy: 0.21.1 seaborn: 0.12.1 numbagg: None fsspec: 2022.11.0 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 65.5.0 pip: 22.3.1 conda: None pytest: None mypy: None IPython: 8.7.0 sphinx: None
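One hedged workaround idea (mine, not from the issue thread): compute the standard deviation by hand from the mean of squared magnitude deviations, which sidesteps the chunked code path. The formula itself can be checked with plain NumPy, which takes the absolute value before squaring for complex input, so the result is real:

```python
import numpy as np

# Standard deviation of complex data, written out by hand:
# sqrt(mean(|x - mean(x)|**2)).
rng = np.random.default_rng(0)
x = rng.standard_normal((10, 10)) + 1j * rng.standard_normal((10, 10))

manual = np.sqrt(np.mean(np.abs(x - x.mean(axis=0)) ** 2, axis=0))
assert np.allclose(manual, x.std(axis=0))  # matches NumPy's std
assert not np.isnan(manual).any()
```

The same expression applied to the chunked DataArray, e.g. `np.sqrt((abs(da - da.mean('dim_0'))**2).mean('dim_0'))`, may serve as a stopgap until the underlying behavior is fixed.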
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7541/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 614785886,MDU6SXNzdWU2MTQ3ODU4ODY=,4046,automatic chunking of zarr archive,11750960,open,0,,,3,2020-05-08T14:42:00Z,2023-01-18T21:54:42Z,,CONTRIBUTOR,,,,"I store data that is not chunked in a zarr archive, and the resulting zarr archive is chunked. This may be a simple usage question: I don't know how to turn this behavior off. #### Code sample Here is a minimal example that reproduces the issue: ```python ds = xr.DataArray(np.ones((200,800))).rename('foo').to_dataset() print('Initial chunks = {}'.format(ds.foo.chunks)) ds.to_zarr('test.zarr', mode='w') print('zarr archives contains: {}'.format(os.listdir('test.zarr/foo'))) ds = xr.open_zarr('test.zarr') print('Final chunks = {}'.format(ds.foo.chunks)) ``` returns: ``` Initial chunks = None zarr archives contains: ['.zarray', '.zattrs', '0.0', '0.1', '1.0', '1.1'] Final chunks = ((100, 100), (400, 400)) ``` #### Expected Output I would expect the archive not to be chunked. #### Versions
Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Mar 23 2020, 23:03:20) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.12.53-60.30-default machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.4 xarray: 0.15.2.dev29+g6048356 pandas: 1.0.3 numpy: 1.18.1 scipy: 1.4.1 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.4.0 cftime: 1.1.1.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.13.0 distributed: 2.13.0 matplotlib: 3.2.1 cartopy: 0.17.0 seaborn: 0.10.0 numbagg: None pint: None setuptools: 46.1.3.post20200325 pip: 20.0.2 conda: None pytest: None IPython: 7.13.0 sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4046/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 294735496,MDU6SXNzdWUyOTQ3MzU0OTY=,1889,call to colorbar not thread safe,11750960,closed,0,,,12,2018-02-06T12:05:44Z,2022-04-27T23:47:56Z,2022-04-27T23:47:56Z,CONTRIBUTOR,,,,"The following call in `xarray/xarray/plot/plot.py` does not seem to be thread safe: ``` cbar = plt.colorbar(primitive, **cbar_kwargs) ``` It leads to systematic crashes when distributed, with a cryptic error message (`ValueError: Unknown element o`). I have to call colorbars outside the xarray plot call to prevent crashes. A call of the following type may fix the problem: ``` cbar = fig.colorbar(primitive, **cbar_kwargs) ``` But `fig` does not seem to be available directly in plot.py. Maybe: ``` cbar = ax.get_figure().colorbar(primitive, **cbar_kwargs) ``` cheers","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1889/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 667763555,MDU6SXNzdWU2Njc3NjM1NTU=,4284,overwriting netcdf file fails at read time,11750960,closed,0,,,1,2020-07-29T11:17:10Z,2020-08-01T20:54:16Z,2020-08-01T20:54:16Z,CONTRIBUTOR,,,,"I generate a dataset once: ``` ds = xr.DataArray(np.arange(10), name='x').to_dataset() ds.to_netcdf('test.nc', mode='w') ``` Now I overwrite with a new netcdf file and load: ``` ds = xr.DataArray(np.arange(20), name='x').to_dataset() ds.to_netcdf('test.nc', mode='w') ds_out = xr.open_dataset('test.nc') print(ds_out) ``` outputs: ``` Dimensions: (dim_0: 10) Dimensions without coordinates: dim_0 Data variables: x (dim_0) int64 ... ``` I would have expected to get the new dataset. 
If I use netcdf4, the file seems to have been properly overwritten: ``` import netCDF4 as nc d = nc.Dataset('test.nc') d ``` outputs: ``` root group (NETCDF4 data model, file format HDF5): dimensions(sizes): dim_0(20) variables(dimensions): int64 x(dim_0) groups: ``` **Environment**:
Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Mar 23 2020, 23:03:20) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.12.53-60.30-default machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.4 xarray: 0.15.1 pandas: 1.0.3 numpy: 1.18.1 scipy: 1.4.1 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.4.0 cftime: 1.1.1.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.13.0 distributed: 2.13.0 matplotlib: 3.3.0 cartopy: 0.17.0 seaborn: 0.10.0 numbagg: None setuptools: 46.1.3.post20200325 pip: 20.0.2 conda: None pytest: None IPython: 7.13.0 sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4284/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 614854414,MDExOlB1bGxSZXF1ZXN0NDE1MzI0ODUw,4048,improve to_zarr doc about chunking,11750960,closed,0,,,9,2020-05-08T16:43:09Z,2020-05-20T18:55:38Z,2020-05-20T18:55:33Z,CONTRIBUTOR,,0,pydata/xarray/pulls/4048," - [X] follows #4046 - [X] Passes `isort -rc . && black . && mypy . && flake8` I'm not sure the last point is really necessary for this PR, is it?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4048/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 595666886,MDExOlB1bGxSZXF1ZXN0NDAwMTAwMzIz,3944,implement a more threadsafe call to colorbar,11750960,closed,0,,,7,2020-04-07T07:51:28Z,2020-04-09T07:01:12Z,2020-04-09T06:26:57Z,CONTRIBUTOR,,0,pydata/xarray/pulls/3944," - [ ] Xref #1889 - [ ] Tests added - [ ] Passes `isort -rc . && black . && mypy . && flake8` - [ ] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API If you think this is relevant, I'll go ahead and start working on the items above, even though I'm not sure new tests are needed.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3944/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 593825520,MDU6SXNzdWU1OTM4MjU1MjA=,3932,Element wise dataArray generation,11750960,closed,0,,,6,2020-04-04T12:24:16Z,2020-04-07T04:32:12Z,2020-04-07T04:32:12Z,CONTRIBUTOR,,,,"I'm in a situation where I want to generate a bidimensional DataArray from a method that takes each of the two dimensions as input parameters. 
I have two methods to do this, but neither looks particularly elegant to me, and I wondered whether somebody would have better ideas. - **Method 1** : dask delayed ``` Nstats = 5  # each experiment outputs 5 statistical diagnostics x = np.arange(10) y = np.arange(20) some_exp = lambda x, y: np.ones((Nstats)) some_exp_delayed = dask.delayed(some_exp, pure=True) lazy_data = [some_exp_delayed(_x, _y) for _x in x for _y in y] sample = lazy_data[0].compute() arrays = [da.from_delayed(lazy_value, dtype=sample.dtype, shape=sample.shape) for lazy_value in lazy_data] stack = (da.stack(arrays, axis=0) .reshape((len(x),len(y),sample.size)) ) ds = xr.DataArray(stack, dims=['x','y','stats']) ``` I tend to prefer this option because it imposes fewer requirements on the `some_exp` data shape. That being said, it still seems like too many lines of code to achieve such a result. - **Method 2**: apply_ufunc ``` x = np.arange(10) y = np.arange(20) ds = xr.Dataset(coords={'x': x, 'y': y}) ds['_y'] = (0*ds.x+ds.y) # breaks apply_ufunc otherwise ds = ds.chunk({'x': 1, 'y':1}) # let's say each experiment outputs 5 statistical diagnostics Nstats = 5 some_exp = lambda x, y: np.ones((1,1,Nstats)) out = (xr.apply_ufunc(some_exp, ds.x, ds._y, dask='parallelized', output_dtypes=[float], output_sizes={'stats': Nstats}, output_core_dims=[['stats']]) ) ``` I don't understand why I have to use the dummy variable `ds._y` in this case.
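For comparison, an eager plain-NumPy sketch of the same pattern (with a hypothetical stand-in for `some_exp`; no laziness, but far fewer lines):

```python
import numpy as np

# Hypothetical stand-in for the experiment: returns Nstats diagnostics
# per (x, y) point, mirroring the lambdas above.
Nstats = 5
some_exp = lambda x, y: np.ones(Nstats)

x = np.arange(10)
y = np.arange(20)

# Evaluate on the full grid and let NumPy stack the nested results
# into an (x, y, stats) cube in one go.
stack = np.array([[some_exp(_x, _y) for _y in y] for _x in x])
assert stack.shape == (10, 20, 5)
```

Wrapping with `xr.DataArray(stack, dims=['x','y','stats'])` then proceeds exactly as in method 1; the price is eager evaluation.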
Having to rely on `apply_ufunc` seems like overkill.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3932/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 593860909,MDU6SXNzdWU1OTM4NjA5MDk=,3933,plot.line breaks depending on coordinate shape,11750960,closed,0,,,2,2020-04-04T13:27:35Z,2020-04-04T18:42:10Z,2020-04-04T17:57:20Z,CONTRIBUTOR,,,,"`plot.line` breaks depending on coordinate shape; see the code below: ```python x = np.arange(10) y = np.arange(20) ds = xr.Dataset(coords={'x': x, 'y': y}) #ds = ds.assign_coords(z=ds.y+ds.x) # goes through ds = ds.assign_coords(z=ds.x+ds.y) # breaks ds['v'] = (ds.x+ds.y) ds['v'].plot.line(y='z', hue='x') ``` This breaks with the following error: ``` ... ~/.miniconda3/envs/equinox/lib/python3.7/site-packages/matplotlib/axes/_base.py in _plot_args(self, tup, kwargs) 340 341 if x.shape[0] != y.shape[0]: --> 342 raise ValueError(f""x and y must have same first dimension, but "" 343 f""have shapes {x.shape} and {y.shape}"") 344 if x.ndim > 2 or y.ndim > 2: ValueError: x and y must have same first dimension, but have shapes (20, 10) and (10, 20) ``` I would have expected that dimension order would not matter. #### Versions
Output of `xr.show_versions()` INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Mar 23 2020, 23:03:20) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.12.53-60.30-default machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.4 xarray: 0.15.1 pandas: 1.0.3 numpy: 1.18.1 scipy: 1.4.1 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.4.0 cftime: 1.1.1.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.13.0 distributed: 2.13.0 matplotlib: 3.2.1 cartopy: 0.17.0 seaborn: 0.10.0 numbagg: None setuptools: 46.1.3.post20200325 pip: 20.0.2 conda: None pytest: None IPython: 7.13.0 sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3933/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 283518232,MDU6SXNzdWUyODM1MTgyMzI=,1795,open_mfdataset concat_dim chunk,11750960,open,0,,,2,2017-12-20T10:34:58Z,2020-01-07T16:19:39Z,,CONTRIBUTOR,,,,"open_mfdataset does not allow chunking along concat_dim. As a result, if specific chunking is sought along that dimension by the user, it may be best not to pass chunks at the open_mfdataset stage and rechunk variables afterwards. This would be the case, for example, if chunks are large across files but small within files: https://github.com/apatlpo/lops-array/blob/master/sandbox/natl60_tseries_debug.ipynb I believe this is difficult to anticipate for new users (like me). Couldn't this be specified in the documentation of open_mfdataset? ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1795/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 340192831,MDU6SXNzdWUzNDAxOTI4MzE=,2278,can't store zarr after open_zarr and isel,11750960,closed,0,,,10,2018-07-11T10:59:23Z,2019-05-17T14:03:38Z,2018-08-14T03:46:34Z,CONTRIBUTOR,,,,"#### Code Sample, a copy-pastable example if possible This works fine: ```python nx, ny, nt = 32, 32, 64 ds = xr.Dataset({}, coords={'x':np.arange(nx),'y':np.arange(ny), 't': np.arange(nt)}) ds = ds.assign(v=ds.t*np.cos(np.pi/180./100*ds.x)*np.cos(np.pi/180./50*ds.y)) ds = ds.chunk({'t': 1, 'x': nx/2, 'y': ny/2}) ds.isel(t=0).to_zarr('data_t0.zarr', mode='w') ``` But if I store, reload and select, I cannot store: ``` ds.to_zarr('data.zarr', mode='w') ds = xr.open_zarr('data.zarr') ds.isel(t=0).to_zarr('data_t0.zarr', mode='w') ``` Error message ends with: ``` ~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/backends/zarr.py in 
_extract_zarr_variable_encoding(variable, raise_on_invalid) 181 182 chunks = _determine_zarr_chunks(encoding.get('chunks'), variable.chunks, --> 183 variable.ndim) 184 encoding['chunks'] = chunks 185 return encoding ~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/backends/zarr.py in _determine_zarr_chunks(enc_chunks, var_chunks, ndim) 112 raise ValueError(""zarr chunks tuple %r must have same length as "" 113 ""variable.ndim %g"" % --> 114 (enc_chunks_tuple, ndim)) 115 116 for x in enc_chunks_tuple: ValueError: zarr chunks tuple (1, 16, 16) must have same length as variable.ndim 2 ``` #### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: None python: 3.6.5.final.0 python-bits: 64 OS: Linux OS-release: 3.12.53-60.30-default machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 xarray: 0.10.7 pandas: 0.23.1 numpy: 1.14.2 scipy: 1.1.0 netCDF4: 1.4.0 h5netcdf: 0.6.1 h5py: 2.8.0 Nio: None zarr: 2.2.0 bottleneck: 1.2.1 cyordereddict: None dask: 0.18.1 distributed: 1.22.0 matplotlib: 2.2.2 cartopy: 0.16.0 seaborn: None setuptools: 39.2.0 pip: 10.0.1 conda: None pytest: None IPython: 6.4.0 sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2278/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 165104458,MDU6SXNzdWUxNjUxMDQ0NTg=,896,mfdataset fails at chunking after opening,11750960,closed,0,,,5,2016-07-12T15:08:34Z,2019-01-27T14:51:58Z,2019-01-27T14:51:58Z,CONTRIBUTOR,,,,"Hi all, We are trying to specify chunks after opening an mfdataset but it does not work. This works fine with datasets. Is this behavior expected? Are we doing anything wrong? ``` # - Modules # import sys, os import xarray as xr chunks = (1727, 2711) xr_chunks = {'x': chunks[-1], 'y': chunks[-2], 'time_counter':1, 'deptht': 1} # - Parameter natl60_path = '/home7/pharos/othr/NATL60/' filename = natl60_path+'NATL60-MJM155-S/5d/2008/NATL60-MJM155_y2008m01d09.5d_gridT.nc' filenames = natl60_path+'NATL60-MJM155-S/5d/2008/NATL60-MJM155_y2008m01d0*gridT.nc' ### dataset # open ds = xr.open_dataset(filename,chunks=None) # chunk ds = ds.chunk(xr_chunks) # plot print 'With dataset:' print ds['votemper'].isel(time_counter=0,deptht=0).values ### mfdataset # open ds = xr.open_mfdataset(filenames,chunks=None, lock=False) # plot print 'With mfdataset no chunks:' print ds['votemper'].isel(time_counter=0,deptht=0).values # chunk print 'With mfdataset with chunks:' ds = ds.chunk(xr_chunks) print ds['votemper'].isel(time_counter=0,deptht=0) print ds['votemper'].isel(time_counter=0,deptht=0).values ``` The output is: ``` With dataset: [[ nan nan nan ..., nan nan nan] [ nan nan nan ..., nan nan nan] [ nan nan nan ..., nan nan nan] ..., [ nan nan nan ..., nan nan nan] [ nan nan nan ..., nan nan nan] [ nan nan nan ..., nan nan nan]] With mfdataset no chunks: [[ nan nan nan ..., nan nan nan] [ nan nan nan ..., nan nan nan] [ nan nan nan ..., nan nan nan] ..., [ nan nan nan ..., nan nan nan] [ nan nan nan ..., nan nan nan] [ nan nan nan ..., nan nan nan]] With mfdataset with 
chunks: dask.array Coordinates: nav_lat (y, x) float32 26.5648 26.5648 26.5648 26.5648 26.5648 ... nav_lon (y, x) float32 -81.4512 -81.4346 -81.4179 -81.4012 ... deptht float32 0.480455 * x (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ... * y (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ... time_counter datetime64[ns] 2008-01-02T12:00:00 time_centered datetime64[ns] 2008-01-02T12:00:00 Attributes: long_name: temperature units: degC online_operation: average interval_operation: 40s interval_write: 5d ``` The code hangs for a while and then spits: ``` Traceback (most recent call last): File ""/home/slyne/aponte/natl60/python/natl60_dimup/overview/aurelien/plot_snapshot_2d_v4_break.py"", line 44, in print ds['votemper'].isel(time_counter=0,deptht=0).values File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/xarray/core/dataarray.py"", line 364, in values return self.variable.values File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/xarray/core/variable.py"", line 288, in values return _as_array_or_item(self._data_cached()) File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/xarray/core/variable.py"", line 254, in _data_cached self._data = np.asarray(self._data) File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/numpy/core/numeric.py"", line 460, in asarray return array(a, dtype, copy=False, order=order) File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/array/core.py"", line 867, in __array__ x = self.compute() File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/base.py"", line 37, in compute return compute(self, **kwargs)[0] File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/base.py"", line 110, in compute results = get(dsk, keys, **kwargs) File 
""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/threaded.py"", line 57, in get **kwargs) File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/async.py"", line 481, in get_async raise(remote_exception(res, tb)) dask.async.MemoryError: Traceback --------- File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/async.py"", line 264, in execute_task result = _execute_task(task, data) File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/async.py"", line 245, in _execute_task args2 = [_execute_task(a, cache) for a in args] File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/async.py"", line 245, in _execute_task args2 = [_execute_task(a, cache) for a in args] File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/async.py"", line 242, in _execute_task return [_execute_task(a, cache) for a in arg] File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/async.py"", line 242, in _execute_task return [_execute_task(a, cache) for a in arg] File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/async.py"", line 242, in _execute_task return [_execute_task(a, cache) for a in arg] File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/async.py"", line 242, in _execute_task return [_execute_task(a, cache) for a in arg] File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/async.py"", line 245, in _execute_task args2 = [_execute_task(a, cache) for a in args] File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/async.py"", line 246, in _execute_task return func(*args2) File 
""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/array/core.py"", line 50, in getarray c = np.asarray(c) File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/numpy/core/numeric.py"", line 460, in asarray return array(a, dtype, copy=False, order=order) File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/xarray/core/indexing.py"", line 312, in __array__ return np.asarray(array[self.key], dtype=None) File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/xarray/conventions.py"", line 359, in __getitem__ self.scale_factor, self.add_offset, self._dtype) File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/xarray/conventions.py"", line 57, in mask_and_scale values = np.array(array, dtype=dtype, copy=True) ``` Cheers Aurelien ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/896/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 373449569,MDU6SXNzdWUzNzM0NDk1Njk=,2504,isel slows down computation significantly after open_dataset,11750960,closed,0,,,3,2018-10-24T12:09:18Z,2018-10-25T19:12:06Z,2018-10-25T19:12:06Z,CONTRIBUTOR,,,,"isel significantly slows down a simple mean calculation: ```python ds = xr.open_dataset(grid_dir_nc+'Depth.nc', chunks={'face':1}) print(ds) % time print(ds.Depth.mean().values) ``` leads to: ``` Dimensions: (face: 13, i: 4320, j: 4320) Coordinates: * i (i) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319 * j (j) int64 0 1 2 3 4 5 6 7 ... 
4313 4314 4315 4316 4317 4318 4319 * face (face) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 Data variables: Depth (face, j, i) float32 dask.array 1935.0237 CPU times: user 241 ms, sys: 16.9 ms, total: 258 ms Wall time: 1.05 s ``` ``` ds = xr.open_dataset(grid_dir_nc+'Depth.nc', chunks={'face':1}) ds = ds.isel(i=slice(None,None,4),j=slice(None,None,4)) % time print(ds.Depth.mean().values) ``` leads to: ``` 1935.0199 CPU times: user 9.43 s, sys: 819 ms, total: 10.3 s Wall time: 2min 57s ``` Is this expected behavior? #### Output of ``xr.show_versions()`` I am using latest xarray version (`pip install https://github.com/pydata/xarray/archive/master.zip`)
INSTALLED VERSIONS ------------------ commit: None python: 3.6.6.final.0 python-bits: 64 OS: Linux OS-release: 3.10.0-862.2.3.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 xarray: 0+unknown pandas: 0.23.4 numpy: 1.15.3 scipy: 1.1.0 netCDF4: 1.4.1 h5netcdf: None h5py: None Nio: None zarr: 2.2.0 cftime: 1.0.1 PseudonetCDF: None rasterio: None iris: None bottleneck: None cyordereddict: None dask: 0.19.2 distributed: 1.23.2 matplotlib: 2.2.2 cartopy: 0.16.0 seaborn: None setuptools: 40.2.0 pip: 10.0.1 conda: None pytest: None IPython: 6.5.0 sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2504/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 323333361,MDU6SXNzdWUzMjMzMzMzNjE=,2132,to_netcdf - RuntimeError: NetCDF: HDF error,11750960,closed,0,,,3,2018-05-15T18:31:49Z,2018-05-16T19:50:52Z,2018-05-16T18:52:59Z,CONTRIBUTOR,,,,"I am trying to store data to a netcdf file, and have issues: Data is created according to: ```python import numpy as np import xarray as xr i = np.arange(4320) j = np.arange(4320) face = np.arange(13) v = xr.DataArray(np.random.randn(face.size, j.size, i.size), \ coords={'i': i, 'j': j, 'face': face}, dims=['face','j','i']) ``` The following works: ``` file_out = 'rand.nc' v.to_netcdf(file_out) ``` there is a minor warning: ``` /home1/datahome/aponte/.miniconda3/envs/equinox/lib/python3.6/site-packages/distributed/utils.py:128: RuntimeWarning: Couldn't detect a suitable IP address for reaching '8.8.8.8', defaulting to '127.0.0.1': [Errno 101] Network is unreachable % (host, default, e), RuntimeWarning) ``` But this does not work: ``` file_out = '/home1/datawork/aponte/mit_tmp/rand.nc' v.to_netcdf(file_out) ``` with the following error message:
``` --------------------------------------------------------------------------- RuntimeError Traceback (most recent call last) ~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, writer, encoding, unlimited_dims) 656 dataset.dump_to_store(store, sync=sync, encoding=encoding, --> 657 unlimited_dims=unlimited_dims) 658 if path_or_file is None: ~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/core/dataset.py in dump_to_store(self, store, encoder, sync, encoding, unlimited_dims) 1073 store.store(variables, attrs, check_encoding, -> 1074 unlimited_dims=unlimited_dims) 1075 if sync: ~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, unlimited_dims) 362 self.set_variables(variables, check_encoding_set, --> 363 unlimited_dims=unlimited_dims) 364 ~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/backends/netCDF4_.py in set_variables(self, *args, **kwargs) 353 with self.ensure_open(autoclose=False): --> 354 super(NetCDF4DataStore, self).set_variables(*args, **kwargs) 355 ~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, unlimited_dims) 401 --> 402 self.writer.add(source, target) 403 ~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/backends/common.py in add(self, source, target) 264 else: --> 265 target[...] 
= source 266 ~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/backends/netCDF4_.py in __setitem__(self, key, value) 46 data = self.get_array() ---> 47 data[key] = value 48 netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__setitem__() netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._put() netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success() RuntimeError: NetCDF: HDF error During handling of the above exception, another exception occurred: RuntimeError Traceback (most recent call last) in () 2 if os.path.isfile(file_out): 3 os.remove(file_out) ----> 4 v.to_netcdf(file_out) ~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/core/dataarray.py in to_netcdf(self, *args, **kwargs) 1515 dataset = self.to_dataset() 1516 -> 1517 return dataset.to_netcdf(*args, **kwargs) 1518 1519 def to_dict(self): ~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims) 1135 return to_netcdf(self, path, mode, format=format, group=group, 1136 engine=engine, encoding=encoding, -> 1137 unlimited_dims=unlimited_dims) 1138 1139 def to_zarr(self, store=None, mode='w-', synchronizer=None, group=None, ~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, writer, encoding, unlimited_dims) 660 finally: 661 if sync and isinstance(path_or_file, basestring): --> 662 store.close() 663 664 if not sync: ~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/backends/netCDF4_.py in close(self) 419 ds = find_root(self.ds) 420 if ds._isopen: --> 421 ds.close() 422 self._isopen = False netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.close() netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset._close() netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success() RuntimeError: NetCDF: HDF error ```
The following may be of some use: ``` (equinox) aponte@datarmor1:~/mit_equinox/sandbox> stat -f -L -c %T /home1/datawork/aponte/mit_tmp/ gpfs (equinox) aponte@datarmor1:~/mit_equinox/sandbox> stat -f -L -c %T . nfs ``` (the `.` directory being where the notebook sits) #### Output of ``xr.show_versions()``
``` /home1/datahome/aponte/.miniconda3/envs/equinox/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`. from ._conv import register_converters as _register_converters INSTALLED VERSIONS ------------------ commit: None python: 3.6.5.final.0 python-bits: 64 OS: Linux OS-release: 3.12.53-60.30-default machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 xarray: 0.10.3 pandas: 0.22.0 numpy: 1.14.2 scipy: 1.0.1 netCDF4: 1.3.1 h5netcdf: 0.5.1 h5py: 2.7.1 Nio: None zarr: None bottleneck: 1.2.1 cyordereddict: None dask: 0.17.2 distributed: 1.21.6 matplotlib: 2.2.2 cartopy: 0.16.0 seaborn: None setuptools: 39.0.1 pip: 9.0.3 conda: None pytest: None IPython: 6.3.1 sphinx: None ```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2132/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue