id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1589771368,I_kwDOAMm_X85ewfxo,7541,standard deviation over one dimension of a chunked DataArray leads to NaN,11750960,closed,0,,,2,2023-02-17T18:07:06Z,2023-02-17T18:12:10Z,2023-02-17T18:12:10Z,CONTRIBUTOR,,,,"### What happened?
When computing the standard deviation over one dimension of a chunked DataArray, one may get NaNs.
### What did you expect to happen?
We should not get any NaNs.
### Minimal Complete Verifiable Example
```Python
import numpy as np
import xarray as xr

x = np.random.randn(10, 10) + 1j * np.random.randn(10, 10)
da = xr.DataArray(x).chunk(dict(dim_0=3))
da.std(""dim_0"").compute() # NaN
da.compute().std(""dim_0"") # no NaNs
```
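For reference, the standard deviation of complex values can be computed by hand as the root mean squared deviation of the magnitudes. This pure-numpy sketch (a workaround illustration, not the xarray/dask code path) matches numpy's own result and contains no NaNs:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 10)) + 1j * rng.standard_normal((10, 10))

def complex_std(a, axis=0):
    # std of complex data: sqrt(mean(|a - mean(a)|**2)), which is real
    m = a.mean(axis=axis, keepdims=True)
    return np.sqrt((np.abs(a - m) ** 2).mean(axis=axis))

out = complex_std(x, axis=0)
```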
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.8 (main, Nov 24 2022, 08:09:04) [Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 22.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: ('fr_FR', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 2022.12.0
pandas: 1.5.2
numpy: 1.23.5
scipy: 1.9.3
netCDF4: 1.6.2
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: 2.13.3
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.4
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.02.1
distributed: 2022.2.1
matplotlib: 3.6.2
cartopy: 0.21.1
seaborn: 0.12.1
numbagg: None
fsspec: 2022.11.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.5.0
pip: 22.3.1
conda: None
pytest: None
mypy: None
IPython: 8.7.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7541/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
614785886,MDU6SXNzdWU2MTQ3ODU4ODY=,4046,automatic chunking of zarr archive,11750960,open,0,,,3,2020-05-08T14:42:00Z,2023-01-18T21:54:42Z,,CONTRIBUTOR,,,,"I store an unchunked dataset in a zarr archive, and the resulting zarr archive ends up chunked.
This may be a simple usage question.
I don't know how to turn this behavior off.
#### Code sample
Here is a minimal example that reproduces the issue:
```python
import os
import numpy as np
import xarray as xr

ds = xr.DataArray(np.ones((200, 800))).rename('foo').to_dataset()
print('Initial chunks = {}'.format(ds.foo.chunks))
ds.to_zarr('test.zarr', mode='w')
print('zarr archives contains: {}'.format(os.listdir('test.zarr/foo')))
ds = xr.open_zarr('test.zarr')
print('Final chunks = {}'.format(ds.foo.chunks))
```
returns:
```
Initial chunks = None
zarr archives contains: ['.zarray', '.zattrs', '0.0', '0.1', '1.0', '1.1']
Final chunks = ((100, 100), (400, 400))
```
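The chunk tuples reported above are consistent with zarr picking a (100, 400) chunk shape automatically when none is given (an assumption about zarr's auto-chunking, for illustration); the per-axis block sizes xarray then reports can be reconstructed with a small sketch:

```python
# compute dask-style per-axis block sizes for a given chunk shape
def block_sizes(length, chunk):
    full, rem = divmod(length, chunk)
    return (chunk,) * full + ((rem,) if rem else ())

shape, chunk_shape = (200, 800), (100, 400)  # (100, 400) is assumed, not documented
chunks = tuple(block_sizes(n, c) for n, c in zip(shape, chunk_shape))
```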
#### Expected Output
I would expect the archive not to be chunked.
#### Versions
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Mar 23 2020, 23:03:20)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.12.53-60.30-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.4
xarray: 0.15.2.dev29+g6048356
pandas: 1.0.3
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.4.0
cftime: 1.1.1.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.13.0
distributed: 2.13.0
matplotlib: 3.2.1
cartopy: 0.17.0
seaborn: 0.10.0
numbagg: None
pint: None
setuptools: 46.1.3.post20200325
pip: 20.0.2
conda: None
pytest: None
IPython: 7.13.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4046/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
294735496,MDU6SXNzdWUyOTQ3MzU0OTY=,1889,call to colorbar not thread safe,11750960,closed,0,,,12,2018-02-06T12:05:44Z,2022-04-27T23:47:56Z,2022-04-27T23:47:56Z,CONTRIBUTOR,,,,"The following call in `xarray/xarray/plot/plot.py` does not seem to be thread safe:
```
cbar = plt.colorbar(primitive, **cbar_kwargs)
```
It leads to systematic crashes when run with distributed, with a cryptic error message (`ValueError: Unknown element o`). I have to create colorbars outside the xarray plot call to prevent crashes.
A call of the following type may fix the problem:
```
cbar = fig.colorbar(primitive, **cbar_kwargs)
```
But `fig` does not seem to be available directly in plot.py. Maybe:
```
cbar = ax.get_figure().colorbar(primitive, **cbar_kwargs)
```
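A minimal sketch of the proposed figure-bound call (assuming a standard matplotlib setup; the Agg backend is used here so no display is needed):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
primitive = ax.pcolormesh(np.random.rand(4, 4))
# bind the colorbar to the figure that owns `ax`, avoiding global pyplot state
cbar = ax.get_figure().colorbar(primitive, ax=ax)
```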
cheers","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1889/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
667763555,MDU6SXNzdWU2Njc3NjM1NTU=,4284,overwriting netcdf file fails at read time,11750960,closed,0,,,1,2020-07-29T11:17:10Z,2020-08-01T20:54:16Z,2020-08-01T20:54:16Z,CONTRIBUTOR,,,,"I generate a dataset once:
```python
import numpy as np
import xarray as xr

ds = xr.DataArray(np.arange(10), name='x').to_dataset()
ds.to_netcdf('test.nc', mode='w')
```
Now I overwrite with a new netcdf file and load:
```
ds = xr.DataArray(np.arange(20), name='x').to_dataset()
ds.to_netcdf('test.nc', mode='w')
ds_out = xr.open_dataset('test.nc')
print(ds_out)
```
outputs:
```
Dimensions: (dim_0: 10)
Dimensions without coordinates: dim_0
Data variables:
x (dim_0) int64 ...
```
I would have expected to get the new dataset.
If I use netcdf4, the file seems to have been properly overwritten:
```
import netCDF4 as nc
d = nc.Dataset('test.nc')
d
```
outputs:
```
root group (NETCDF4 data model, file format HDF5):
dimensions(sizes): dim_0(20)
variables(dimensions): int64 x(dim_0)
groups:
```
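A toy sketch of one possible explanation (an assumption on my part, not a confirmed diagnosis): a file-handle cache keyed on path can return a stale handle after the file has been replaced on disk, which would explain seeing the old dimensions.

```python
# minimal model of a path-keyed handle cache going stale
cache = {}

def open_cached(path, opener):
    if path not in cache:
        cache[path] = opener(path)
    return cache[path]

versions = {'test.nc': 'v1 (dim_0: 10)'}
h1 = open_cached('test.nc', versions.get)
versions['test.nc'] = 'v2 (dim_0: 20)'   # file overwritten on disk
h2 = open_cached('test.nc', versions.get)
# stale: the cache still hands back the old handle
```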
**Environment**:
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Mar 23 2020, 23:03:20)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.12.53-60.30-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.4
xarray: 0.15.1
pandas: 1.0.3
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.4.0
cftime: 1.1.1.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.13.0
distributed: 2.13.0
matplotlib: 3.3.0
cartopy: 0.17.0
seaborn: 0.10.0
numbagg: None
setuptools: 46.1.3.post20200325
pip: 20.0.2
conda: None
pytest: None
IPython: 7.13.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4284/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
614854414,MDExOlB1bGxSZXF1ZXN0NDE1MzI0ODUw,4048,improve to_zarr doc about chunking,11750960,closed,0,,,9,2020-05-08T16:43:09Z,2020-05-20T18:55:38Z,2020-05-20T18:55:33Z,CONTRIBUTOR,,0,pydata/xarray/pulls/4048," - [X] follows #4046
- [X] Passes `isort -rc . && black . && mypy . && flake8`
I'm not sure the last point is really necessary for this PR, is it?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4048/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
595666886,MDExOlB1bGxSZXF1ZXN0NDAwMTAwMzIz,3944,implement a more threadsafe call to colorbar,11750960,closed,0,,,7,2020-04-07T07:51:28Z,2020-04-09T07:01:12Z,2020-04-09T06:26:57Z,CONTRIBUTOR,,0,pydata/xarray/pulls/3944," - [ ] Xref #1889
- [ ] Tests added
- [ ] Passes `isort -rc . && black . && mypy . && flake8`
- [ ] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
If you think this is relevant, I'll go ahead and start working on the items above, even though I'm not sure new tests are needed.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3944/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
593825520,MDU6SXNzdWU1OTM4MjU1MjA=,3932,Element wise dataArray generation,11750960,closed,0,,,6,2020-04-04T12:24:16Z,2020-04-07T04:32:12Z,2020-04-07T04:32:12Z,CONTRIBUTOR,,,,"I'm in a situation where I want to generate a bidimensional DataArray from a method that takes each of the two dimensions as input parameters.
I have two methods to do this, but neither looks particularly elegant to me, and I wonder whether somebody has better ideas.
- **Method 1** : dask delayed
```
import dask
import dask.array as da
import numpy as np
import xarray as xr

x = np.arange(10)
y = np.arange(20)
Nstats = 5  # each experiment outputs 5 statistical diagnostics
some_exp = lambda x, y: np.ones((Nstats,))
some_exp_delayed = dask.delayed(some_exp, pure=True)
lazy_data = [some_exp_delayed(_x, _y) for _x in x for _y in y]
sample = lazy_data[0].compute()
arrays = [da.from_delayed(lazy_value,
dtype=sample.dtype,
shape=sample.shape)
for lazy_value in lazy_data]
stack = (da.stack(arrays, axis=0)
.reshape((len(x),len(y),sample.size))
)
ds = xr.DataArray(stack, dims=['x','y','stats'])
```
I tend to prefer this option because it imposes fewer requirements on the `some_exp` output shape.
That being said, it still seems like too many lines of code to achieve such a result.
- **Method 2**: apply_ufunc
```
x = np.arange(10)
y = np.arange(20)
ds = xr.Dataset(coords={'x': x, 'y': y})
ds['_y'] = (0*ds.x+ds.y) # breaks apply_ufunc otherwise
ds = ds.chunk({'x': 1, 'y':1})
# let's say each experiment outputs 5 statistical diagnostics
Nstats = 5
some_exp = lambda x, y: np.ones((1,1,Nstats))
out = (xr.apply_ufunc(some_exp, ds.x, ds._y,
dask='parallelized',
output_dtypes=[float],
output_sizes={'stats': Nstats},
output_core_dims=[['stats']])
)
```
I don't understand why I have to use the dummy variable `ds._y` in this case.
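For comparison, an eager pure-numpy sketch of the same construction (assuming `some_exp` is cheap enough to call without dask); the resulting array could then be wrapped with `xr.DataArray(stack, dims=['x','y','stats'])` as in Method 1:

```python
import numpy as np

x = np.arange(10)
y = np.arange(20)
Nstats = 5
some_exp = lambda xi, yi: np.ones(Nstats)

# evaluate the experiment on the full (x, y) grid and stack the outputs
stack = np.stack([[some_exp(xi, yi) for yi in y] for xi in x])
```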
Having to rely on `apply_ufunc` seems like an overkill.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3932/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
593860909,MDU6SXNzdWU1OTM4NjA5MDk=,3933,plot.line breaks depending on coordinate shape,11750960,closed,0,,,2,2020-04-04T13:27:35Z,2020-04-04T18:42:10Z,2020-04-04T17:57:20Z,CONTRIBUTOR,,,,"`plot.line` breaks depending on coordinate shape, see the code below:
```python
x = np.arange(10)
y = np.arange(20)
ds = xr.Dataset(coords={'x': x, 'y': y})
#ds = ds.assign_coords(z=ds.y+ds.x) # goes through
ds = ds.assign_coords(z=ds.x+ds.y) # breaks
ds['v'] = (ds.x+ds.y)
ds['v'].plot.line(y='z', hue='x')
```
This breaks with the following error:
```
...
~/.miniconda3/envs/equinox/lib/python3.7/site-packages/matplotlib/axes/_base.py in _plot_args(self, tup, kwargs)
340
341 if x.shape[0] != y.shape[0]:
--> 342 raise ValueError(f""x and y must have same first dimension, but ""
343 f""have shapes {x.shape} and {y.shape}"")
344 if x.ndim > 2 or y.ndim > 2:
ValueError: x and y must have same first dimension, but have shapes (20, 10) and (10, 20)
```
I would have expected that dimension order would not matter.
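The mismatch in the traceback can be reduced to the array shapes alone (a sketch of the symptom, not of xarray's plotting internals):

```python
import numpy as np

z = np.ones((20, 10))  # shape matplotlib received for one argument
v = np.ones((10, 20))  # shape matplotlib received for the other
mismatch = z.shape[0] != v.shape[0]      # this triggers the ValueError
fixed = z.T.shape[0] == v.shape[0]       # a transpose restores agreement
```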
#### Versions
Output of `xr.show_versions()`
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Mar 23 2020, 23:03:20)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.12.53-60.30-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.4
xarray: 0.15.1
pandas: 1.0.3
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.4.0
cftime: 1.1.1.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.13.0
distributed: 2.13.0
matplotlib: 3.2.1
cartopy: 0.17.0
seaborn: 0.10.0
numbagg: None
setuptools: 46.1.3.post20200325
pip: 20.0.2
conda: None
pytest: None
IPython: 7.13.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3933/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
283518232,MDU6SXNzdWUyODM1MTgyMzI=,1795,open_mfdataset concat_dim chunk,11750960,open,0,,,2,2017-12-20T10:34:58Z,2020-01-07T16:19:39Z,,CONTRIBUTOR,,,,"open_mfdataset does not allow chunking along concat_dim.
As a result, if specific chunking is sought along that dimension, it may be best not to pass chunks at the open_mfdataset stage and to rechunk variables afterwards.
This would be the case for example if chunks are large across files but small within files:
https://github.com/apatlpo/lops-array/blob/master/sandbox/natl60_tseries_debug.ipynb
I believe this is difficult to anticipate for new users (like me).
Couldn't this be specified in the documentation of open_mfdataset?
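To illustrate the rechunking step mentioned above with a toy example (hypothetical numbers: five files of ten steps each along concat_dim, merged into chunks of 25):

```python
# chunks along concat_dim as opened by open_mfdataset: one chunk per file
per_file = (10,) * 5
total = sum(per_file)
target = 25  # desired chunk size across files

# block sizes after rechunking to the target size
rechunked = tuple(min(target, total - i) for i in range(0, total, target))
```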
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1795/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
340192831,MDU6SXNzdWUzNDAxOTI4MzE=,2278,can't store zarr after open_zarr and isel,11750960,closed,0,,,10,2018-07-11T10:59:23Z,2019-05-17T14:03:38Z,2018-08-14T03:46:34Z,CONTRIBUTOR,,,,"#### Code Sample, a copy-pastable example if possible
This works fine:
```python
nx, ny, nt = 32, 32, 64
ds = xr.Dataset({}, coords={'x':np.arange(nx),'y':np.arange(ny), 't': np.arange(nt)})
ds = ds.assign(v=ds.t*np.cos(np.pi/180./100*ds.x)*np.cos(np.pi/180./50*ds.y))
ds = ds.chunk({'t': 1, 'x': nx/2, 'y': ny/2})
ds.isel(t=0).to_zarr('data_t0.zarr', mode='w')
```
But if I store, reload and select, I cannot store:
```
ds.to_zarr('data.zarr', mode='w')
ds = xr.open_zarr('data.zarr')
ds.isel(t=0).to_zarr('data_t0.zarr', mode='w')
```
Error message ends with:
```
~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/backends/zarr.py in _extract_zarr_variable_encoding(variable, raise_on_invalid)
181
182 chunks = _determine_zarr_chunks(encoding.get('chunks'), variable.chunks,
--> 183 variable.ndim)
184 encoding['chunks'] = chunks
185 return encoding
~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/backends/zarr.py in _determine_zarr_chunks(enc_chunks, var_chunks, ndim)
112 raise ValueError(""zarr chunks tuple %r must have same length as ""
113 ""variable.ndim %g"" %
--> 114 (enc_chunks_tuple, ndim))
115
116 for x in enc_chunks_tuple:
ValueError: zarr chunks tuple (1, 16, 16) must have same length as variable.ndim 2
```
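My reading of the traceback (to be confirmed): the `chunks` entry in the variable's encoding still describes the 3-D source array, while `isel(t=0)` has dropped the `t` dimension, so the variable is now 2-D. Deleting the stale entry before the second `to_zarr`, e.g. `del ds.v.encoding['chunks']`, might be a workaround (untested assumption).

```python
# the zarr 'chunks' encoding carried over from data.zarr
enc_chunks = (1, 16, 16)
# variable rank after isel(t=0) has dropped the 't' dimension
var_ndim = 2
mismatch = len(enc_chunks) != var_ndim  # hence the ValueError
```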
#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 3.12.53-60.30-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.10.7
pandas: 0.23.1
numpy: 1.14.2
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: 0.6.1
h5py: 2.8.0
Nio: None
zarr: 2.2.0
bottleneck: 1.2.1
cyordereddict: None
dask: 0.18.1
distributed: 1.22.0
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: None
setuptools: 39.2.0
pip: 10.0.1
conda: None
pytest: None
IPython: 6.4.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2278/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
165104458,MDU6SXNzdWUxNjUxMDQ0NTg=,896,mfdataset fails at chunking after opening,11750960,closed,0,,,5,2016-07-12T15:08:34Z,2019-01-27T14:51:58Z,2019-01-27T14:51:58Z,CONTRIBUTOR,,,,"Hi all,
We are trying to specify chunks after opening an mfdataset but it does not work.
This works fine with datasets.
Is this behavior expected?
Are we doing anything wrong?
```
# - Modules
#
import sys, os
import xarray as xr
chunks = (1727, 2711)
xr_chunks = {'x': chunks[-1], 'y': chunks[-2], 'time_counter':1, 'deptht': 1}
# - Parameter
natl60_path = '/home7/pharos/othr/NATL60/'
filename = natl60_path+'NATL60-MJM155-S/5d/2008/NATL60-MJM155_y2008m01d09.5d_gridT.nc'
filenames = natl60_path+'NATL60-MJM155-S/5d/2008/NATL60-MJM155_y2008m01d0*gridT.nc'
### dataset
# open
ds = xr.open_dataset(filename,chunks=None)
# chunk
ds = ds.chunk(xr_chunks)
# plot
print 'With dataset:'
print ds['votemper'].isel(time_counter=0,deptht=0).values
### mfdataset
# open
ds = xr.open_mfdataset(filenames,chunks=None, lock=False)
# plot
print 'With mfdataset no chunks:'
print ds['votemper'].isel(time_counter=0,deptht=0).values
# chunk
print 'With mfdataset with chunks:'
ds = ds.chunk(xr_chunks)
print ds['votemper'].isel(time_counter=0,deptht=0)
print ds['votemper'].isel(time_counter=0,deptht=0).values
```
The output is:
```
With dataset:
[[ nan nan nan ..., nan nan nan]
[ nan nan nan ..., nan nan nan]
[ nan nan nan ..., nan nan nan]
...,
[ nan nan nan ..., nan nan nan]
[ nan nan nan ..., nan nan nan]
[ nan nan nan ..., nan nan nan]]
With mfdataset no chunks:
[[ nan nan nan ..., nan nan nan]
[ nan nan nan ..., nan nan nan]
[ nan nan nan ..., nan nan nan]
...,
[ nan nan nan ..., nan nan nan]
[ nan nan nan ..., nan nan nan]
[ nan nan nan ..., nan nan nan]]
With mfdataset with chunks:
dask.array
Coordinates:
nav_lat (y, x) float32 26.5648 26.5648 26.5648 26.5648 26.5648 ...
nav_lon (y, x) float32 -81.4512 -81.4346 -81.4179 -81.4012 ...
deptht float32 0.480455
* x (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* y (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
time_counter datetime64[ns] 2008-01-02T12:00:00
time_centered datetime64[ns] 2008-01-02T12:00:00
Attributes:
long_name: temperature
units: degC
online_operation: average
interval_operation: 40s
interval_write: 5d
```
The code hangs for a while and then spits:
```
Traceback (most recent call last):
File ""/home/slyne/aponte/natl60/python/natl60_dimup/overview/aurelien/plot_snapshot_2d_v4_break.py"", line 44, in
print ds['votemper'].isel(time_counter=0,deptht=0).values
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/xarray/core/dataarray.py"", line 364, in values
return self.variable.values
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/xarray/core/variable.py"", line 288, in values
return _as_array_or_item(self._data_cached())
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/xarray/core/variable.py"", line 254, in _data_cached
self._data = np.asarray(self._data)
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/numpy/core/numeric.py"", line 460, in asarray
return array(a, dtype, copy=False, order=order)
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/array/core.py"", line 867, in __array__
x = self.compute()
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/base.py"", line 37, in compute
return compute(self, **kwargs)[0]
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/base.py"", line 110, in compute
results = get(dsk, keys, **kwargs)
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/threaded.py"", line 57, in get
**kwargs)
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/async.py"", line 481, in get_async
raise(remote_exception(res, tb))
dask.async.MemoryError:
Traceback
---------
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/async.py"", line 264, in execute_task
result = _execute_task(task, data)
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/async.py"", line 245, in _execute_task
args2 = [_execute_task(a, cache) for a in args]
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/async.py"", line 245, in _execute_task
args2 = [_execute_task(a, cache) for a in args]
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/async.py"", line 242, in _execute_task
return [_execute_task(a, cache) for a in arg]
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/async.py"", line 242, in _execute_task
return [_execute_task(a, cache) for a in arg]
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/async.py"", line 242, in _execute_task
return [_execute_task(a, cache) for a in arg]
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/async.py"", line 242, in _execute_task
return [_execute_task(a, cache) for a in arg]
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/async.py"", line 245, in _execute_task
args2 = [_execute_task(a, cache) for a in args]
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/async.py"", line 246, in _execute_task
return func(*args2)
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/dask/array/core.py"", line 50, in getarray
c = np.asarray(c)
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/numpy/core/numeric.py"", line 460, in asarray
return array(a, dtype, copy=False, order=order)
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/xarray/core/indexing.py"", line 312, in __array__
return np.asarray(array[self.key], dtype=None)
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/xarray/conventions.py"", line 359, in __getitem__
self.scale_factor, self.add_offset, self._dtype)
File ""/home1/homedir5/perso/aponte/miniconda2/envs/natl60/lib/python2.7/site-packages/xarray/conventions.py"", line 57, in mask_and_scale
values = np.array(array, dtype=dtype, copy=True)
```
Cheers
Aurelien
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/896/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
373449569,MDU6SXNzdWUzNzM0NDk1Njk=,2504,isel slows down computation significantly after open_dataset,11750960,closed,0,,,3,2018-10-24T12:09:18Z,2018-10-25T19:12:06Z,2018-10-25T19:12:06Z,CONTRIBUTOR,,,,"isel significantly slows down a simple mean calculation:
```python
ds = xr.open_dataset(grid_dir_nc+'Depth.nc', chunks={'face':1})
print(ds)
% time print(ds.Depth.mean().values)
```
leads to:
```
Dimensions: (face: 13, i: 4320, j: 4320)
Coordinates:
* i (i) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
* j (j) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
* face (face) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
Data variables:
Depth (face, j, i) float32 dask.array
1935.0237
CPU times: user 241 ms, sys: 16.9 ms, total: 258 ms
Wall time: 1.05 s
```
```
ds = xr.open_dataset(grid_dir_nc+'Depth.nc', chunks={'face':1})
ds = ds.isel(i=slice(None,None,4),j=slice(None,None,4))
% time print(ds.Depth.mean().values)
```
leads to:
```
1935.0199
CPU times: user 9.43 s, sys: 819 ms, total: 10.3 s
Wall time: 2min 57s
```
Is this expected behavior?
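A toy model of why the strided selection can be so much slower (hypothetical chunking of 100 along `i`; the real file is chunked differently, this only illustrates the access pattern): a step-4 slice still touches every underlying chunk, so the same bytes are read while the per-task overhead grows.

```python
chunk = 100   # assumed chunk size along one axis, for illustration
n = 4320

# which chunks a step-4 slice actually touches
touched = {i // chunk for i in range(0, n, 4)}
all_chunks = set(range((n + chunk - 1) // chunk))
```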
#### Output of ``xr.show_versions()``
I am using the latest xarray version (`pip install https://github.com/pydata/xarray/archive/master.zip`)
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-862.2.3.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0+unknown
pandas: 0.23.4
numpy: 1.15.3
scipy: 1.1.0
netCDF4: 1.4.1
h5netcdf: None
h5py: None
Nio: None
zarr: 2.2.0
cftime: 1.0.1
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: None
cyordereddict: None
dask: 0.19.2
distributed: 1.23.2
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: None
setuptools: 40.2.0
pip: 10.0.1
conda: None
pytest: None
IPython: 6.5.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2504/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
323333361,MDU6SXNzdWUzMjMzMzMzNjE=,2132,to_netcdf - RuntimeError: NetCDF: HDF error,11750960,closed,0,,,3,2018-05-15T18:31:49Z,2018-05-16T19:50:52Z,2018-05-16T18:52:59Z,CONTRIBUTOR,,,,"I am trying to store data to a netcdf file, and have issues:
Data is created according to:
```python
import numpy as np
import xarray as xr
i = np.arange(4320)
j = np.arange(4320)
face = np.arange(13)
v = xr.DataArray(np.random.randn(face.size, j.size, i.size), \
coords={'i': i, 'j': j, 'face': face}, dims=['face','j','i'])
```
The following works:
```
file_out = 'rand.nc'
v.to_netcdf(file_out)
```
there is a minor warning:
```
/home1/datahome/aponte/.miniconda3/envs/equinox/lib/python3.6/site-packages/distributed/utils.py:128: RuntimeWarning: Couldn't detect a suitable IP address for reaching '8.8.8.8', defaulting to '127.0.0.1': [Errno 101] Network is unreachable
% (host, default, e), RuntimeWarning)
```
But this does not work:
```
file_out = '/home1/datawork/aponte/mit_tmp/rand.nc'
v.to_netcdf(file_out)
```
with the following error message:
```
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, writer, encoding, unlimited_dims)
656 dataset.dump_to_store(store, sync=sync, encoding=encoding,
--> 657 unlimited_dims=unlimited_dims)
658 if path_or_file is None:
~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/core/dataset.py in dump_to_store(self, store, encoder, sync, encoding, unlimited_dims)
1073 store.store(variables, attrs, check_encoding,
-> 1074 unlimited_dims=unlimited_dims)
1075 if sync:
~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, unlimited_dims)
362 self.set_variables(variables, check_encoding_set,
--> 363 unlimited_dims=unlimited_dims)
364
~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/backends/netCDF4_.py in set_variables(self, *args, **kwargs)
353 with self.ensure_open(autoclose=False):
--> 354 super(NetCDF4DataStore, self).set_variables(*args, **kwargs)
355
~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, unlimited_dims)
401
--> 402 self.writer.add(source, target)
403
~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/backends/common.py in add(self, source, target)
264 else:
--> 265 target[...] = source
266
~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/backends/netCDF4_.py in __setitem__(self, key, value)
46 data = self.get_array()
---> 47 data[key] = value
48
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__setitem__()
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._put()
netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()
RuntimeError: NetCDF: HDF error
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last)
in ()
2 if os.path.isfile(file_out):
3 os.remove(file_out)
----> 4 v.to_netcdf(file_out)
~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/core/dataarray.py in to_netcdf(self, *args, **kwargs)
1515 dataset = self.to_dataset()
1516
-> 1517 return dataset.to_netcdf(*args, **kwargs)
1518
1519 def to_dict(self):
~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims)
1135 return to_netcdf(self, path, mode, format=format, group=group,
1136 engine=engine, encoding=encoding,
-> 1137 unlimited_dims=unlimited_dims)
1138
1139 def to_zarr(self, store=None, mode='w-', synchronizer=None, group=None,
~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, writer, encoding, unlimited_dims)
660 finally:
661 if sync and isinstance(path_or_file, basestring):
--> 662 store.close()
663
664 if not sync:
~/.miniconda3/envs/equinox/lib/python3.6/site-packages/xarray/backends/netCDF4_.py in close(self)
419 ds = find_root(self.ds)
420 if ds._isopen:
--> 421 ds.close()
422 self._isopen = False
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.close()
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset._close()
netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()
RuntimeError: NetCDF: HDF error
```
The following may be of some use:
```
(equinox) aponte@datarmor1:~/mit_equinox/sandbox> stat -f -L -c %T /home1/datawork/aponte/mit_tmp/
gpfs
(equinox) aponte@datarmor1:~/mit_equinox/sandbox> stat -f -L -c %T .
nfs
```
(the `.` directory being where the notebook sits)
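One hedged possibility (an assumption, not verified): HDF5 file locking on parallel filesystems such as GPFS is a known source of `NetCDF: HDF error`, and disabling it sometimes helps with HDF5 >= 1.10:

```python
import os

# must be set before the first HDF5 file is opened in the process
os.environ['HDF5_USE_FILE_LOCKING'] = 'FALSE'
```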
#### Output of ``xr.show_versions()``
```
/home1/datahome/aponte/.miniconda3/envs/equinox/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 3.12.53-60.30-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.10.3
pandas: 0.22.0
numpy: 1.14.2
scipy: 1.0.1
netCDF4: 1.3.1
h5netcdf: 0.5.1
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.2
distributed: 1.21.6
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: None
setuptools: 39.0.1
pip: 9.0.3
conda: None
pytest: None
IPython: 6.3.1
sphinx: None
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2132/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue