id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
327613219,MDU6SXNzdWUzMjc2MTMyMTk=,2198,DataArray.encoding['chunksizes'] not respected in to_netcdf,6404167,closed,0,,,2,2018-05-30T07:50:59Z,2019-06-06T20:35:50Z,2019-06-06T20:35:50Z,CONTRIBUTOR,,,,"This might be just a documentation issue, so sorry if this is not a problem with xarray.
I'm trying to save an intermediate result of a calculation with xarray + dask to disk, but I'd like to preserve the on-disk chunking. Setting the encoding of a Dataset.data_var or DataArray using the encoding attribute seems to work for (at least) some encoding variables, but not for `chunksizes`. For example:
``` python
import xarray as xr
import dask.array as da
# First generate a file with random numbers
rng = da.random.RandomState()
shape = (10, 10000)
chunks = [10, 10]
dims = ['x', 'y']
z = rng.standard_normal(shape, chunks=chunks)
# Named arr so it does not shadow the dask.array import above
arr = xr.DataArray(z, dims=dims, name='z')
# Set encoding of the DataArray
arr.encoding['chunksizes'] = chunks # Not conserved
arr.encoding['zlib'] = True # Conserved
ds = arr.to_dataset()
print(ds['z'].encoding) #out: {'chunksizes': [10, 10], 'zlib': True}
# This one is chunked and compressed correctly
ds.to_netcdf('test1.nc', encoding={'z': {'chunksizes': chunks}})
# While this one is only compressed
ds.to_netcdf('test2.nc')
```
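Until the per-variable encoding is honoured, one possible workaround is to copy the relevant keys from each variable's `encoding` attribute into the explicit `encoding=` argument, which the example above shows does work. This is a minimal sketch: `explicit_encoding` is a hypothetical helper, not part of xarray, and it operates on plain dicts so it can be tried without writing any files.

```python
def explicit_encoding(var_encodings, keys=('chunksizes',)):
    # var_encodings maps variable name -> that variable's encoding dict.
    # Keep only the requested keys, and skip variables that set none of them.
    result = {}
    for name, enc in var_encodings.items():
        picked = {k: enc[k] for k in keys if k in enc}
        if picked:
            result[name] = picked
    return result


# Same encoding as ds['z'].encoding in the example above
encodings = {'z': {'chunksizes': [10, 10], 'zlib': True}}
print(explicit_encoding(encodings))
```

With the dataset above this could be used as `ds.to_netcdf('test2.nc', encoding=explicit_encoding({name: var.encoding for name, var in ds.data_vars.items()}))`.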
#### Output of ``xr.show_versions()``
``` python
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.16.5-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL:
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.10.4
pandas: 0.22.0
numpy: 1.14.3
scipy: 0.19.0
netCDF4: 1.4.0
h5netcdf: 0.5.1
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: 0.17.5
distributed: 1.21.8
matplotlib: 2.0.2
cartopy: None
seaborn: 0.7.1
setuptools: 39.1.0
pip: 9.0.1
conda: None
pytest: 3.2.2
IPython: 6.3.1
sphinx: None
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2198/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
328439361,MDExOlB1bGxSZXF1ZXN0MTkxOTc1NTkz,2207,"Fixes #2198: Drop chunksizes when only when original_shape is different, not when it isn't found",6404167,closed,0,,,4,2018-06-01T09:08:11Z,2019-06-06T20:35:50Z,2019-06-06T20:35:50Z,CONTRIBUTOR,,0,pydata/xarray/pulls/2207,"Before this fix chunksizes was dropped even when
`original_shape` was not found in `encoding`.
- [x] Closes #2198
- [x] Tests added (for all bug fixes or enhancements)
- [x] Tests passed (for all non-documentation changes)
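The intended behaviour can be sketched in plain Python. This is an illustration only, not the actual xarray code; `clean_chunksizes` and its arguments are hypothetical names. It drops `chunksizes` only when an `original_shape` is recorded and differs from the current shape, and keeps it when `original_shape` is absent.

```python
def clean_chunksizes(encoding, shape):
    # Return a copy of encoding, dropping 'chunksizes' only if the variable
    # was reshaped since it was read (original_shape present and different).
    encoding = dict(encoding)
    original_shape = encoding.get('original_shape')
    if original_shape is not None and tuple(original_shape) != tuple(shape):
        encoding.pop('chunksizes', None)
    return encoding


# No original_shape recorded (encoding set by hand): chunksizes is kept
print(clean_chunksizes({'chunksizes': [10, 10]}, (10, 10000)))
# original_shape differs from the current shape: chunksizes is dropped
print(clean_chunksizes({'chunksizes': [10, 10], 'original_shape': (10, 500)}, (10, 10000)))
```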
Four seemingly unrelated tests failed:
``` python
___________________________________________________________________________________ TestEncodeCFVariable.test_missing_fillvalue ____________________________________________________________________________________
self =
def test_missing_fillvalue(self):
v = Variable(['x'], np.array([np.nan, 1, 2, 3]))
v.encoding = {'dtype': 'int16'}
with pytest.warns(Warning, match='floating point data as an integer'):
> conventions.encode_cf_variable(v)
E Failed: DID NOT WARN. No warnings of type (,) was emitted. The list of emitted warnings is: [SerializationWarning('saving variable None with floating point data as an integer dtype without any _FillValue to use for NaNs',)].
xarray/tests/test_conventions.py:89: Failed
----------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------
/usr/lib/python3.6/site-packages/_pytest/vendored_packages/pluggy.py:248: SerializationWarning: saving variable None with floating point data as an integer dtype without any _FillValue to use for NaNs
call_outcome = _CallOutcome(func)
____________________________________________________________________________________________ TestAccessor.test_register ____________________________________________________________________________________________
self =
def test_register(self):
@xr.register_dataset_accessor('demo')
@xr.register_dataarray_accessor('demo')
class DemoAccessor(object):
""""""Demo accessor.""""""
def __init__(self, xarray_obj):
self._obj = xarray_obj
@property
def foo(self):
return 'bar'
ds = xr.Dataset()
assert ds.demo.foo == 'bar'
da = xr.DataArray(0)
assert da.demo.foo == 'bar'
# accessor is cached
assert ds.demo is ds.demo
# check descriptor
assert ds.demo.__doc__ == ""Demo accessor.""
assert xr.Dataset.demo.__doc__ == ""Demo accessor.""
assert isinstance(ds.demo, DemoAccessor)
assert xr.Dataset.demo is DemoAccessor
# ensure we can remove it
del xr.Dataset.demo
assert not hasattr(xr.Dataset, 'demo')
with pytest.warns(Warning, match='overriding a preexisting attribute'):
@xr.register_dataarray_accessor('demo')
> class Foo(object):
E Failed: DID NOT WARN. No warnings of type (,) was emitted. The list of emitted warnings is: [AccessorRegistrationWarning(""registration of accessor .Foo'> under name 'demo' for type is overriding a preexisting attribute with the same name."",)].
xarray/tests/test_extensions.py:60: Failed
----------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------
/home/karel/working/xarray/xarray/tests/test_extensions.py:60: AccessorRegistrationWarning: registration of accessor .Foo'> under name 'demo' for type is overriding a preexisting attribute with the same name.
class Foo(object):
__________________________________________________________________________________________________ TestAlias.test __________________________________________________________________________________________________
self =
def test(self):
def new_method():
pass
old_method = utils.alias(new_method, 'old_method')
assert 'deprecated' in old_method.__doc__
with pytest.warns(Warning, match='deprecated'):
> old_method()
E Failed: DID NOT WARN. No warnings of type (,) was emitted. The list of emitted warnings is: [FutureWarning('old_method has been deprecated. Use new_method instead.',)].
xarray/tests/test_utils.py:28: Failed
----------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------
/home/karel/working/xarray/xarray/tests/test_utils.py:28: FutureWarning: old_method has been deprecated. Use new_method instead.
old_method()
_____________________________________________________________________________________ TestIndexVariable.test_coordinate_alias ______________________________________________________________________________________
self =
def test_coordinate_alias(self):
with pytest.warns(Warning, match='deprecated'):
> x = Coordinate('x', [1, 2, 3])
E Failed: DID NOT WARN. No warnings of type (,) was emitted. The list of emitted warnings is: [FutureWarning('Coordinate has been deprecated. Use IndexVariable instead.',)].
xarray/tests/test_variable.py:1763: Failed
----------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------
/home/karel/working/xarray/xarray/tests/test_variable.py:1763: FutureWarning: Coordinate has been deprecated. Use IndexVariable instead.
x = Coordinate('x', [1, 2, 3])
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2207/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
327064908,MDU6SXNzdWUzMjcwNjQ5MDg=,2190,Parallel non-locked read using dask.Client crashes,6404167,closed,0,,,5,2018-05-28T15:42:40Z,2019-01-14T21:09:04Z,2019-01-14T21:09:03Z,CONTRIBUTOR,,,,"I'm trying to parallelize my code using Dask. Using `distributed.Client()` I was able to run computations in parallel. Unfortunately, it seems ~60% of the time is spent waiting on a file lock. As I'm only reading data and doing computations in memory, I should be able to work without a lock, so I tried passing `lock=False` to `open_dataset`. However, this crashes my code. A minimal reproducible example can be found below:
``` python
import xarray as xr
import dask.array as da
from dask.distributed import Client
# First generate a file with random numbers
rng = da.random.RandomState()
shape = (10, 10000)
chunks = (10, 10)
dims = ['y', 'z']
x = rng.standard_normal(shape, chunks=chunks)
# Named arr so it does not shadow the dask.array import above
arr = xr.DataArray(x, dims=dims, name='x')
arr.to_netcdf('test.nc')
# Open file without a lock
client = Client(processes=False)
ds = xr.open_dataset('test.nc', chunks=dict(zip(dims, chunks)), lock=False)
# This will crash!
print((ds['x'] * ds['x']).compute())
```
Sometimes it crashes with:
``` python
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None)))))), (slice(0, 10, None), slice(5710, 5720, None)))
kwargs: {}
Exception: RuntimeError('NetCDF: HDF error',)
```
More often it just dies with `terminated by signal SIGSEGV (Address boundary error)`.
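One plausible explanation, stated as an assumption rather than a diagnosis: `Client(processes=False)` runs its workers as threads inside one process, and the HDF5 library underneath netCDF4 is often built without thread safety, so unlocked concurrent reads may corrupt its internal state. The sketch below is a stdlib-only analogy, not xarray code. Several threads read slices through one shared file handle, and a `threading.Lock` keeps each seek-and-read pair atomic. `read_slice` is a hypothetical helper.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor

# Write 100 blocks of 10 bytes; block i is filled with the byte value i
path = os.path.join(tempfile.mkdtemp(), 'test.bin')
with open(path, 'wb') as f:
    for i in range(100):
        f.write(bytes([i]) * 10)

handle = open(path, 'rb')  # one handle shared by all threads
lock = threading.Lock()


def read_slice(i):
    # seek() and read() on a shared handle must not interleave with another
    # thread's seek(), so both calls happen while holding the lock.
    with lock:
        handle.seek(i * 10)
        return handle.read(10)


with ThreadPoolExecutor(max_workers=8) as pool:
    blocks = list(pool.map(read_slice, range(100)))
handle.close()

print(all(block == bytes([i]) * 10 for i, block in enumerate(blocks)))
```

The lock serializes every read, which is the cost observed above. Giving each thread its own handle is the usual way to avoid it, but that only helps when the underlying library tolerates concurrent use.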
#### Output of ``xr.show_versions()``
``` python
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.16.9-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL:
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.10.2
pandas: 0.20.3
numpy: 1.14.0
scipy: 0.19.1
netCDF4: 1.4.0
h5netcdf: None
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: 0.17.5
distributed: 1.21.8
matplotlib: 2.1.2
cartopy: None
seaborn: 0.8.1
setuptools: 38.5.1
pip: 10.0.1
conda: None
pytest: 3.4.0
IPython: 6.3.1
sphinx: 1.6.4
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2190/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue