issue 1779: decode_cf destroys chunks (id 281983819, node_id MDU6SXNzdWUyODE5ODM4MTk=)

state: closed (completed) · locked: no · author_association: MEMBER · user: 1197350
comments: 2 · created_at: 2017-12-14T05:12:00Z · updated_at: 2017-12-15T14:50:42Z · closed_at: 2017-12-15T14:50:41Z
repo: pydata/xarray (13221727) · type: issue

Code Sample, a copy-pastable example if possible

```python
import numpy as np
import xarray as xr

ds = xr.DataArray(np.random.rand(1000)).to_dataset(name='random').chunk(100)
ds_cf = xr.decode_cf(ds)
assert not ds_cf.chunks  # passes: the chunk information has been dropped
```

Problem description

Calling decode_cf causes variables whose data is a dask array to be wrapped in two layers of abstraction: DaskIndexingAdapter and LazilyIndexedArray. In the example above:

```python
>>> ds.random.variable._data
dask.array<da.random.random_sample, shape=(1000,), dtype=float64, chunksize=(100,)>

>>> ds_cf.random.variable._data
LazilyIndexedArray(array=DaskIndexingAdapter(array=dask.array<da.random.random_sample, shape=(1000,), dtype=float64, chunksize=(100,)>), key=BasicIndexer((slice(None, None, None),)))
```

At least part of the problem comes from this line: https://github.com/pydata/xarray/blob/master/xarray/conventions.py#L1045
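A quick way to check whether a variable is still backed by a bare dask array after decoding is a small diagnostic along these lines (a sketch only; `is_plain_dask` is a hypothetical helper, not part of xarray):

```python
import dask.array as da
import xarray as xr

def is_plain_dask(var: xr.Variable) -> bool:
    # True only if the variable's backing data is a bare dask array,
    # i.e. not hidden behind LazilyIndexedArray / DaskIndexingAdapter.
    return isinstance(var._data, da.Array)

# With the example above:
#   is_plain_dask(ds.random.variable)     -> True
#   is_plain_dask(ds_cf.random.variable)  -> False (wrapped after decode_cf)
```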

This is especially problematic if we want to concatenate several such datasets together with dask: chunking the decoded dataset creates a nested dask-within-dask array, which is sure to cause undesirable behavior down the line.

```python
>>> dict(ds_cf.chunk().random.data.dask)
{('xarray-random-bf5298b8790e93c1564b5dca9e04399e', 0):
     (<function dask.array.core.getter>,
      'xarray-random-bf5298b8790e93c1564b5dca9e04399e',
      (slice(0, 1000, None),)),
 'xarray-random-bf5298b8790e93c1564b5dca9e04399e':
     ImplicitToExplicitIndexingAdapter(array=LazilyIndexedArray(array=DaskIndexingAdapter(array=dask.array<da.random.random_sample, shape=(1000,), dtype=float64, chunksize=(100,)>), key=BasicIndexer((slice(None, None, None),))))}
```
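For context, the kind of workflow this affects looks roughly like the following (a sketch only; the dimension name `x` and the second dataset are made up for illustration):

```python
import numpy as np
import xarray as xr

# Two chunked, CF-encoded inputs, built the same way as the example at the top.
ds1 = xr.DataArray(np.random.rand(1000), dims='x').to_dataset(name='random').chunk(100)
ds2 = xr.DataArray(np.random.rand(1000), dims='x').to_dataset(name='random').chunk(100)

# After decode_cf the inputs no longer advertise their chunks,
# so the chunking has to be re-specified on the combined result ...
combined = xr.concat([xr.decode_cf(ds1), xr.decode_cf(ds2)], dim='x')

# ... which is exactly the chunk-on-top-of-decoded-data pattern that
# produces the nested graph shown above.
combined = combined.chunk(100)
```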

Expected Output

If we call decode_cf on a dataset made of dask arrays, it should preserve the chunks of the original dask arrays. Hopefully this can be addressed by #1752.
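Spelled out as assertions, the expectation would be something like the following sketch (these assertions describe the desired outcome, not what currently happens):

```python
import dask.array as da
import numpy as np
import xarray as xr

ds = xr.DataArray(np.random.rand(1000)).to_dataset(name='random').chunk(100)
ds_cf = xr.decode_cf(ds)

# Desired: decoding keeps the original dask chunks ...
assert ds_cf.chunks == ds.chunks
# ... and the decoded variable is still backed by a single dask array,
# not a LazilyIndexedArray/DaskIndexingAdapter wrapper around one.
assert isinstance(ds_cf.random.data, da.Array)
```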

Output of xr.show_versions()

```
commit: 85174cda6440c2f6eed7860357e79897e796e623
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.0-52-gd8842a6
pandas: 0.20.3
numpy: 1.13.1
scipy: 0.19.1
netCDF4: 1.2.9
h5netcdf: 0.4.1
Nio: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.16.0
matplotlib: 2.1.0
cartopy: 0.15.1
seaborn: 0.8.1
setuptools: 36.3.0
pip: 9.0.1
conda: None
pytest: 3.2.1
IPython: 6.1.0
sphinx: 1.6.5
```
