issues: 440988633

id: 440988633
node_id: MDU6SXNzdWU0NDA5ODg2MzM=
number: 2943
title: Rolling operations loose chunking with dask and bottleneck
user: 161133
state: closed
locked: 0
assignee: (empty)
milestone: (empty)
comments: 1
created_at: 2019-05-07T01:52:05Z
updated_at: 2019-05-07T02:01:13Z
closed_at: 2019-05-07T02:01:13Z
author_association: CONTRIBUTOR
active_lock_reason: (empty)
draft: (empty)
pull_request: (empty)
body:

Code Sample, a copy-pastable example if possible

A "Minimal, Complete and Verifiable Example" will make it much easier for maintainers to help you: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

```python
import bottleneck  # imported only to confirm it is installed; xarray uses it for rolling
import xarray
import dask.array  # a bare `import dask` does not guarantee that dask.array is loaded

data = dask.array.ones((100,), chunks=(10,))
da = xarray.DataArray(data, dims=['time'])

rolled = da.rolling(time=15).mean()

# Expect the 'rolled' dataset to be chunked approximately the same as 'data',
# however there is only one chunk in 'rolled' instead of 10
assert len(rolled.chunks[0]) > 1
```

Problem description

Rolling operations lose chunking over the rolled dimension when using dask-backed data with bottleneck installed: the result comes back as a single chunk. This is a problem for large datasets, where we don't want to load the entire array into memory at once.

The issue appears to be caused by xarray.core.dask_array_ops.dask_rolling_wrapper calling dask.array.overlap.overlap on a DataArray instead of a Dask array. Possibly #2940 is related?
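
For context, here is a minimal NumPy-only sketch of the overlap idea that a chunk-preserving rolling mean relies on: each chunk is extended with a "halo" of the preceding `window - 1` values, the rolling mean is computed per chunk, and the halo is trimmed off again. The helper names `rolling_mean` and `chunked_rolling_mean` are made up for illustration; this is not xarray's or dask's actual implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_mean(values, window):
    """Trailing rolling mean; positions without a full window are NaN."""
    out = np.full(len(values), np.nan)
    if len(values) >= window:
        out[window - 1:] = sliding_window_view(values, window).mean(axis=1)
    return out


def chunked_rolling_mean(values, window, chunk_size):
    """Compute the same rolling mean one chunk at a time (hypothetical helper)."""
    halo = window - 1
    # Pad the start with NaN so the first chunk also has a full left halo.
    padded = np.concatenate([np.full(halo, np.nan), np.asarray(values, dtype=float)])
    pieces = []
    for start in range(0, len(values), chunk_size):
        stop = min(start + chunk_size, len(values))
        block = padded[start:stop + halo]  # the chunk plus its left halo
        pieces.append(rolling_mean(block, window)[halo:])  # trim the halo
    return pieces


values = np.arange(100.0)
pieces = chunked_rolling_mean(values, window=15, chunk_size=10)

# One output piece per input chunk, and the same numbers as the unchunked result.
assert len(pieces) == 10
assert np.allclose(np.concatenate(pieces), rolling_mean(values, 15), equal_nan=True)
```

The point is that only the neighbouring `window - 1` values are needed per chunk, so nothing about a rolling mean should force the whole dimension into one chunk.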

Expected Output

Chunks should be preserved through .rolling().mean()
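
To illustrate the expected behaviour, here is a sketch on a plain Dask array (no xarray involved) using `Array.map_overlap`, which keeps the input chunking. The `trailing_mean` function and the window of 5 are assumptions chosen for the example (the window is kept smaller than the chunk size so no rechunking is needed); this is not a proposed fix for xarray itself.

```python
import dask.array
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

WINDOW = 5  # hypothetical window, deliberately smaller than the chunk size of 10


def trailing_mean(block):
    """Trailing rolling mean over one block; incomplete windows give NaN."""
    out = np.full(block.shape, np.nan)
    if block.shape[0] >= WINDOW:
        out[WINDOW - 1:] = sliding_window_view(block, WINDOW).mean(axis=1)
    return out


data = dask.array.ones((100,), chunks=(10,))

# Each block is extended by WINDOW - 1 neighbouring values (NaN at the array
# edges); map_overlap trims that extension again after applying the function.
rolled = data.map_overlap(trailing_mean, depth=WINDOW - 1, boundary=np.nan)

# The chunk structure of the input is preserved: 10 chunks of 10 elements.
assert rolled.chunks == data.chunks
```

This is the behaviour I would expect from `.rolling().mean()` on a dask-backed DataArray as well.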

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 09:07:38) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-862.14.4.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_AU.utf8
LANG: C
LOCALE: en_AU.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.3
scipy: 1.2.1
netCDF4: 1.5.0.1
pydap: installed
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: 2.3.1
cftime: 1.0.3.4
nc_time_axis: 1.2.0
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: 2.2.0
bottleneck: 1.2.1
dask: 1.2.0
distributed: 1.27.1
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 41.0.1
pip: 19.1
conda: None
pytest: 4.4.1
IPython: 7.5.0
sphinx: None

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2943/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
performed_via_github_app: (empty)
state_reason: completed
repo: 13221727
type: issue
