

issues: 488547784


| Field | Value |
|---|---|
| id | 488547784 |
| node_id | MDU6SXNzdWU0ODg1NDc3ODQ= |
| number | 3277 |
| title | xarray, chunking and rolling operation adds chunking along new dimension (previously worked) |
| user | 47371188 |
| state | closed |
| locked | 0 |
| comments | 5 |
| created_at | 2019-09-03T11:25:23Z |
| updated_at | 2021-03-26T19:50:49Z |
| closed_at | 2021-03-26T19:50:49Z |
| author_association | NONE |

I was testing the latest version of xarray (0.12.3) from the conda-forge channel, and it broke some of my code. With the `defaults` channel installation (not conda-forge, xarray=0.12.1), the following code works correctly and produces the desired output:

Test code

```python
import pandas as pd
import xarray as xr
import numpy as np

# Business-day index plus 300 string-labelled items
s_date = '1990-01-01'
e_date = '2019-05-01'
days = pd.date_range(start=s_date, end=e_date, freq='B', name='day')
items = pd.Index([str(i) for i in range(300)], name='item')

# Random data chunked along 'item' only; 'day' stays a single chunk
dat = xr.DataArray(np.random.rand(len(days), len(items)), coords=[days, items])
dat_chunk = dat.chunk({'item': 20})

# First rolling operation: 10-day mean
dat_mean = dat_chunk.rolling(day=10).mean()

print(dat_chunk)
print(' ')
print(dat_mean)

# Second rolling operation on the result: 250-day standard deviation
dat_std_avg = dat_mean.rolling(day=250).std()

print(' ')
print(dat_std_avg)
```

Output (correct) with xarray=0.12.1 (note the chunksizes)

```
<xarray.DataArray (day: 7653, item: 300)>
dask.array<shape=(7653, 300), dtype=float64, chunksize=(7653, 20)>
Coordinates:
  * day      (day) datetime64[ns] 1990-01-01 1990-01-02 ... 2019-05-01
  * item     (item) object '0' '1' '2' '3' '4' ... '295' '296' '297' '298' '299'

<xarray.DataArray '_trim-8c9287bf114d61cb3ad74780465cd19f' (day: 7653, item: 300)>
dask.array<shape=(7653, 300), dtype=float64, chunksize=(7653, 20)>
Coordinates:
  * day      (day) datetime64[ns] 1990-01-01 1990-01-02 ... 2019-05-01
  * item     (item) object '0' '1' '2' '3' '4' ... '295' '296' '297' '298' '299'

<xarray.DataArray '_trim-2ee90b6c2f29f71a7798a204a4ad3305' (day: 7653, item: 300)>
dask.array<shape=(7653, 300), dtype=float64, chunksize=(7653, 20)>
Coordinates:
  * day      (day) datetime64[ns] 1990-01-01 1990-01-02 ... 2019-05-01
  * item     (item) object '0' '1' '2' '3' '4' ... '295' '296' '297' '298' '299'
```

Output (now failing) with xarray=0.12.3 (note the chunksizes)

```
<xarray.DataArray (day: 7653, item: 300)>
dask.array<shape=(7653, 300), dtype=float64, chunksize=(7653, 20)>
Coordinates:
  * day      (day) datetime64[ns] 1990-01-01 1990-01-02 ... 2019-05-01
  * item     (item) object '0' '1' '2' '3' '4' ... '295' '296' '297' '298' '299'

<xarray.DataArray (day: 7653, item: 300)>
dask.array<shape=(7653, 300), dtype=float64, chunksize=(5, 20)>
Coordinates:
  * day      (day) datetime64[ns] 1990-01-01 1990-01-02 ... 2019-05-01
  * item     (item) object '0' '1' '2' '3' '4' ... '295' '296' '297' '298' '299'


ValueError                                Traceback (most recent call last)
...

ValueError: For window size 250, every chunk should be larger than 125, but the smallest chunk size is 5. Rechunk your array with a larger chunk size or a chunk size that more evenly divides the shape of your array.
```
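The numbers in the error message follow from the window size: a 250-day window needs an overlap of about half the window (125) from neighbouring chunks, so every chunk along `day` has to exceed that. A minimal sketch of the check, assuming the required depth is `(window + 1) // 2` (which gives 125 for a window of 250, matching the message) and following the message's "larger than" wording for the boundary case. `min_chunk_ok` is a hypothetical helper, not an xarray or dask function, and the chunk tuples are illustrative rather than the actual chunks from this report.

```python
def min_chunk_ok(chunks, window):
    """Check one dimension's chunk sizes against a rolling window.

    Hypothetical helper. Assumes the overlap depth is (window + 1) // 2
    and that every chunk must be strictly larger than that depth.
    Returns (ok, depth, smallest_chunk).
    """
    depth = (window + 1) // 2
    smallest = min(chunks)
    return smallest > depth, depth, smallest


# 'day' as a single chunk, like the 0.12.1 output
print(min_chunk_ok((7653,), 250))
# 'day' split into small chunks of 5, like the 0.12.3 output
print(min_chunk_ok((5, 5, 5), 250))
```

With a single chunk of 7653 the check passes; with chunks of 5 it fails with depth 125 and smallest chunk 5, the same figures reported in the traceback.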

Problem Description

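Using dask + rolling with xarray=0.12.3 appears to add unwanted chunking along the `day` dimension, which was not the case with xarray=0.12.1: after `rolling(day=10).mean()` the chunksize changes from (7653, 20) to (5, 20), whereas previously `day` remained a single chunk of 7653. This additional chunking makes the queued second rolling operation (`rolling(day=250).std()`) fail with a ValueError, because for a window of 250 every chunk along `day` must be larger than 125. At the very least, this makes it difficult to queue dask-based delayed operations when several rolling operations are chained.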
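As the error message suggests, rechunking before the second `rolling` call should avoid the failure. The simplest option, restoring the 0.12.1 layout, is to put `day` back into a single chunk, e.g. `dat_mean.chunk({'day': dat_mean.sizes['day']})`. If one chunk along `day` is too large for memory, adjacent small chunks can instead be merged until each is big enough. The sketch below plans such a merge on a plain tuple of chunk sizes; `merge_small_chunks` is a hypothetical helper (not an xarray or dask API), and the input chunks are illustrative, not the actual chunks produced in this report.

```python
def merge_small_chunks(chunks, min_size):
    """Greedily merge adjacent chunks so each is at least min_size long.

    Hypothetical helper: it only plans new chunk sizes and preserves the
    total length. If the whole dimension is shorter than min_size, a
    single (too small) chunk is returned.
    """
    merged = []
    current = 0
    for size in chunks:
        current += size
        if current >= min_size:
            merged.append(current)
            current = 0
    if current:  # leftover tail that never reached min_size
        if merged:
            merged[-1] += current  # fold it into the previous chunk
        else:
            merged.append(current)
    return tuple(merged)


# Illustrative: 7653 rows split into chunks of 5 plus a tail of 3;
# require chunks larger than 125 (i.e. at least 126) for window=250.
plan = merge_small_chunks((5,) * 1530 + (3,), 126)
print(len(plan), min(plan), sum(plan))
```

The planned tuple could then be passed as explicit chunk sizes, e.g. `dat_mean.chunk({'day': plan})`; this assumes `.chunk` accepts a per-dimension tuple of sizes in the same way dask's `rechunk` does.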

Output of `xr.show_versions()` for the failing version (xarray 0.12.3)

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 22:01:29) [MSC v.1900 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: AMD64 Family 23 Model 113 Stepping 0, AuthenticAMD
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
libhdf5: 1.10.4
libnetcdf: 4.6.1
xarray: 0.12.3
pandas: 0.25.1
numpy: 1.16.4
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.3.0
distributed: 2.3.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.0.1
pip: 19.2.2
conda: 4.7.11
pytest: None
IPython: 7.8.0
sphinx: None
```

Apologies if this issue has already been reported; I was unable to find an equivalent case.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3277/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: 13221727 · type: issue
