

issues: 702646191


id: 702646191
node_id: MDU6SXNzdWU3MDI2NDYxOTE=
number: 4428
title: Behaviour change in xarray.Dataset.sortby/sel between dask==2.25.0 and dask==2.26.0
user: 6582745
state: closed
locked: 0
assignee:
milestone:
comments: 8
created_at: 2020-09-16T10:26:38Z
updated_at: 2021-07-04T04:12:34Z
closed_at: 2021-07-04T04:12:34Z
author_association: NONE

What happened: A project of mine that previously worked suddenly broke with the following error:

ValueError: Object has inconsistent chunks along dimension row. This can be fixed by calling unify_chunks().
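As far as I understand it, this error is raised when variables in a Dataset disagree on their chunk boundaries along a shared dimension. The following is a minimal sketch for locating which variables disagree. It assumes only that the object exposes `.variables` (a name-to-variable mapping) and that each variable has `.dims` and `.chunks`, as xarray Datasets and Variables do. The helper name `find_inconsistent_dims` and the variable name `OTHER` are hypothetical, and the stand-in objects exist only so the example runs without xarray or dask installed.

```python
from collections import defaultdict
from types import SimpleNamespace


def find_inconsistent_dims(ds):
    """Return {dim: {chunk layout: [variable names]}} for every dimension
    along which the variables do not all share the same chunk layout."""
    layouts = defaultdict(dict)
    for name, var in ds.variables.items():
        if var.chunks is None:  # not dask-backed, nothing to compare
            continue
        for dim, chunks in zip(var.dims, var.chunks):
            layouts[dim].setdefault(tuple(chunks), []).append(name)
    # Keep only the dimensions with more than one distinct layout.
    return {dim: seen for dim, seen in layouts.items() if len(seen) > 1}


# Stand-ins mimicking two variables that disagree along "row".
# "OTHER" is a hypothetical second variable, not one from my dataset.
dims = ("row", "chan", "corr")
fake = SimpleNamespace(
    variables={
        "FLAG": SimpleNamespace(dims=dims, chunks=((20355, 20355), (1024,), (4,))),
        "OTHER": SimpleNamespace(dims=dims, chunks=((40710,), (1024,), (4,))),
    }
)

report = find_inconsistent_dims(fake)
for dim, seen in report.items():
    print(dim, seen)
```

Passing a real Dataset instead of `fake` should list the variables whose `row` chunks differ, which narrows down where the layouts diverged before resorting to unify_chunks().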

What you expected to happen: There should have been no change.

Minimal Complete Verifiable Example: This is very difficult to reproduce. I have tried, but the problem does not appear to be triggered by relatively simple xarray.Dataset objects. In my code, the Datasets in question are the result of multiple concatenation, selection and chunking operations. What I shall do instead is attempt to demonstrate the change, in the hope that someone more knowledgeable has some intuition for what has gone wrong.

dask==2.25.0

I have a dataset, foo, with a number of different variables, most of them indexed by row. I will focus on one variable, FLAG, to demonstrate the change in behaviour. This is what FLAG looks like prior to a foo.sortby("row") call. Note that there is only a single chunk (this is intentional).

    <xarray.DataArray 'FLAG' (row: 40710, chan: 1024, corr: 4)>
    dask.array<rechunk-merge, shape=(40710, 1024, 4), dtype=bool, chunksize=(40710, 1024, 4), chunktype=numpy.ndarray>
    Coordinates:
      * row      (row) int64 462991 462993 462994 462996 ... 505074 505075 505076
    Dimensions without coordinates: chan, corr

After the foo.sortby("row") call:

    <xarray.DataArray 'FLAG' (row: 40710, chan: 1024, corr: 4)>
    dask.array<getitem, shape=(40710, 1024, 4), dtype=bool, chunksize=(40710, 1024, 4), chunktype=numpy.ndarray>
    Coordinates:
      * row      (row) int64 462991 462993 462994 462996 ... 505076 505077 505078
    Dimensions without coordinates: chan, corr

Note that the chunksize is unchanged.

dask==2.26.0

Repeating exactly the same experiment, prior to the call:

    <xarray.DataArray 'FLAG' (row: 40710, chan: 1024, corr: 4)>
    dask.array<rechunk-merge, shape=(40710, 1024, 4), dtype=bool, chunksize=(40710, 1024, 4), chunktype=numpy.ndarray>
    Coordinates:
      * row      (row) int64 462991 462993 462994 462996 ... 505074 505075 505076
    Dimensions without coordinates: chan, corr

After the foo.sortby("row") call:

    <xarray.DataArray 'FLAG' (row: 40710, chan: 1024, corr: 4)>
    dask.array<getitem, shape=(40710, 1024, 4), dtype=bool, chunksize=(20355, 1024, 4), chunktype=numpy.ndarray>
    Coordinates:
      * row      (row) int64 462991 462993 462994 462996 ... 505076 505077 505078
    Dimensions without coordinates: chan, corr

Note the change in the chunksize.
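To put a number on that change: my understanding is that the chunksize in the dask repr reports the largest chunk along each axis, so it gives only a lower bound on how many chunks exist. The sketch below works through that arithmetic for the two reprs above. The helper name `min_chunks_per_dim` is hypothetical, and the interpretation of chunksize as a per-axis maximum is an assumption about the repr rather than something verified here.

```python
def min_chunks_per_dim(shape, chunksize):
    """Lower bound on the number of chunks along each dimension, assuming
    `chunksize` holds the largest chunk length per dimension."""
    # -(-n // c) is integer ceiling division.
    return tuple(-(-n // c) for n, c in zip(shape, chunksize))


shape = (40710, 1024, 4)

# dask==2.25.0: chunksize is unchanged by sortby.
before = min_chunks_per_dim(shape, (40710, 1024, 4))

# dask==2.26.0: chunksize along row has halved.
after = min_chunks_per_dim(shape, (20355, 1024, 4))

print(before)  # (1, 1, 1)
print(after)   # (2, 1, 1)
```

So under 2.26.0 FLAG is split into at least two chunks along row. Any other row-indexed variable that still has a single chunk would then disagree with it.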

Anything else we need to know?: I have seen similar behaviour when using xarray.Dataset.sel.

Environment:

dask==2.25.0

```
INSTALLED VERSIONS

commit: None
python: 3.6.9 (default, Jul 17 2020, 12:50:27) [GCC 8.4.0]
python-bits: 64
OS: Linux
OS-release: 5.3.0-7648-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None

xarray: 0.15.1
pandas: 1.1.2
numpy: 1.19.2
scipy: 1.5.2
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.4.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.25.0
distributed: 2.26.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 50.3.0
pip: 20.2.3
conda: None
pytest: 6.0.2
IPython: None
sphinx: None
```

dask==2.26.0

```
INSTALLED VERSIONS

commit: None
python: 3.6.9 (default, Jul 17 2020, 12:50:27) [GCC 8.4.0]
python-bits: 64
OS: Linux
OS-release: 5.3.0-7648-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None

xarray: 0.15.1
pandas: 1.1.2
numpy: 1.19.2
scipy: 1.5.2
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.4.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.26.0
distributed: 2.26.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 50.3.0
pip: 20.2.3
conda: None
pytest: 6.0.2
IPython: None
sphinx: None
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4428/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
