issues: 651945063

id: 651945063
node_id: MDU6SXNzdWU2NTE5NDUwNjM=
number: 4205
title: Chunking causes unrelated non-dimension coordinate to become a dask array
user: 1053153
state: open
locked: 0
comments: 2
created_at: 2020-07-07T02:35:15Z
updated_at: 2022-07-12T15:25:05Z
author_association: CONTRIBUTOR

What happened:

Rechunking along an independent dimension causes unrelated non-dimension coordinates to become dask arrays. The dimension coordinates do not seem affected.

As a workaround, I can insert a synchronous compute on the coordinate, but I wanted to confirm whether this is the expected behavior.

What you expected to happen:

Chunking along an unrelated dimension should not affect unrelated non-dimension coordinates.

Minimal Complete Verifiable Example:

```python
import xarray as xr
import dask.array as da


def print_coords(a, title):
    print()
    print(title)
    for dim in ['x', 'y', 'b']:
        if dim in a.dims or dim in a.coords:
            print('dim:', dim, 'type:', type(a.coords[dim].data))


arr = xr.DataArray(
    da.zeros((20, 20), chunks=10),
    dims=('x', 'y'),
    coords={'b': ('y', range(100, 120)), 'x': range(20), 'y': range(20)},
)

print_coords(arr, 'Original')

# The following line rechunks independently of b or y.
# Removing this line allows the code to succeed.
arr = arr.chunk({'x': 5})

print_coords(arr, 'After chunking')

arr = arr.sel(y=2)

print_coords(arr, 'After selection')

print()
print('Scalar values:')
print('y=', arr.coords['y'].item())
print('b=', arr.coords['b'].item())  # Sad Panda

# Output:
#
# Original
# dim: x type: <class 'numpy.ndarray'>
# dim: y type: <class 'numpy.ndarray'>
# dim: b type: <class 'numpy.ndarray'>
#
# After chunking
# dim: x type: <class 'numpy.ndarray'>
# dim: y type: <class 'numpy.ndarray'>
# dim: b type: <class 'dask.array.core.Array'>
#
# After selection
# dim: x type: <class 'numpy.ndarray'>
# dim: y type: <class 'numpy.ndarray'>
# dim: b type: <class 'dask.array.core.Array'>
#
# Scalar values:
# y= 2
#
# <stack trace elided>
# NotImplementedError: 'item' is not yet a valid method on dask arrays
```
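For reference, the workaround I mentioned (a synchronous compute on the coordinate) could look roughly like the sketch below. It assumes that computing `b` and re-attaching it with `assign_coords` is an acceptable way to get a NumPy-backed coordinate again; the names `selected`, `b_type`, and `b_value` are only illustrative. It does not address why `chunk` converts the coordinate in the first place.

```python
import dask.array as da
import numpy as np
import xarray as xr

arr = xr.DataArray(
    da.zeros((20, 20), chunks=10),
    dims=('x', 'y'),
    coords={'b': ('y', range(100, 120)), 'x': range(20), 'y': range(20)},
)

# Rechunk along x only, as in the example above.
arr = arr.chunk({'x': 5})

# Workaround (assumption: eager evaluation of this small coordinate is fine):
# compute the coordinate and assign the in-memory result back onto the array.
arr = arr.assign_coords(b=arr.coords['b'].compute())

selected = arr.sel(y=2)
b_type = type(selected.coords['b'].data)
b_value = selected.coords['b'].item()

print('b is numpy-backed:', isinstance(selected.coords['b'].data, np.ndarray))
print('b=', b_value)
```

The main data stays a dask array here; only the small `b` coordinate is pulled into memory, so the cost of the extra compute should be negligible for coordinates of this size.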

Environment:

Output of `xr.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.19.112+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: None

xarray: 0.15.1
pandas: 1.0.5
numpy: 1.18.5
scipy: 1.4.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.19.0
distributed: 2.19.0
matplotlib: 3.2.2
cartopy: None
seaborn: None
numbagg: None
setuptools: 49.1.0.post20200704
pip: 20.1.1
conda: 4.8.3
pytest: 5.4.3
IPython: 7.16.1
sphinx: None

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4205/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue