issue_comments: 1176645772

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/4205#issuecomment-1176645772 https://api.github.com/repos/pydata/xarray/issues/4205 1176645772 IC_kwDOAMm_X85GIjCM 20629530 2022-07-06T20:00:09Z 2022-07-06T20:00:32Z CONTRIBUTOR

I have the same problem in xarray 2022.3.0. The issue is that this creates unnecessary dask tasks in the graph, and some operations acting on the coordinates unexpectedly trigger dask computations. "Unexpected" because the coordinates at the beginning of the process were not chunked. So computation that was expected to happen in the main thread (or not happen at all) is now happening in the dask workers.

An example:
```python3
import numpy as np
import xarray as xr
from dask.diagnostics import ProgressBar

# A 2D variable
da = xr.DataArray(
    np.ones((12, 10)),
    dims=('x', 'y'),
    coords={'x': np.arange(12), 'y': np.arange(10)}
)

# A 1D variable sharing a dim with da
db = xr.DataArray(
    np.ones((12,)),
    dims=('x',),
    coords={'x': np.arange(12)}
)

# A non-dimension coordinate
cx = xr.DataArray(np.zeros((12,)), dims=('x',), coords={'x': np.arange(12)})

# Assign it to da and db
da = da.assign_coords(cx=cx)
db = db.assign_coords(cx=cx)

# We need to chunk along y
da = da.chunk({'y': 1})

# Notice how cx is now a dask array, even though it is a 1D coordinate
# and does not have 'y' as a dimension.
print(da)

# This triggers a dask computation
with ProgressBar():
    da - db
```
The reason my example triggers dask is that xarray ensures the coordinates are aligned and equal (I think?). Anyway, I didn't expect it.

Personally, I think the chunk method shouldn't apply to the coordinates at all, no matter their dimensions. They're coordinates, so we expect to be able to read them easily when aligning or comparing datasets. Dask is to be used with the "real" data only. Does this vision fit the one from the devs? I feel this "skip" could be easily implemented.
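In the meantime, a possible user-side workaround is to chunk first and then put the non-index coordinates back into memory. This is only a minimal sketch: `chunk_data_only` is a hypothetical helper name, not part of xarray, and it assumes the coordinates are small enough to hold in memory.

```python
import numpy as np
import xarray as xr


def chunk_data_only(obj, chunks):
    """Chunk `obj`, then reload its non-index coordinates into memory.

    Hypothetical helper, not an xarray API.
    """
    chunked = obj.chunk(chunks)
    # compute() returns an in-memory copy of each coordinate; index
    # coordinates are skipped because they are never dask-backed.
    in_memory = {
        name: coord.compute()
        for name, coord in chunked.coords.items()
        if name not in chunked.indexes
    }
    return chunked.assign_coords(in_memory)


da = xr.DataArray(
    np.ones((12, 10)),
    dims=('x', 'y'),
    coords={'x': np.arange(12), 'y': np.arange(10)},
)
da = da.assign_coords(cx=('x', np.zeros(12)))

da = chunk_data_only(da, {'y': 1})
print(type(da.data))        # the main data is still a dask array
print(type(da['cx'].data))  # cx is a plain NumPy array again
```

With `cx` held as NumPy on both operands, comparing the coordinates during `da - db` should no longer need a dask computation, although I have not checked every code path.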

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  651945063