issue_comments

2 rows where issue = 651945063 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1181905249 https://github.com/pydata/xarray/issues/4205#issuecomment-1181905249 https://api.github.com/repos/pydata/xarray/issues/4205 IC_kwDOAMm_X85GcnFh dcherian 2448579 2022-07-12T15:25:05Z 2022-07-12T15:25:05Z MEMBER

It makes sense to me that chunking along a dimension dim should not chunk variables that don't have that dimension.

@shoyer what do you think?
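
A minimal sketch of the behaviour under discussion (added here for illustration, not part of the comment), assuming the xarray version reported later in the thread (2022.3.0) with dask installed; the coordinate name cx is borrowed from the reproducer below:

```python
import dask.array
import numpy as np
import xarray as xr

# A 2D variable with a non-dimension coordinate 'cx' that only depends on 'x'.
da = xr.DataArray(
    np.ones((12, 10)),
    dims=("x", "y"),
    coords={"x": np.arange(12), "cx": ("x", np.zeros(12))},
)

# Chunk only along 'y'; 'cx' has no 'y' dimension at all.
chunked = da.chunk({"y": 1})

# In the affected versions, 'cx' is nevertheless wrapped in a dask array.
print(isinstance(chunked.cx.data, dask.array.Array))
```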

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Chunking causes unrelated non-dimension coordinate to become a dask array 651945063
1176645772 https://github.com/pydata/xarray/issues/4205#issuecomment-1176645772 https://api.github.com/repos/pydata/xarray/issues/4205 IC_kwDOAMm_X85GIjCM aulemahal 20629530 2022-07-06T20:00:09Z 2022-07-06T20:00:32Z CONTRIBUTOR

I have the same problem in xarray 2022.3.0. The issue is that this creates unnecessary dask tasks in the graph, and some operations acting on the coordinates unexpectedly trigger dask computations. "Unexpected" because the coordinates were not chunked at the beginning of the process, so computation that was expected to happen in the main thread (or not happen at all) now happens in the dask workers.

An example:

```python3
import numpy as np
import xarray as xr
from dask.diagnostics import ProgressBar

# A 2D variable
da = xr.DataArray(
    np.ones((12, 10)),
    dims=('x', 'y'),
    coords={'x': np.arange(12), 'y': np.arange(10)}
)

# A 1D variable sharing a dim with da
db = xr.DataArray(
    np.ones((12,)),
    dims=('x'),
    coords={'x': np.arange(12)}
)

# A non-dimension coordinate
cx = xr.DataArray(np.zeros((12,)), dims=('x',), coords={'x': np.arange(12)})

# Assign it to da and db
da = da.assign_coords(cx=cx)
db = db.assign_coords(cx=cx)

# We need to chunk along y
da = da.chunk({'y': 1})

# Notice how cx is now a dask array, even though it is a 1D coordinate
# and does not have 'y' as a dimension.
print(da)

# This triggers a dask computation
with ProgressBar():
    da - db
```

The reason my example triggers dask is that xarray ensures the coordinates are aligned and equal (I think?). Anyway, I didn't expect it.

Personally, I think the chunk method shouldn't apply to the coordinates at all, no matter their dimensions. They're coordinates, so we expect to be able to read them easily when aligning/comparing datasets. Dask is to be used with the "real" data only. Does this vision match that of the devs? I feel this "skip" could be easily implemented.
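
A possible workaround, sketched here for illustration rather than taken from the thread: after chunking, compute the affected coordinate back into memory so it is numpy-backed again (names follow the example above):

```python
# Continuing from the example above, where da.chunk({'y': 1}) has made 'cx'
# a dask-backed coordinate: load it back eagerly.
da = da.assign_coords(cx=da.cx.compute())

# 'cx' is eager again, so aligning or comparing against it no longer
# triggers a dask computation.
print(type(da.cx.data))  # numpy.ndarray
```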

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Chunking causes unrelated non-dimension coordinate to become a dask array 651945063

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);