
issue_comments


4 rows where author_association = "MEMBER", issue = 333312849 and user = 2448579 sorted by updated_at descending



id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1126847735 https://github.com/pydata/xarray/issues/2237#issuecomment-1126847735 https://api.github.com/repos/pydata/xarray/issues/2237 IC_kwDOAMm_X85DKlT3 dcherian 2448579 2022-05-15T02:44:06Z 2022-05-15T02:44:06Z MEMBER

Fixed on main with `ds.groupby("year").mean(method="blockwise")`

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  why time grouping doesn't preserve chunks 333312849
789078512 https://github.com/pydata/xarray/issues/2237#issuecomment-789078512 https://api.github.com/repos/pydata/xarray/issues/2237 MDEyOklzc3VlQ29tbWVudDc4OTA3ODUxMg== dcherian 2448579 2021-03-02T17:29:51Z 2021-03-02T18:03:17Z MEMBER

I think the behaviour in Ryan's most recent comment is a consequence of `groupby.mean` being

```python
results = []
for group_idx in group_indices:        # one group per year
    group = ds.isel(group_idx)         # (SPLIT)
    results.append(group.mean())       # (APPLY)
return xr.concat(results, dim="year")  # (COMBINE)
```

This results in one chunk per year (one chunk per element in `results`).
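The split-apply-combine pattern above can be sketched without xarray or dask at all, using plain Python lists to stand in for chunked data (`groupby_mean`, `years`, and `values` are made-up names for illustration); the point is that the combine step naturally produces one output element, i.e. one chunk, per group:

```python
# Toy split-apply-combine: one output element ("chunk") per group.
def groupby_mean(years, values):
    # SPLIT: collect the indices belonging to each year
    group_indices = {}
    for i, y in enumerate(years):
        group_indices.setdefault(y, []).append(i)
    results = []
    for y in sorted(group_indices):
        group = [values[i] for i in group_indices[y]]  # SPLIT
        results.append(sum(group) / len(group))        # APPLY
    return results  # COMBINE: one element per year

years = [2000, 2000, 2001, 2001, 2002]
values = [1.0, 3.0, 10.0, 20.0, 5.0]
print(groupby_mean(years, values))  # one result per year: [2.0, 15.0, 5.0]
```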

I think the fundamental question is: is it really possible for dask to recognize that the chunk structure after the combine step could be consolidated, with an arbitrary number of apply steps in the middle? OR: when a computation maps a single chunk to many chunks, should dask consolidate the output chunks (using `array.chunk-size`)?
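As a toy model of the consolidation being asked about (plain Python, not dask's actual rechunking; `consolidate` is a hypothetical name), chunks can be treated as a list of per-chunk lengths that get greedily merged up to a target size:

```python
def consolidate(chunks, target):
    """Greedily merge adjacent chunks so that no merged chunk
    exceeds `target` elements (a stand-in for array.chunk-size)."""
    out = []
    current = 0
    for c in chunks:
        if current and current + c > target:
            out.append(current)
            current = 0
        current += c
    if current:
        out.append(current)
    return out

# 30 one-element chunks (one per year) consolidated toward chunks of 5
print(consolidate([1] * 30, 5))  # [5, 5, 5, 5, 5, 5]
```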

We can explicitly ask for consolidation of chunks by saying the output should be chunked 5 along `year`:

```python
dask.config.set({"optimization.fuse.ave-width": 6})  # note > 5
(
    ds.foo.groupby("year")
    .mean(dim="time")
    .chunk({"year": 5})  # really important; why and how would dask choose this automatically?
    .data.visualize(optimize_graph=False)
)
```

Then if we set `optimization.fuse.ave-width` appropriately, we get the graph we want after optimization:

```python
dask.config.set({"optimization.fuse.ave-width": 6})
(
    ds.foo.groupby("year")
    .mean(dim="time")
    .chunk({"year": 5})  # really important
    .data.visualize(optimize_graph=True)
)
```

Can we make dask recognize that the 5 `getitem` tasks from `input-chunk-0`, at the bottom of each tower, can be fused to a single task? In that case, fuse the 5 `getitem` tasks and "propagate" that fusion up the tower.

I guess another failure here is that when `fuse.ave-width` is 3 (< width of tower), why isn't dask fusing to make three "sub-towers" per tower? Even that would help reduce the number of tasks.

```python
dask.config.set({"optimization.fuse.ave-width": 3})
(
    ds.foo.groupby("year")
    .mean(dim="time")
    .chunk({"year": 5})  # really important
    .data.visualize(optimize_graph=True)
)
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  why time grouping doesn't preserve chunks 333312849
789090356 https://github.com/pydata/xarray/issues/2237#issuecomment-789090356 https://api.github.com/repos/pydata/xarray/issues/2237 MDEyOklzc3VlQ29tbWVudDc4OTA5MDM1Ng== dcherian 2448579 2021-03-02T17:48:01Z 2021-03-02T17:48:47Z MEMBER

Reading up on fusion, the docstring says:

> This optimization applies to all reductions–tasks that have at most one dependent–so it may be viewed as fusing "multiple input, single output" groups of tasks into a single task.

So we need the opposite: fuse "single input, multiple output" groups of tasks into a single task when some appropriate heuristic is satisfied.
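A hypothetical sketch of that opposite fusion, using plain dicts in dask's `(func, *args)` task-tuple style rather than dask's real optimizer: all `getitem` tasks reading the same input chunk are collapsed into one task that slices out every requested index at once (`fuse_siso_opposite` is a made-up name):

```python
import operator
from collections import defaultdict

# Toy task graph: one input chunk feeding five getitem tasks --
# the "bottom of each tower" described above.
dsk = {"input-0": (list, (range, 5))}
for i in range(5):
    dsk[f"getitem-{i}"] = (operator.getitem, "input-0", i)

def fuse_siso_opposite(dsk):
    """Fuse 'single input, multiple output': all getitem tasks reading
    the same key become ONE task returning every requested item."""
    readers = defaultdict(list)
    for key, task in dsk.items():
        if isinstance(task, tuple) and task[0] is operator.getitem:
            readers[task[1]].append((key, task[2]))
    # keep non-getitem tasks as-is
    fused = {k: t for k, t in dsk.items()
             if not (isinstance(t, tuple) and t[0] is operator.getitem)}
    for src, items in readers.items():
        keys = tuple(k for k, _ in items)
        idxs = tuple(i for _, i in items)
        # one fused task replaces len(items) getitem tasks
        fused[keys] = (lambda x, idxs=idxs: tuple(x[i] for i in idxs), src)
    return fused

fused = fuse_siso_opposite(dsk)
print(len(dsk), "->", len(fused))  # 6 -> 2 tasks
```

This keeps the total work identical but reads `input-0` once instead of five times, which is the shape of fusion the comment above is asking for.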

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  why time grouping doesn't preserve chunks 333312849
482241098 https://github.com/pydata/xarray/issues/2237#issuecomment-482241098 https://api.github.com/repos/pydata/xarray/issues/2237 MDEyOklzc3VlQ29tbWVudDQ4MjI0MTA5OA== dcherian 2448579 2019-04-11T18:22:41Z 2019-04-11T18:22:41Z MEMBER

Can this be closed or is there something to do on the xarray side now that dask/dask#3648 has been merged?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  why time grouping doesn't preserve chunks 333312849

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 3920.229ms · About: xarray-datasette