issue_comments

8 rows where issue = 288785270 and user = 1197350 sorted by updated_at descending

All 8 comments are by rabernat (MEMBER), posted on pydata/xarray issue #1832, "groupby on dask objects doesn't handle chunks well" (288785270).

Comment 559166277 · rabernat (MEMBER) · 2019-11-27T16:45:14Z
https://github.com/pydata/xarray/issues/1832#issuecomment-559166277

I am trying a new approach to this problem using xarray's new map_blocks function. See this example: https://nbviewer.jupyter.org/gist/rabernat/30e7b747f0e3583b5b776e4093266114
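
A minimal sketch of the idea (the dataset, variable names, chunking, and the anomaly reduction below are illustrative assumptions, not the contents of the linked notebook):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Illustrative chunked dataset standing in for the notebook's real data.
time = pd.date_range("2000-01-01", periods=730)
ds = xr.Dataset(
    {"sst": (("time", "y", "x"), np.random.rand(730, 16, 16))},
    coords={"time": time},
).chunk({"time": 73})

# The monthly climatology is small, so compute it eagerly up front.
clim = ds.groupby("time.month").mean("time").load()

def remove_clim(block):
    # Runs on one in-memory block; the groupby arithmetic never has to
    # traverse the full lazy array.
    return block.groupby("time.month") - clim

# map_blocks applies the function chunk by chunk, keeping the dask graph
# small; the output has the same shape and chunks as the input.
anom = xr.map_blocks(remove_clim, ds, template=ds)
anom.compute()
```

The point of routing the groupby through map_blocks is that each chunk is reduced by ordinary eager xarray code, rather than by one large task graph built over the whole lazy array.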

Comment 499645413 · rabernat (MEMBER) · 2019-06-06T20:01:40Z
https://github.com/pydata/xarray/issues/1832#issuecomment-499645413

In recent versions of xarray (0.12.1) and dask, this issue has been significantly ameliorated. I believe it can now be closed.

Comment 363482773 · rabernat (MEMBER) · 2018-02-06T16:38:31Z
https://github.com/pydata/xarray/issues/1832#issuecomment-363482773

Short answer: no luck. With the latest masters (but without the suggested dask config), I am still hitting the same basic performance limitations.

I can update you more when we talk in person later today.

Comment 362151486 · rabernat (MEMBER) · 2018-02-01T03:59:28Z
https://github.com/pydata/xarray/issues/1832#issuecomment-362151486

@mrocklin thanks for the updates. I should have some time on Friday morning to give it a try on Cheyenne.

Comment 358106768 · rabernat (MEMBER) · 2018-01-16T21:10:37Z
https://github.com/pydata/xarray/issues/1832#issuecomment-358106768

Or maybe real data just gets in the way of the core dask issue?

Comment 358106627 · rabernat (MEMBER) · 2018-01-16T21:10:12Z
https://github.com/pydata/xarray/issues/1832#issuecomment-358106627

I am developing a use case for this scenario using real data. I will put the data in cloud storage as soon as #1800 is merged. That should make it easier to debug.

Comment 358017392 · rabernat (MEMBER) · 2018-01-16T16:21:37Z
https://github.com/pydata/xarray/issues/1832#issuecomment-358017392

Below is how I work around the issue in practice: looping over each item in the groupby and then, for each variable, loading the result and writing it to disk.

```python
# Workaround: iterate over the groupby manually instead of letting
# dask build one giant graph.
gb = ds.groupby('time.month')
for month, dsm in gb:
    # Squared anomaly relative to the monthly mean ds_mm.
    dsm_anom2 = ((dsm - ds_mm.sel(month=month))**2).mean(dim='time')
    dsm_anom2 = dsm_anom2.rename({f: f + '2' for f in fields})
    dsm_anom2.coords['month'] = month
    # Load and write one variable at a time to bound memory use.
    for var in dsm_anom2.data_vars:
        filename = save_dir + '%02d.%s_%s.nc' % (month, prefix, var)
        print(filename)
        ds_out = dsm_anom2[[var]].load()
        ds_out.to_netcdf(filename)
```

Needless to say, this feels more like my pre-xarray/dask workflow.

Since @mrocklin has made it pretty clear that dask will not automatically solve this for us any time soon, we need to brainstorm some creative ways to make this extremely common use case friendlier for out-of-core data.

Comment 358003205 · rabernat (MEMBER) · 2018-01-16T15:41:25Z
https://github.com/pydata/xarray/issues/1832#issuecomment-358003205

The operation

```python
ds_anom = ds - ds.mean(dim='time')
```

is also extremely common. Both this and the groupby version should work well by default.
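
For context, a self-contained sketch of the two patterns under discussion, on synthetic chunked data (all names and sizes here are illustrative assumptions):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic dataset chunked along time, standing in for real out-of-core data.
time = pd.date_range("2000-01-01", periods=365)
ds = xr.Dataset(
    {"sst": (("time", "y", "x"), np.random.rand(365, 16, 16))},
    coords={"time": time},
).chunk({"time": 30})

# Pattern 1: anomaly with respect to the overall time mean.
ds_anom = ds - ds.mean(dim="time")

# Pattern 2: anomaly with respect to a monthly climatology via groupby.
ds_mm = ds.groupby("time.month").mean("time")
ds_anom_mm = ds.groupby("time.month") - ds_mm

# Both results are lazy; the chunk-handling problems discussed in this
# issue appear when the graphs are actually computed.
ds_anom.compute()
ds_anom_mm.compute()
```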

Table schema:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
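
For programmatic access, a minimal sketch that reproduces this page's filter with Python's sqlite3 module, assuming a local SQLite copy of the database (the filename github.db is hypothetical):

```python
import sqlite3

# Assumed local copy of the Datasette database; the filename is hypothetical.
conn = sqlite3.connect("github.db")

# Reproduce this page's filter: comments by user 1197350 on issue 288785270,
# newest first.
rows = conn.execute(
    """
    SELECT id, created_at, body
    FROM issue_comments
    WHERE issue = 288785270 AND user = 1197350
    ORDER BY updated_at DESC
    """
).fetchall()

for comment_id, created_at, body in rows:
    print(comment_id, created_at, body[:60])
```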