issue_comments

6 rows where author_association = "MEMBER" and issue = 425320466 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1101512178 https://github.com/pydata/xarray/issues/2852#issuecomment-1101512178 https://api.github.com/repos/pydata/xarray/issues/2852 IC_kwDOAMm_X85Bp73y dcherian 2448579 2022-04-18T15:45:41Z 2022-04-18T15:45:41Z MEMBER

You can do this with flox now. Eventually we can update xarray to support grouping by a dask variable.

The limitation will be that the user will have to provide "expected groups" so that we can construct the output coordinate.
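
To illustrate why "expected groups" matter, here is a minimal pure-Python sketch. It is not flox's implementation or API; the function name `chunked_group_sum` and the list-of-chunks layout are assumptions made for illustration. When the set of labels is known up front, the output coordinate can be built before any label data is read, and each chunk can be reduced on its own.

```python
def chunked_group_sum(label_chunks, value_chunks, expected_groups):
    """Sum values per label, one chunk at a time (conceptual sketch only)."""
    # The output coordinate is fixed by expected_groups, so its size is
    # known before any label data is read.
    position = {group: i for i, group in enumerate(expected_groups)}
    totals = [0] * len(expected_groups)
    for labels, values in zip(label_chunks, value_chunks):
        # Each chunk is processed independently of the others.
        for label, value in zip(labels, values):
            idx = position.get(label)
            if idx is not None:  # labels outside expected_groups are ignored
                totals[idx] += value
    return dict(zip(expected_groups, totals))


# Hypothetical data: two chunks of labels with matching chunks of values.
label_chunks = [[1, 2, 2], [3, 1, 2]]
value_chunks = [[10, 20, 30], [40, 50, 60]]
result = chunked_group_sum(label_chunks, value_chunks, expected_groups=[1, 2, 3])
print(result)
```

Without `expected_groups`, the labels would have to be scanned in full just to learn how many output entries exist.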

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 2,
    "rocket": 0,
    "eyes": 0
}
  Allow grouping by dask variables 425320466
653016746 https://github.com/pydata/xarray/issues/2852#issuecomment-653016746 https://api.github.com/repos/pydata/xarray/issues/2852 MDEyOklzc3VlQ29tbWVudDY1MzAxNjc0Ng== rabernat 1197350 2020-07-02T13:48:39Z 2020-07-02T13:48:39Z MEMBER

👀 cc @chiaral

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow grouping by dask variables 425320466
478621867 https://github.com/pydata/xarray/issues/2852#issuecomment-478621867 https://api.github.com/repos/pydata/xarray/issues/2852 MDEyOklzc3VlQ29tbWVudDQ3ODYyMTg2Nw== shoyer 1217238 2019-04-01T15:16:30Z 2019-04-01T15:16:30Z MEMBER

Roughly how many unique labels do you have?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow grouping by dask variables 425320466
478563375 https://github.com/pydata/xarray/issues/2852#issuecomment-478563375 https://api.github.com/repos/pydata/xarray/issues/2852 MDEyOklzc3VlQ29tbWVudDQ3ODU2MzM3NQ== dcherian 2448579 2019-04-01T12:43:03Z 2019-04-01T12:43:03Z MEMBER

It sounds like there is an apply_ufunc solution to your problem, but I don't know how to write it! ;)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow grouping by dask variables 425320466
478415169 https://github.com/pydata/xarray/issues/2852#issuecomment-478415169 https://api.github.com/repos/pydata/xarray/issues/2852 MDEyOklzc3VlQ29tbWVudDQ3ODQxNTE2OQ== shoyer 1217238 2019-04-01T02:31:58Z 2019-04-01T02:31:58Z MEMBER

The current design of GroupBy.apply() in xarray is entirely ignorant of dask: it simply uses a for loop over the grouped variable to build up a computation with high-level array operations.

This makes operations that group over large keys stored in dask inefficient. This could be done efficiently (dask.dataframe does this, and might be worth trying in your case), but it's a more challenging distributed computing problem, and xarray's current data model would not know how large a dimension to create for the returned arrays (doing this properly would require supporting arrays with unknown dimension sizes).
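
A rough sketch of the loop-based pattern described above, written in plain Python rather than taken from xarray's code; `naive_group_apply` is a hypothetical helper. The unique labels have to be computed from the full group variable before the loop can start, so neither the groups nor the output size are known while the labels are still lazy.

```python
def naive_group_apply(labels, values, func):
    """Apply func to each group's values by looping over the unique labels."""
    # Finding the unique labels requires reading every label first.
    unique_labels = sorted(set(labels))
    results = {}
    for key in unique_labels:
        # One full pass over the data per group: costly with many keys.
        group = [v for lab, v in zip(labels, values) if lab == key]
        results[key] = func(group)
    return results


# Hypothetical data for illustration.
labels = [2, 1, 2, 3, 1]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
means = naive_group_apply(labels, values, lambda g: sum(g) / len(g))
print(means)
```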

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow grouping by dask variables 425320466
476678007 https://github.com/pydata/xarray/issues/2852#issuecomment-476678007 https://api.github.com/repos/pydata/xarray/issues/2852 MDEyOklzc3VlQ29tbWVudDQ3NjY3ODAwNw== rabernat 1197350 2019-03-26T14:41:59Z 2019-03-26T14:41:59Z MEMBER

label (y, x) uint16 dask.array<shape=(10980, 10980), chunksize=(200, 10980)> ... geoms_ds.groupby('label')

It is very hard to make this sort of groupby lazy, because you are grouping over the variable label itself. Groupby uses a split-apply-combine paradigm to transform the data. The apply and combine steps can be lazy. But the split step cannot. Xarray uses the group variable to determine how to index the array, i.e. which items belong in which group. To do this, it needs to read the whole variable into memory.

In this specific example, it sounds like what you want is to compute the histogram of labels. That could be accomplished without groupby. For example, you could use apply_ufunc together with dask.array.histogram.

So my recommendation is to think of a way to accomplish what you want that does not involve groupby.
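
The histogram suggestion above can be sketched in plain Python. This is not the apply_ufunc / dask.array.histogram code itself; `n_labels`, the function names, and the chunk layout are assumptions. With the label range fixed in advance, each chunk yields a count vector of known size, and the partial counts are simply added together.

```python
def chunk_histogram(chunk, n_labels):
    """Count occurrences of each integer label in a single chunk."""
    counts = [0] * n_labels
    for label in chunk:
        counts[label] += 1
    return counts


def label_histogram(chunks, n_labels):
    """Combine per-chunk histograms into one total (conceptual sketch)."""
    total = [0] * n_labels
    for chunk in chunks:
        partial = chunk_histogram(chunk, n_labels)
        # Partial results all have the same fixed size, so combining is trivial.
        total = [a + b for a, b in zip(total, partial)]
    return total


# Hypothetical data: two chunks of integer labels in the range 0..3.
chunks = [[0, 1, 1, 3], [3, 3, 0, 2]]
hist = label_histogram(chunks, n_labels=4)
print(hist)
```

No chunk ever needs to see another chunk's labels, which is what lets this kind of computation stay lazy.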

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow grouping by dask variables 425320466

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 135.285ms · About: xarray-datasette