issue_comments

2 rows where author_association = "MEMBER" and issue = 462859457 sorted by updated_at descending

Comment 508853564 · shoyer (MEMBER) · created 2019-07-05T20:15:27Z · updated 2019-07-05T20:15:27Z
Issue: Multidimensional dask coordinates unexpectedly computed (462859457)
https://github.com/pydata/xarray/issues/3068#issuecomment-508853564

> For the long term, I also understand that there isn't really a good way to check equality of two dask arrays. I wonder if dask's graph optimization could be used to "simplify" two dask arrays' graphs separately and check the graph equality. For example, two dask arrays created by doing `da.zeros((10, 10), chunks=2) + 5` should be theoretically equal because their dask graphs are made up of the same tasks.

Dask actually already does this canonicalization. If two arrays have the same name, they use the same dask graph, e.g.,

```
In [5]: x = da.zeros((10, 10), chunks=2) + 5

In [6]: y = da.zeros((10, 10), chunks=2) + 5

In [7]: x.name
Out[7]: 'add-f7441a0f46f5cf40458391cd08406c23'

In [8]: y.name
Out[8]: 'add-f7441a0f46f5cf40458391cd08406c23'
```

So xarray could safely look at `.name` on dask arrays (e.g., inside `Variable.equals` or `duck_array_ops.array_equiv`) to determine whether two dask arrays are the same, rather than merely using `is` to check whether they are the same object.
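
A minimal sketch of that name-based shortcut; the helper name and its placement are hypothetical and not xarray's actual implementation:

```python
import dask.array as da

def arrays_definitely_equal(a, b):
    """Cheap equality check that never triggers computation.

    Returns True only when equality is certain; False means "unknown",
    not "unequal". (Hypothetical helper, for illustration only.)
    """
    if a is b:
        return True
    if isinstance(a, da.Array) and isinstance(b, da.Array):
        # Dask canonicalizes graphs into the array name, so equal names
        # imply the same underlying graph (see the example above).
        return a.name == b.name
    return False

x = da.zeros((10, 10), chunks=2) + 5
y = da.zeros((10, 10), chunks=2) + 5
print(arrays_definitely_equal(x, y))  # True, without computing either array
```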

Comment 507480052 · shoyer (MEMBER) · created 2019-07-02T01:15:36Z · updated 2019-07-02T01:15:36Z
Issue: Multidimensional dask coordinates unexpectedly computed (462859457)
https://github.com/pydata/xarray/issues/3068#issuecomment-507480052

The source of the problem here is that when combining objects, xarray needs to decide which coordinates should remain. Our current heuristic, which pre-dates dask support, was really designed for arrays in memory: we keep coordinates if they are equal on both arguments, and remove them otherwise. In some cases we can avoid the computation if we know that the coordinates are the same object.

I am open to ideas on how to make this work better.
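
To make the scenario concrete, here is a minimal sketch of how that heuristic interacts with dask-backed coordinates; the array shapes, chunking, and coordinate names are illustrative, not taken from the original issue:

```python
import dask.array as da
import xarray as xr

# Two objects whose multidimensional coordinate has the same values,
# but stored as distinct dask array objects.
coord1 = da.zeros((4, 4), chunks=2)
coord2 = da.zeros((4, 4), chunks=2)

a = xr.DataArray(da.ones((4, 4), chunks=2), dims=("x", "y"),
                 coords={"lon": (("x", "y"), coord1)})
b = xr.DataArray(da.ones((4, 4), chunks=2), dims=("x", "y"),
                 coords={"lon": (("x", "y"), coord2)})

# Combining the two objects forces xarray to decide whether to keep "lon":
# under the heuristic described above it compares the coordinate values,
# which can trigger computation of the dask-backed coordinate.
total = a + b

# If both operands share the *same* coordinate object, the comparison can
# be skipped and no computation is needed.
c = xr.DataArray(da.ones((4, 4), chunks=2), dims=("x", "y"),
                 coords={"lon": (("x", "y"), coord1)})
total_fast = a + c
```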


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);