
issue_comments


3 rows where author_association = "MEMBER", issue = 252358450 (Automatic parallelization for dask arrays in apply_ufunc), and user = 306380 (mrocklin), sorted by updated_at descending

330701921 · mrocklin (user 306380) · MEMBER · 2017-09-19T23:27:49Z
https://github.com/pydata/xarray/pull/1517#issuecomment-330701921

The heuristics we have are I think just of the form "did you make way more chunks than you had previously". I can imagine other heuristics of the form "some of your new chunks are several times larger than your previous chunks". In general these heuristics might be useful in several places. It might make sense to build them in a dask/array/utils.py file.
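
A minimal sketch of what these heuristics might look like, assuming a hypothetical check_rechunk helper with arbitrary thresholds (neither the name nor the thresholds are dask API; dask/array/utils.py is only the suggested home):

import warnings

import dask.array as da


def check_rechunk(before: da.Array, after: da.Array,
                  count_factor: int = 10, size_factor: int = 4) -> None:
    # Heuristic 1: "did you make way more chunks than you had previously".
    if after.npartitions > count_factor * before.npartitions:
        warnings.warn(
            f"chunk count grew from {before.npartitions} to {after.npartitions}"
        )
    # Heuristic 2: "some of your new chunks are several times larger
    # than your previous chunks".
    largest_before = max(max(c) for c in before.chunks)
    largest_after = max(max(c) for c in after.chunks)
    if largest_after > size_factor * largest_before:
        warnings.warn(
            f"largest chunk grew from {largest_before} to {largest_after} "
            "elements along one axis"
        )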

324732814 · mrocklin (user 306380) · MEMBER · 2017-08-24T19:25:32Z
https://github.com/pydata/xarray/pull/1517#issuecomment-324732814

Yes, if you don't care strongly about deduplication. The following will be slower:

b = (a.chunk(...) + 1) + (a.chunk(...) + 1)

Under the current behavior this will be optimized to

tmp = a.chunk(...) + 1
b = tmp + tmp

So you'll lose that, but I suspect that in your case chunking the same dataset many times is somewhat rare.
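
To illustrate the deduplication being discussed, here is a sketch using dask.array directly (array contents and chunk sizes are arbitrary): with default deterministic naming, both branches get the same graph keys, so the shared intermediate is computed only once.

import numpy as np
import dask.array as da

x = np.arange(1_000_000)

# With default (deterministic) names, both calls hash the same data
# and chunks, so they share graph keys and the common subexpression
# is computed only once.
a1 = da.from_array(x, chunks=100_000)
a2 = da.from_array(x, chunks=100_000)
assert a1.name == a2.name

b = (a1 + 1) + (a2 + 1)  # equivalent to tmp = a1 + 1; b = tmp + tmp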

324722153 · mrocklin (user 306380) · MEMBER · 2017-08-24T18:43:30Z
https://github.com/pydata/xarray/pull/1517#issuecomment-324722153

I'm curious, how long does this line take:

r = spearman_correlation(array1.chunk({'place': 10}), array2.chunk({'place': 10}), 'time')

Have you considered setting name=False in your from_array call by default when doing this? I often avoid creating deterministic names when going back and forth rapidly between dask.array and numpy.
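
As an illustration of that suggestion (array contents and chunk sizes are arbitrary), name=False in dask.array.from_array skips hashing the data and assigns a random name instead:

import numpy as np
import dask.array as da

x = np.arange(1_000_000)

# Default: hashing x produces a deterministic name (enables the
# deduplication above, but hashing large arrays takes time).
slow = da.from_array(x, chunks=100_000)

# name=False: skip hashing and assign a random name; cheaper when
# round-tripping rapidly between numpy and dask.array, at the cost
# of deduplication across separate from_array calls.
fast1 = da.from_array(x, chunks=100_000, name=False)
fast2 = da.from_array(x, chunks=100_000, name=False)
assert fast1.name != fast2.name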



CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
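
For reference, the filter shown at the top of the page maps to a direct query against this schema; a minimal sketch, assuming the export lives in a file named github.db (the filename is an assumption):

import sqlite3

# "github.db" is an assumed filename for a github-to-sqlite export.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, [user], created_at, body
    FROM issue_comments
    WHERE author_association = 'MEMBER'
      AND issue = 252358450
      AND [user] = 306380
    ORDER BY updated_at DESC
    """
).fetchall()
for comment_id, user_id, created_at, body in rows:
    print(comment_id, user_id, created_at, body[:60])
conn.close()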