issue_comments

2 rows where issue = 233350060 and user = 1217238 sorted by updated_at descending

id: 306617217
html_url: https://github.com/pydata/xarray/issues/1440#issuecomment-306617217
issue_url: https://api.github.com/repos/pydata/xarray/issues/1440
node_id: MDEyOklzc3VlQ29tbWVudDMwNjYxNzIxNw==
user: shoyer (1217238)
created_at: 2017-06-06T21:05:56Z
updated_at: 2017-06-06T21:05:56Z
author_association: MEMBER
body:

I think it's unavoidable that users need to understand how their data will be processed (e.g., whether operations will be mapped over time or over space). But maybe some sort of heuristic (if not a fully automated solution) is possible.

For example, maybe chunks={'time'} (note the set rather than a dict) could indicate "divide me into automatically chosen chunks over the time dimension". It's still explicit about how chunking is being done, but comes closer to expressing the intent rather than the details (sketched after this row).

reactions:
{
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: If a NetCDF file is chunked on disk, open it with compatible dask chunks (233350060)
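
The chunks={'time'} form above is a proposal, not an existing xarray argument. As a minimal sketch of the idea, the set could be normalized into an explicit per-dimension spec before being handed to dask (hypothetical helper name; "auto" and -1 follow dask's chunk-spec conventions):

def normalize_chunks_arg(chunks, dims):
    """Hypothetical normalization of the proposed set form.

    A set of dimension names means "chunk automatically along these
    dimensions and leave the others whole"; anything else passes
    through unchanged.
    """
    if isinstance(chunks, set):
        return {dim: "auto" if dim in chunks else -1 for dim in dims}
    return chunks


print(normalize_chunks_arg({"time"}, ("time", "lat", "lon")))
# -> {'time': 'auto', 'lat': -1, 'lon': -1}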

id: 306009664
html_url: https://github.com/pydata/xarray/issues/1440#issuecomment-306009664
issue_url: https://api.github.com/repos/pydata/xarray/issues/1440
node_id: MDEyOklzc3VlQ29tbWVudDMwNjAwOTY2NA==
user: shoyer (1217238)
created_at: 2017-06-04T00:28:19Z
updated_at: 2017-06-04T00:28:19Z
author_association: MEMBER
body:

My main concern is that netCDF4 chunk sizes (e.g., ~10-100KB in that blog post) are often much smaller than well-sized dask chunks (10-100MB, per the Dask FAQ).

I do think it would be appropriate to issue a warning when dask chunks don't line up nicely with the chunks on disk, to avoid performance issues (in general, each chunk on disk should end up in only one dask chunk; a sketch of such a check follows this row). But there are lots of options for aggregating to larger chunks, and it's hard to choose the best way to do that without knowing how the data will be used.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: If a NetCDF file is chunked on disk, open it with compatible dask chunks (233350060)
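
The warning described above can be sketched independently of xarray. This is a minimal illustration using the netCDF4 package, not xarray's implementation, and the helper name is made up:

import warnings

import netCDF4


def warn_on_misaligned_chunks(path, var_name, dask_chunks):
    """Warn if requested dask chunk sizes don't line up with on-disk chunks.

    dask_chunks maps dimension name -> requested dask chunk size.
    """
    with netCDF4.Dataset(path) as nc:
        var = nc.variables[var_name]
        disk_chunks = var.chunking()  # per-dimension sizes, or "contiguous"
        if disk_chunks in (None, "contiguous"):
            return
        for dim, disk_size in zip(var.dimensions, disk_chunks):
            dask_size = dask_chunks.get(dim)
            if dask_size is not None and dask_size % disk_size != 0:
                warnings.warn(
                    f"dask chunk size {dask_size} along {dim!r} is not a "
                    f"multiple of the on-disk chunk size {disk_size}; a chunk "
                    "on disk may be split across more than one dask chunk"
                )

When the requested sizes are whole multiples of the disk chunk sizes, each chunk on disk lands in a single dask chunk, which is the alignment the comment above asks for.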

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
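
For reference, the page above corresponds to a simple query against this schema. A sketch using Python's sqlite3 module (the database filename is an assumption):

import sqlite3

conn = sqlite3.connect("github.db")  # assumed filename for this Datasette database
rows = conn.execute(
    """
    SELECT id, user, created_at, updated_at, author_association, body
    FROM issue_comments
    WHERE issue = 233350060 AND user = 1217238
    ORDER BY updated_at DESC
    """
).fetchall()
for comment_id, user, created_at, updated_at, association, body in rows:
    print(comment_id, updated_at, association)
conn.close()
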
Powered by Datasette · About: xarray-datasette