
issue_comments


2 rows where issue = 334633212 and user = 2443309, sorted by updated_at descending




Facets:
  • user: jhamman (2)
  • issue: to_netcdf(compute=False) can be slow (2)
  • author_association: MEMBER (2)
id: 453866106
html_url: https://github.com/pydata/xarray/issues/2242#issuecomment-453866106
issue_url: https://api.github.com/repos/pydata/xarray/issues/2242
node_id: MDEyOklzc3VlQ29tbWVudDQ1Mzg2NjEwNg==
user: jhamman (2443309)
created_at: 2019-01-13T21:13:28Z
updated_at: 2019-01-13T21:13:28Z
author_association: MEMBER
body:
    I just reran the example above and things seem to be resolved now. The write step for the two datasets is basically identical.
reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: to_netcdf(compute=False) can be slow (334633212)
id: 399320127
html_url: https://github.com/pydata/xarray/issues/2242#issuecomment-399320127
issue_url: https://api.github.com/repos/pydata/xarray/issues/2242
node_id: MDEyOklzc3VlQ29tbWVudDM5OTMyMDEyNw==
user: jhamman (2443309)
created_at: 2018-06-22T04:51:54Z
updated_at: 2018-06-22T04:51:54Z
author_association: MEMBER
body:
    I think, at least to some extent, the performance hit is to be expected. I don't think we should be opening the file more than once when using the serial or threaded schedulers, so that may be a place where you can find some improvement. There will always be a performance hit when writing dask arrays to netcdf files chunk-by-chunk. For one, there is a threading lock that limits parallel throughput. More importantly, chunked writes are always going to be slower than larger writes coming directly from numpy arrays.

    In your example above, the snippet @shoyer mentions should evaluate to autoclose=False. However, the profiling you mention seems to indicate the opposite. Perhaps we should start by digging deeper on that point.
reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: to_netcdf(compute=False) can be slow (334633212)
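
The comment above describes the mechanics behind to_netcdf(compute=False): the write is deferred into a dask graph and then executed chunk-by-chunk under a lock. As a rough illustration of that pattern (not code from the issue; the file name example.nc, array size, and chunk size are made up):

import dask.array as da
import xarray as xr

# A small dask-backed dataset: each 100,000-element chunk is written
# to the netCDF file separately rather than in one contiguous write.
ds = xr.Dataset({"x": (("t",), da.zeros(1_000_000, chunks=100_000))})

# compute=False returns a dask Delayed object instead of writing
# immediately; the chunk-by-chunk, lock-limited write happens on .compute().
delayed_write = ds.to_netcdf("example.nc", compute=False)
delayed_write.compute()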

Table schema

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
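
The indexes above cover exactly the kind of lookup this page performs. A minimal sketch of the underlying query, assuming the table lives in a local SQLite file (github.db is a placeholder name, not taken from this page):

import sqlite3

# Placeholder database path; the real file behind this Datasette
# instance is not named on the page.
conn = sqlite3.connect("github.db")

# Reproduce the page's listing: comments on one issue by one user,
# newest first. idx_issue_comments_issue and idx_issue_comments_user
# support the WHERE clause.
rows = conn.execute(
    """
    SELECT id, created_at, updated_at, author_association, body
    FROM issue_comments
    WHERE issue = 334633212 AND user = 2443309
    ORDER BY updated_at DESC
    """
).fetchall()

for row in rows:
    print(row)

conn.close()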