issue_comments

3 rows where author_association = "MEMBER", issue = 435535284 and user = 1217238 sorted by updated_at descending

id: 534869060
html_url: https://github.com/pydata/xarray/issues/2912#issuecomment-534869060
issue_url: https://api.github.com/repos/pydata/xarray/issues/2912
node_id: MDEyOklzc3VlQ29tbWVudDUzNDg2OTA2MA==
user: shoyer (1217238)
created_at: 2019-09-25T06:08:43Z
updated_at: 2019-09-25T06:08:43Z
author_association: MEMBER
body:

I suspect it could work pretty well to explicitly rechunk your dataset into larger chunks (e.g., with the Dataset.chunk() method). This way you could continue to use dask for lazy writes, but reduce the overhead of writing individual chunks.

reactions:
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Writing a netCDF file is unexpectedly slow (435535284)
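
To make the rechunking suggestion above concrete, here is a minimal sketch. The file names and the chunk sizes are assumptions for illustration, not values from the issue.

import xarray as xr

# Open lazily with (hypothetically) many small dask chunks.
ds = xr.open_dataset("input.nc", chunks={"time": 10})

# Consolidate into larger chunks before writing, so each write call
# covers more data and per-chunk overhead is paid fewer times.
# The chunk size of 1000 is purely illustrative.
ds = ds.chunk({"time": 1000})

# The write is still streamed lazily through dask, chunk by chunk.
ds.to_netcdf("output.nc")
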
id: 485465687
html_url: https://github.com/pydata/xarray/issues/2912#issuecomment-485465687
issue_url: https://api.github.com/repos/pydata/xarray/issues/2912
node_id: MDEyOklzc3VlQ29tbWVudDQ4NTQ2NTY4Nw==
user: shoyer (1217238)
created_at: 2019-04-22T16:23:44Z
updated_at: 2019-04-22T16:23:44Z
author_association: MEMBER
body:

It really depends on the underlying cause. In most cases, writing a file to disk is not the slow part, only the place where the slow-down is manifested.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Writing a netCDF file is unexpectedly slow (435535284)
id: 485460901
html_url: https://github.com/pydata/xarray/issues/2912#issuecomment-485460901
issue_url: https://api.github.com/repos/pydata/xarray/issues/2912
node_id: MDEyOklzc3VlQ29tbWVudDQ4NTQ2MDkwMQ==
user: shoyer (1217238)
created_at: 2019-04-22T16:06:50Z
updated_at: 2019-04-22T16:06:50Z
author_association: MEMBER
body:

You're using dask, so the Dataset is being lazily computed. If one part of your pipeline is very expensive (perhaps reading the original data from disk?) then the process of saving can be very slow.

I would suggest doing some profiling, e.g., as shown in this example: http://docs.dask.org/en/latest/diagnostics-local.html#example

Once we know what the slow part is, that will hopefully make opportunities for improvement more obvious.

reactions:
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Writing a netCDF file is unexpectedly slow (435535284)
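
A minimal profiling sketch along the lines of the dask diagnostics example linked in the comment above. The file names, chunk size, and sampling interval are assumptions, and rendering the plot requires bokeh.

import xarray as xr
from dask.diagnostics import Profiler, ResourceProfiler, visualize

# Hypothetical lazy pipeline: nothing is computed until to_netcdf runs.
ds = xr.open_dataset("input.nc", chunks={"time": 100})

# Capture task-level and CPU/memory profiles while the write executes.
with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof:
    ds.to_netcdf("output.nc")

# Save an interactive profile plot showing which tasks dominate runtime
# (reads, computation, or the netCDF write itself).
visualize([prof, rprof], filename="profile.html", show=False)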

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
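
For reference, the filter shown at the top of this page can be reproduced against a local SQLite copy of the database; the file name github.db below is a placeholder.

import sqlite3

# Hypothetical: query a local SQLite copy of this database for the same
# three rows shown above.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, created_at, updated_at, body
    FROM issue_comments
    WHERE author_association = 'MEMBER'
      AND issue = 435535284
      AND [user] = 1217238
    ORDER BY updated_at DESC
    """
).fetchall()

for comment_id, created, updated, body in rows:
    print(comment_id, created, updated, body[:60])

conn.close()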