
issue_comments

2 rows where issue = 645443880 sorted by updated_at descending

id: 650091343
html_url: https://github.com/pydata/xarray/issues/4180#issuecomment-650091343
issue_url: https://api.github.com/repos/pydata/xarray/issues/4180
node_id: MDEyOklzc3VlQ29tbWVudDY1MDA5MTM0Mw==
user: snbentley (7360639)
created_at: 2020-06-26T09:45:39Z
updated_at: 2020-06-26T09:45:39Z
author_association: NONE

Ah, that is a much better compromise - it's still slower for my own much larger dataset, but it is definitely manageable now. I think this is what I was trying to find originally when I ended up using |S1.

As the problem was my usage of encoding / netCDF4's slow variable-length strings, and you've given me a good workaround, I'll close this. Thanks for your help!

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: to_netcdf very slow for some single character data types (645443880)

id: 649883875
html_url: https://github.com/pydata/xarray/issues/4180#issuecomment-649883875
issue_url: https://api.github.com/repos/pydata/xarray/issues/4180
node_id: MDEyOklzc3VlQ29tbWVudDY0OTg4Mzg3NQ==
user: shoyer (1217238)
created_at: 2020-06-26T00:31:36Z
updated_at: 2020-06-26T00:31:36Z
author_association: MEMBER

The profile shows that all the time is spent in the netCDF4 library.

By default, xarray writes string dtypes as variable length strings. That appears to be rather slow in netCDF4, for reasons that aren't clear to me.

One workaround is to save the data as fixed-width character data instead, e.g., ds.to_netcdf('somefilename', encoding={'tester': {'dtype': 'S1'}}). Unlike astype('|S1'), this version safely encodes the data as UTF-8, so it can handle arbitrary Python strings. (A runnable sketch of this workaround appears after this comment.)

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: to_netcdf very slow for some single character data types (645443880)
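
For illustration, here is a minimal runnable sketch of the fixed-width-string workaround described in the comment above. Only the variable name 'tester' and the encoding argument come from the comment; the dataset contents and file names are assumptions for demonstration.

import numpy as np
import xarray as xr

# A small dataset with a single-character string variable (illustrative data).
ds = xr.Dataset({"tester": ("x", np.array(["a", "b", "c"], dtype=object))})

# Default behaviour: xarray writes string dtypes as variable-length strings,
# which the comment above identifies as the slow path in netCDF4.
ds.to_netcdf("default_vlen.nc")

# Workaround: encode the variable as fixed-width character data instead.
# Unlike ds["tester"].astype("|S1"), this path encodes via UTF-8, so it can
# handle arbitrary Python strings safely.
ds.to_netcdf("fixed_width.nc", encoding={"tester": {"dtype": "S1"}})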

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
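
For anyone working from a local SQLite copy of this table, a minimal sketch of the query behind this page ("2 rows where issue = 645443880 sorted by updated_at descending"). The database file name github.db is a hypothetical assumption.

import sqlite3

# Connect to a local copy of the database (hypothetical file name).
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, [user], created_at, author_association
    FROM issue_comments
    WHERE issue = 645443880
    ORDER BY updated_at DESC
    """
).fetchall()
for comment_id, user_id, created_at, association in rows:
    print(comment_id, user_id, created_at, association)
conn.close()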