issue_comments


2 rows where author_association = "MEMBER", issue = 323703742 and user = 1217238, sorted by updated_at descending

id: 389620638
html_url: https://github.com/pydata/xarray/issues/2139#issuecomment-389620638
issue_url: https://api.github.com/repos/pydata/xarray/issues/2139
node_id: MDEyOklzc3VlQ29tbWVudDM4OTYyMDYzOA==
user: shoyer (1217238)
created_at: 2018-05-16T18:31:35Z
updated_at: 2018-05-16T18:31:35Z
author_association: MEMBER

MetaCSV looks interesting but I haven't used it myself. My guess would be that it just wraps pandas/xarray for processing data, so I think it's unlikely to give a performance boost. It's more about a declarative way to specify how to load a CSV into pandas/xarray.

reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: From pandas to xarray without blowing up memory (323703742)
id: 389598338
html_url: https://github.com/pydata/xarray/issues/2139#issuecomment-389598338
issue_url: https://api.github.com/repos/pydata/xarray/issues/2139
node_id: MDEyOklzc3VlQ29tbWVudDM4OTU5ODMzOA==
user: shoyer (1217238)
created_at: 2018-05-16T17:20:03Z
updated_at: 2018-05-16T17:20:03Z
author_association: MEMBER

If you don't want the full Cartesian product, you need to ensure that the index only contains the variables you want to expand into a grid, e.g., time, lat and lon.
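A minimal sketch of that indexing step (the DataFrame and its time/lat/lon/temperature columns here are hypothetical; the point is that only the index columns become dimensions):

import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per (time, lat, lon) observation.
df = pd.DataFrame({
    "time": pd.date_range("2018-01-01", periods=4).repeat(4),
    "lat": np.tile(np.repeat([10.0, 20.0], 2), 4),
    "lon": np.tile([100.0, 110.0], 8),
    "temperature": np.random.rand(16),
})

# Index on exactly the variables to expand into a grid: to_xarray() then
# yields a 4 x 2 x 2 Dataset, with temperature left as a data variable
# instead of every column being expanded into a huge Cartesian product.
ds = df.set_index(["time", "lat", "lon"]).to_xarray()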

If the problem is only running out of memory (which is indeed likely with 1e9 rows), then you'll need to think about a more clever way to convert the data. One good option might be to group over subsets of the data (using dask or another parallel processing library like Spark or Beam), and write a bunch of smaller netCDF files, which you then open with xarray's open_mfdataset(). It's probably most convenient to split over time, e.g., into files for each day or month.
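A sketch of that split-and-recombine pattern, reusing the hypothetical df above and splitting by month (a plain serial loop here; a dask, Spark or Beam version would parallelize the same per-group conversion):

import xarray as xr

# Convert one month at a time so no single to_xarray() call has to
# materialize the full grid in memory; each chunk goes to its own file.
paths = []
for month, chunk in df.groupby(df["time"].dt.to_period("M")):
    subset = chunk.set_index(["time", "lat", "lon"]).to_xarray()
    path = f"subset-{month}.nc"
    subset.to_netcdf(path)
    paths.append(path)

# Lazily recombine the per-month files into a single Dataset.
combined = xr.open_mfdataset(paths, combine="by_coords")

Each per-month file is small enough to convert independently, and open_mfdataset() reads the pieces lazily, so the combined Dataset never has to fit in memory all at once.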

reactions: {"total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: From pandas to xarray without blowing up memory (323703742)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);