issue_comments

3 rows where issue = 293293632 sorted by updated_at descending

id: 460128002
html_url: https://github.com/pydata/xarray/issues/1874#issuecomment-460128002
issue_url: https://api.github.com/repos/pydata/xarray/issues/1874
node_id: MDEyOklzc3VlQ29tbWVudDQ2MDEyODAwMg==
user: jhamman (2443309)
created_at: 2019-02-04T04:29:08Z
updated_at: 2019-02-04T04:29:08Z
author_association: MEMBER
body:

This can be done using xarray->zarr->SQL (https://github.com/zarr-developers/zarr/pull/368). Additional databases are also available as stores in zarr. (See the sketch after this record.)

reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: running out of memory trying to write SQL (293293632)
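
The xarray->zarr->SQL route suggested above could look roughly like the following sketch, assuming zarr 2.x and its SQL-backed store zarr.storage.SQLiteStore; the file paths are placeholders, not taken from the thread.

import xarray as xr
import zarr

# Sketch only: write a Dataset into a SQLite database through a zarr store.
# "data.nc" and "data.sqlite" are assumed placeholder paths.
ds = xr.open_dataset("data.nc")
store = zarr.storage.SQLiteStore("data.sqlite")  # array chunks are stored as rows in a SQLite table
ds.to_zarr(store=store)
store.close()
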
id: 362945533
html_url: https://github.com/pydata/xarray/issues/1874#issuecomment-362945533
issue_url: https://api.github.com/repos/pydata/xarray/issues/1874
node_id: MDEyOklzc3VlQ29tbWVudDM2Mjk0NTUzMw==
user: shoyer (1217238)
created_at: 2018-02-04T22:26:44Z
updated_at: 2018-02-04T22:26:44Z
author_association: MEMBER
body:

> Then I need to write the data to a postgres DB. I have tried parsing the array and using an INSERT for every row, but this is taking a very long time (weeks).

I'm not a particular expert on postgres, but I suspect it does have some sort of bulk insert facility.

> However, when trying to convert my xarray Dataset to a Pandas Dataframe, I ran out of memory quickly.

If you're working with a 47GB netCDF file, you probably don't have a lot of memory to spare. pandas.DataFrame objects can often use significantly more memory than xarray.Dataset, especially keeping in mind that an xarray Dataset can lazily reference data on disk while a DataFrame is always in memory. The best strategy is probably to slice the Dataset into small pieces and convert those individually (see the sketch after this record).

reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: running out of memory trying to write SQL (293293632)
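
The slicing strategy suggested above might look like this sketch; the "time" dimension, chunk size, table name, and connection URL are assumptions for illustration, not details from the thread.

import sqlalchemy
import xarray as xr

# Sketch only: convert one slab of the Dataset at a time so that only a small
# DataFrame is ever held in memory, then append it to the target table.
ds = xr.open_dataset("data.nc")
engine = sqlalchemy.create_engine("postgresql://user:password@localhost/mydb")

step = 10_000  # steps along the "time" dimension per iteration; tune for available memory
for start in range(0, ds.sizes["time"], step):
    piece = ds.isel(time=slice(start, start + step))
    df = piece.to_dataframe()
    df.to_sql("my_table", engine, if_exists="append")
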
id: 362072512
html_url: https://github.com/pydata/xarray/issues/1874#issuecomment-362072512
issue_url: https://api.github.com/repos/pydata/xarray/issues/1874
node_id: MDEyOklzc3VlQ29tbWVudDM2MjA3MjUxMg==
user: max-sixty (5635139)
created_at: 2018-01-31T21:13:49Z
updated_at: 2018-01-31T21:13:49Z
author_association: MEMBER
body:

There's no xarray->SQL connector, unfortunately.

I don't have that much experience here, so I'll let others chime in. You could try chunking to pandas and then to Postgres (but you'll always be limited by memory with pandas). If there's a NetCDF -> tabular connector, that would allow you to operate beyond memory.

reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: running out of memory trying to write SQL (293293632)

Table schema

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
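
The query behind this page (the 3 rows where issue = 293293632, newest first) can be reproduced against a local copy of the database; the filename github.db below is an assumption.

import sqlite3

# Sketch only: rerun this page's query locally. "github.db" is an assumed filename.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    "select id, user, created_at, body from issue_comments "
    "where issue = ? order by updated_at desc",
    (293293632,),
).fetchall()
for comment_id, user_id, created_at, body in rows:
    print(comment_id, user_id, created_at)
conn.close()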