issue_comments

3 rows where issue = 323703742 and user = 145117 sorted by updated_at descending

id: 708594913
html_url: https://github.com/pydata/xarray/issues/2139#issuecomment-708594913
issue_url: https://api.github.com/repos/pydata/xarray/issues/2139
node_id: MDEyOklzc3VlQ29tbWVudDcwODU5NDkxMw==
user: mankoff (145117)
created_at: 2020-10-14T18:52:38Z
updated_at: 2020-10-14T18:52:38Z
author_association: CONTRIBUTOR
body:

The issue is that if you pass names=['a','b','c'] to pd.read_csv and there are more columns than names, it takes all the columns without a name and turns them into a multi-index. The bug in my code was that I had more columns than names, didn't want a multi-index, and didn't make use of usecols.
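
For illustration only (not part of the original comment): a minimal sketch of that read_csv behavior, with made-up data; the exact behavior may vary across pandas versions.

import io
import pandas as pd

# A file with 5 columns, but only 3 names supplied.
raw = io.StringIO("1,2,3,4,5\n6,7,8,9,10\n")

df = pd.read_csv(raw, names=['a', 'b', 'c'])
print(df.index)    # the two unnamed leading columns become a MultiIndex
print(df.columns)  # only ['a', 'b', 'c'] remain as data columns

# Limiting the read with usecols (or supplying a name per column) avoids it:
raw.seek(0)
df2 = pd.read_csv(raw, names=['a', 'b', 'c'], usecols=[0, 1, 2])
print(df2.index)   # plain RangeIndex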

This multi-index came from a small 12 MB file - 5000 rows and 40 variables. When I then did df.to_xarray() it filled up my RAM. If I ran the code I provided above, it worked.
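
The comment does not spell out the mechanism, but to_xarray() expands a MultiIndex into one dimension per level, so memory scales with the product of the level sizes rather than with the row count. A rough, hypothetical sketch of that blow-up (numbers chosen to mirror the 5000-row case, not taken from the actual file):

import numpy as np
import pandas as pd

n = 5_000
# Two index levels that are each (nearly) unique per row -- roughly what an
# accidental MultiIndex from read_csv can look like.
idx = pd.MultiIndex.from_arrays([np.arange(n), np.random.rand(n)],
                                names=['i', 'j'])
df = pd.DataFrame({'x': np.random.rand(n)}, index=idx)

# df has 5,000 rows, but to_xarray() unstacks each index level into its own
# dimension: a dense 5,000 x 5,000 array (mostly NaN) per variable, ~200 MB
# of float64 here, and roughly 8 GB for 40 such variables.
ds = df.to_xarray()
print(ds.sizes)  # {'i': 5000, 'j': 5000}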

Now that I've figured all this out, I don't think that any bugs exist in xarray or pandas, just my code. As usual :). But if the fact that I can fill RAM with df.to_xarray() but not with the 3 lines shown above sounds like an issue you want to explore, I'm happy to provide an MWE in a new ticket and tag you there. Let me know...

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: From pandas to xarray without blowing up memory (323703742)

id: 708513119
html_url: https://github.com/pydata/xarray/issues/2139#issuecomment-708513119
issue_url: https://api.github.com/repos/pydata/xarray/issues/2139
node_id: MDEyOklzc3VlQ29tbWVudDcwODUxMzExOQ==
user: mankoff (145117)
created_at: 2020-10-14T16:23:36Z
updated_at: 2020-10-14T16:23:36Z
author_association: CONTRIBUTOR
body:

@max-sixty Sorry for posting this here. This memory blow-up was a byproduct of another bug that took me a few more hours to track down. That other bug is in Pandas, not xarray.

reactions:
{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: From pandas to xarray without blowing up memory (323703742)

id: 708339519
html_url: https://github.com/pydata/xarray/issues/2139#issuecomment-708339519
issue_url: https://api.github.com/repos/pydata/xarray/issues/2139
node_id: MDEyOklzc3VlQ29tbWVudDcwODMzOTUxOQ==
user: mankoff (145117)
created_at: 2020-10-14T11:25:03Z
updated_at: 2020-10-14T11:25:03Z
author_association: CONTRIBUTOR
body:

Late reply, but if anyone else finds this issue, I was filling memory with: ds = df.to_xarray(), but if I build the dataset more manually, I have no memory issues:

ds = xr.Dataset({df.columns[0]:
                 xr.DataArray(data=df[df.columns[0]],
                              dims=['index'],
                              coords={'index': df.index})})
for c in df.columns[1:]:
    ds[c] = (('index'), df[c])

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: From pandas to xarray without blowing up memory (323703742)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
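
The row filter at the top of this page maps onto a straightforward query against this schema. A sketch using Python's sqlite3 module; the database filename github.db is an assumption, not something stated on this page.

import sqlite3

# Hypothetical filename for the SQLite database behind this Datasette instance.
conn = sqlite3.connect("github.db")

# Comments on issue 323703742 by user 145117, newest update first --
# the same filter and ordering this page shows.
rows = conn.execute(
    """
    SELECT id, updated_at, author_association, body
    FROM issue_comments
    WHERE issue = 323703742 AND [user] = 145117
    ORDER BY updated_at DESC
    """
).fetchall()

for comment_id, updated_at, association, body in rows:
    print(comment_id, updated_at, association, body[:60])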