issue_comments
3 rows where issue = 1223031600 (Excessive memory consumption by to_dataframe()), sorted by updated_at descending
id: 1116350454
html_url: https://github.com/pydata/xarray/issues/6561#issuecomment-1116350454
issue_url: https://api.github.com/repos/pydata/xarray/issues/6561
node_id: IC_kwDOAMm_X85Ciif2
user: max-sixty (5635139)
created_at: 2022-05-03T17:19:23Z
updated_at: 2022-05-03T17:19:23Z
author_association: MEMBER
body:
I'm not sure it's necessarily poorly constructed — it can be quite useful to structure data like this — having aligned data of different dimensions in a single dataset is great. But the attribute of the data that makes datasets a good format also makes it bad for a single table. Probably what we'd want is
reactions: { "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
performed_via_github_app: (none)
issue: Excessive memory consumption by to_dataframe() (1223031600)
id: 1116344892
html_url: https://github.com/pydata/xarray/issues/6561#issuecomment-1116344892
issue_url: https://api.github.com/repos/pydata/xarray/issues/6561
node_id: IC_kwDOAMm_X85CihI8
user: sgdecker (8419421)
created_at: 2022-05-03T17:13:02Z
updated_at: 2022-05-03T17:13:02Z
author_association: NONE
body:
Thanks for the feedback and explanation. It seems the poorly constructed netCDF file is fundamentally to blame for triggering this behavior. A warning is a good idea, though.
reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
performed_via_github_app: (none)
issue: Excessive memory consumption by to_dataframe() (1223031600)
id: 1115419268
html_url: https://github.com/pydata/xarray/issues/6561#issuecomment-1115419268
issue_url: https://api.github.com/repos/pydata/xarray/issues/6561
node_id: IC_kwDOAMm_X85Ce_KE
user: max-sixty (5635139)
created_at: 2022-05-02T22:09:40Z
updated_at: 2022-05-02T22:09:40Z
author_association: MEMBER
body:
Great, thanks for the example @sgdecker. I think this is happening because there are variables of different dimensions that are getting broadcast together:

```python
In [5]: ncdata[['lastChild']].to_dataframe()
Out[5]:
         lastChild
station
0         127265.0
1              NaN
2         127492.0
3         124019.0
4              NaN
...            ...
5016      124375.0
5017      126780.0
5018      126781.0
5019      124902.0
5020       93468.0

[5021 rows x 1 columns]

In [6]: ncdata[['lastChild','snowfall_amount']].to_dataframe()
Out[6]:
                lastChild  snowfall_amount
station recNum
0       0        127265.0              NaN
        1        127265.0              NaN
        2        127265.0              NaN
        3        127265.0              NaN
        4        127265.0              NaN
...                   ...              ...
5020    127621    93468.0              NaN
        127622    93468.0              NaN
        127623    93468.0              NaN
        127624    93468.0              NaN
        127625    93468.0              NaN

[640810146 rows x 2 columns]
```

I'm not sure what we could do here — I don't think there's a way of producing a 2D dataframe without blowing this out? We could offer a warning on this behavior beyond a certain size — we'd take a PR for that...
reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
performed_via_github_app: (none)
issue: Excessive memory consumption by to_dataframe() (1223031600)
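A minimal sketch of the broadcasting behaviour described in these comments, plus the kind of size estimate a suggested warning could be based on. The dataset, variable names, and threshold below are invented for illustration; only standard xarray and NumPy calls are assumed, and the size check is not an existing xarray feature.

```python
import numpy as np
import xarray as xr

# Two variables that share no dimensions, mimicking the structure of the
# netCDF file discussed in this issue (names are invented for the sketch).
ds = xr.Dataset(
    {
        "per_station": ("station", np.arange(5.0)),
        "per_record": ("recNum", np.arange(1_000.0)),
    }
)

# Converting one variable at a time keeps the dataframe small...
print(ds[["per_station"]].to_dataframe().shape)                 # (5, 1)

# ...but converting both at once broadcasts them against the product of
# all dimensions involved: 5 * 1000 = 5000 rows here.
print(ds[["per_station", "per_record"]].to_dataframe().shape)   # (5000, 2)

# The warning suggested in the comments could be driven by an estimate of
# the broadcast size, which is cheap to compute before materialising it.
estimated_rows = int(np.prod([ds.sizes[d] for d in ds.dims]))
if estimated_rows > 1_000_000:  # threshold is arbitrary for this sketch
    print(f"to_dataframe() would produce ~{estimated_rows:,} rows")
```

On the real file, the same product of dimension sizes (5021 stations × 127626 records) accounts for the 640810146 rows shown in the comment above.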
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
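As a usage note, a short sketch of querying this table from Python with the standard sqlite3 module, assuming the data lives in a local SQLite file named github.db (the filename is a placeholder; the table and column names come from the schema above).

```python
import sqlite3

# "github.db" is a placeholder filename for the SQLite database that
# contains the issue_comments table defined above.
conn = sqlite3.connect("github.db")
conn.row_factory = sqlite3.Row  # access columns by name

# Fetch the same three rows shown on this page: comments on issue
# 1223031600, most recently updated first.
rows = conn.execute(
    """
    SELECT id, user, created_at, updated_at, author_association, body
    FROM issue_comments
    WHERE issue = ?
    ORDER BY updated_at DESC
    """,
    (1223031600,),
).fetchall()

for row in rows:
    print(row["id"], row["updated_at"], row["author_association"])
```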