issue_comments
4 rows where issue = 180080354 sorted by updated_at descending
| id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 832136328 | https://github.com/pydata/xarray/issues/1020#issuecomment-832136328 | https://api.github.com/repos/pydata/xarray/issues/1020 | MDEyOklzc3VlQ29tbWVudDgzMjEzNjMyOA== | shoyer 1217238 | 2021-05-04T18:03:32Z | 2021-05-04T18:03:32Z | MEMBER | @meteoDaniel could you please open a thread for discussing your issue? This could be a good use for the GitHub "Discussions" tab :) Including a copy of the "repr" from printing your dataset would help us give more specific guidance. |
{
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
Memory error when converting dataset to dataframe 180080354 | |
| 832111396 | https://github.com/pydata/xarray/issues/1020#issuecomment-832111396 | https://api.github.com/repos/pydata/xarray/issues/1020 | MDEyOklzc3VlQ29tbWVudDgzMjExMTM5Ng== | meteoDaniel 27021858 | 2021-05-04T17:24:15Z | 2021-05-04T17:24:15Z | NONE | @shoyer I am having a similar problem. I am reading 80 files totaling 8.3 GB, so each file is around 100 MB. If I understand you right, using mf_dataset on such data is not recommended? So best practice would be to loop over the files? PS: I still tried to use some dask-related operations, but each time I try to access |
{
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
Memory error when converting dataset to dataframe 180080354 | |
| 250777441 | https://github.com/pydata/xarray/issues/1020#issuecomment-250777441 | https://api.github.com/repos/pydata/xarray/issues/1020 | MDEyOklzc3VlQ29tbWVudDI1MDc3NzQ0MQ== | ktyle 1961038 | 2016-09-30T15:38:01Z | 2016-09-30T15:38:01Z | NONE | Good to know, and since the system I'm running on has 96 GB of RAM, I think your statement about pandas is correct too, as I also get the memory error when running on a smaller (18GB) dataset. |
{
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
Memory error when converting dataset to dataframe 180080354 | |
| 250615133 | https://github.com/pydata/xarray/issues/1020#issuecomment-250615133 | https://api.github.com/repos/pydata/xarray/issues/1020 | MDEyOklzc3VlQ29tbWVudDI1MDYxNTEzMw== | shoyer 1217238 | 2016-09-29T22:53:39Z | 2016-09-29T22:53:39Z | MEMBER | Looking at your dataset, it's at least 57 GB when decoded as float64. This is probably more RAM than you have on your machine. But also, when xarray builds a DataFrame, every variable is first expanded to use all dimensions. So this is something like 5 * 57 GB in memory, and pandas probably needs a memory copy to create the DataFrame, so this probably needs at least 500 GB. You'll have better luck subsetting the dataset first. |
{
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
Memory error when converting dataset to dataframe 180080354 |
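
The memory reasoning in shoyer's 2016 comment can be sketched as a back-of-the-envelope calculation. This is only an illustration under stated assumptions: the function name `estimate_dataframe_bytes`, the `copy_factor` parameter, and the dimension sizes are hypothetical and are not taken from the issue or from xarray. It assumes every variable is broadcast to all dimensions, stored as float64 (8 bytes), and copied once by pandas. Applied to the comment's own figures, 5 * 57 GB doubled for one copy is about 570 GB, which is consistent with "at least 500 GB".

```python
from math import prod


def estimate_dataframe_bytes(dim_sizes, n_variables, itemsize=8, copy_factor=2):
    """Rough estimate of peak memory when flattening a dataset to a DataFrame.

    Hypothetical helper, not an xarray or pandas API. Assumes every variable
    is expanded to span all dimensions and that one extra copy is made.
    """
    n_rows = prod(dim_sizes.values())          # one row per point in the full grid
    expanded = n_rows * n_variables * itemsize  # all variables broadcast to all dims
    return expanded * copy_factor               # allow for one in-memory copy


# Hypothetical dimension sizes, chosen only for illustration.
dims = {"time": 1000, "lat": 180, "lon": 360}

full = estimate_dataframe_bytes(dims, n_variables=5)
# Subsetting first (here: keeping 10 of 1000 time steps) shrinks the estimate.
subset = estimate_dataframe_bytes({**dims, "time": 10}, n_variables=5)

print(f"full: {full / 1e9:.2f} GB, subset: {subset / 1e9:.2f} GB")
```

The estimate scales with the product of all dimension sizes, which is why selecting a slice along one dimension (for example with `Dataset.isel` or `Dataset.sel`) before calling `to_dataframe()` reduces the memory needed proportionally.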
CREATE TABLE [issue_comments] (
[html_url] TEXT,
[issue_url] TEXT,
[id] INTEGER PRIMARY KEY,
[node_id] TEXT,
[user] INTEGER REFERENCES [users]([id]),
[created_at] TEXT,
[updated_at] TEXT,
[author_association] TEXT,
[body] TEXT,
[reactions] TEXT,
[performed_via_github_app] TEXT,
[issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
ON [issue_comments] ([user]);