issue_comments
2 rows where issue = 627735640 and user = 36678697 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
636849900 | https://github.com/pydata/xarray/issues/4113#issuecomment-636849900 | https://api.github.com/repos/pydata/xarray/issues/4113 | MDEyOklzc3VlQ29tbWVudDYzNjg0OTkwMA== | Paul-Aime 36678697 | 2020-06-01T13:06:02Z | 2020-06-01T13:06:02Z | NONE |
Yes, I'm not very familiar with chunks; it seems that it's not good to have too many of them.
Sorry it should have been
Yes, after some more experiments I found out that the second chunksize after stacking is The formula for X is something like:
So, the minimum value for X is That's why I was saying that "chunks along the second stacking dimension seem to be merged". This might be normal, just unexpected, and it is still quite obscure to me. It must be happening on the dask side anyway. Thanks a lot for your insights.
Ah yes, thanks! I thought |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray.DataArray.stack load data into memory 627735640 | |
636491064 | https://github.com/pydata/xarray/issues/4113#issuecomment-636491064 | https://api.github.com/repos/pydata/xarray/issues/4113 | MDEyOklzc3VlQ29tbWVudDYzNjQ5MTA2NA== | Paul-Aime 36678697 | 2020-05-31T16:04:39Z | 2020-05-31T16:04:39Z | NONE | Thanks for the answer. I tried some experiments with chunked reading with dask, but I have observations I don't fully get:
1) Still loading memory
Reading with chunks loads more memory than reading without chunks, though not an amount equal to the size of the array (300MB for an 800MB array in the example below). Stacking also loads a bit more memory. But I think this may be normal, perhaps because the dask machinery itself is loaded into memory, and I will see the full benefits when working on bigger data, am I right?
2) Stacking is breaking the chunks
When stacking a chunked array, only chunks along the first stacking dimension are preserved, and chunks along the second stacking dimension seem to be merged. I think this has something to do with the very nature of indexes, but I'm not sure.
3) Rechunking loads the memory
A workaround to 2) could have been to re-chunk as wanted after stacking, but that fully loads the data.
Example
(Considering the following to replace the
```python
def main():
``` which outputs something like:
Chunks displayed thanks to the jupyter notebook visualization: Before stacking:
After stacking:
A workaround could have been to save the data already stacked, but "MultiIndex cannot yet be serialized to netCDF". Maybe there is another workaround? (Sorry for the long post) |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray.DataArray.stack load data into memory 627735640 |
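The chunk layout discussed in the comments above can be inspected with a small experiment. The following is only a minimal sketch, not the code from the original post: the array shape, the dimension names (`time`, `y`, `x`, `pixel`) and the chunk sizes are assumptions chosen for illustration, and it assumes both xarray and dask are installed. It prints the chunk layout before stacking, after stacking, and after an explicit rechunk, and checks whether the result is still backed by a dask array. No expected chunk sizes are given because they depend on the dask version.

```python
import numpy as np
import xarray as xr

# Hypothetical data: shape, dimension names and chunk sizes are illustrative only.
da = xr.DataArray(
    np.zeros((4, 100, 100), dtype="float32"),
    dims=("time", "y", "x"),
).chunk({"time": 4, "y": 25, "x": 25})

# Chunk layout of the dask-backed array before stacking.
print("before stacking:", da.chunks)

# Stack the two spatial dimensions into one MultiIndex dimension.
stacked = da.stack(pixel=("y", "x"))
print("after stacking: ", stacked.chunks)

# Request a different chunk size along the stacked dimension.
rechunked = stacked.chunk({"pixel": 2500})
print("after rechunk:  ", rechunked.chunks)

# Check whether the underlying data is still a dask array (i.e. not a NumPy array).
print("stacked backend:  ", type(stacked.data).__module__)
print("rechunked backend:", type(rechunked.data).__module__)
```

Comparing the first two printed layouts shows how the chunks along `y` and `x` map onto the stacked `pixel` dimension for the installed dask version.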
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);