issue_comments: 636491064
html_url: https://github.com/pydata/xarray/issues/4113#issuecomment-636491064
issue_url: https://api.github.com/repos/pydata/xarray/issues/4113
id: 636491064
node_id: MDEyOklzc3VlQ29tbWVudDYzNjQ5MTA2NA==
user: 36678697
created_at: 2020-05-31T16:04:39Z
updated_at: 2020-05-31T16:04:39Z
author_association: NONE

body:

Thanks for the answer. I tried some experiments with chunked reading with dask, but I have observations I don't fully get:

**1) Still loading memory**

Reading with chunks loads more memory than reading without chunks, though not an amount equal to the size of the array (300 MB for an 800 MB array in the example below). Stacking then loads the memory a bit more on top of that. I think this may be normal, because of something like loading the dask machinery into memory, and that I will see the full benefits when working on bigger data; am I right?

**2) Stacking is breaking the chunks**

When stacking a chunked array, only the chunks along the first stacking dimension are conserved; the chunks along the second stacking dimension seem to be merged. I think this has something to do with the very nature of indexes, but I am not sure.

**3) Rechunking loads the memory**

A workaround to 2) could have been to re-chunk as wanted after stacking, but then it fully loads the data.

**Example**

(Considering the following to replace the previous `def main()`:)

```python
def main():
    ...
```

which outputs something like:
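As a rough, self-contained sketch of this kind of experiment (the file name, variable name, dimension names and chunk sizes are illustrative assumptions, not the actual code from the previous example):

```python
import numpy as np
import xarray as xr


def main():
    # Write a hypothetical example file (names and sizes are assumptions).
    xr.Dataset({"var": (("x", "y"), np.random.rand(2000, 2000))}).to_netcdf("example.nc")

    # 1) Open lazily, with dask chunks along both dimensions that will be stacked.
    da = xr.open_dataset("example.nc", chunks={"x": 500, "y": 500})["var"]
    print(da.chunks)  # chunked along both 'x' and 'y'

    # 2) Stacking: per the observation above, chunks along the first stacked
    #    dimension ('x') are kept, while chunks along the second ('y') get merged.
    stacked = da.stack(z=("x", "y"))
    print(stacked.chunks)

    # 3) Re-chunking after stacking restores the wanted chunk sizes, but in the
    #    observation above it ends up loading the data into memory.
    rechunked = stacked.chunk({"z": 250_000})
    print(rechunked.chunks)


if __name__ == "__main__":
    main()
```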
Chunks displayed thanks to the Jupyter notebook visualization:

Before stacking:
After stacking:

A workaround could have been to save the data already stacked, but "MultiIndex cannot yet be serialized to netCDF". Maybe there is another workaround?

(Sorry for the long post)
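One possible direction, sketched here under assumed names (a stacked dimension `z` built from levels `x` and `y`), is to drop the MultiIndex with `reset_index` before writing and rebuild it with `set_index` after reading:

```python
import numpy as np
import xarray as xr

# Hypothetical small array standing in for the real data (all names are assumptions).
da = xr.DataArray(
    np.zeros((4, 3)),
    dims=("x", "y"),
    coords={"x": [0, 1, 2, 3], "y": [10, 20, 30]},
    name="var",
)
stacked = da.stack(z=("x", "y"))

# reset_index() turns the MultiIndex levels ('x', 'y') back into plain
# coordinates along 'z', which can be serialized to netCDF.
stacked.reset_index("z").to_netcdf("stacked.nc")

# After reading the file back, rebuild the MultiIndex from the saved levels.
restored = xr.open_dataarray("stacked.nc").set_index(z=["x", "y"])
```

This is only a sketch: it side-steps the serialization limitation by storing the index levels as ordinary coordinates on disk.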
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
issue: 627735640