Comment 636491064 on pydata/xarray#4113 (2020-05-31)
https://github.com/pydata/xarray/issues/4113#issuecomment-636491064

Thanks for the answer.

I tried some experiments with chunked reading with dask, but I have some observations I don't fully understand:

1) Memory is still being loaded

Reading with chunks loads more memory than reading without chunks, though not an amount equal to the size of the array (about 300 MB for an 800 MB array in the example below). And by the way, stacking also increases memory usage a bit.

But I think this may be normal, perhaps just the dask machinery being loaded into memory, and that I will see the full benefit when working on bigger data. Am I right?
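
For what it's worth, a rough back-of-the-envelope check (my own estimate, not measured) suggests the task graph itself could account for a lot of this, given how small the chunks are:

```python
# chunks=dict(x=1, y=1) on a (512, 2048, 100) array gives one chunk per
# (x, y) pair, so the dask graph has to track over a million tiny chunks.
shape = (512, 2048, 100)
n_chunks = shape[0] * shape[1]
chunk_mb = shape[2] * 8 / 1e6  # each chunk holds 100 float64 values
print(f"{n_chunks} chunks of {chunk_mb:.4f} MB each")
# 1048576 chunks of 0.0008 MB each
```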

2) Stacking breaks the chunks

When stacking a chunked array, only the chunks along the first stacked dimension are preserved; the chunks along the second stacked dimension seem to be merged.

I think this has something to do with the very nature of indexes, but I'm not sure.
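
A small reproduction of what I mean (a sketch; the exact chunk tuples shown in the comments are what I'd expect from the merging described above, not copied output):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.zeros((4, 4, 2)), dims=("x", "y", "z")).chunk(dict(x=1, y=1))
print(da.chunks)
# ((1, 1, 1, 1), (1, 1, 1, 1), (2,))

stacked = da.stack(px=("x", "y"))
print(stacked.chunks)
# Expected, per observation 2): the x chunks survive but the y chunks are
# merged into them, i.e. something like ((2,), (4, 4, 4, 4)) with px last.
```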

3) Rechunking loads the data into memory

A workaround for 2) could have been to re-chunk as desired after stacking, but that fully loads the data.

Example

(Consider the following as a replacement for the main() function of the script in the original post.)

```python
import numpy as np
import xarray as xr

# mb() and print_ram_state() are helpers defined in the original post's script.

def main():
    fname = "da.nc"
    shape = 512, 2048, 100  # 800 MB of float64

    # Write a random array to disk.
    xr.DataArray(
        np.random.randn(*shape),
        dims=("x", "y", "z"),
    ).to_netcdf(fname)
    print_ram_state()

    # Re-open lazily, with one chunk per (x, y) pair.
    da = xr.open_dataarray(fname, chunks=dict(x=1, y=1))
    print(f" da: {mb(da.nbytes)} MB")
    print_ram_state()

    # Stack x and y into a single "px" dimension.
    mda = da.stack(px=("x", "y"))
    print_ram_state()

    # Re-chunk along the stacked dimension.
    mda = mda.chunk(dict(px=1))
    print_ram_state()
```

which outputs something like:

```
RAM: 94.52 MB
 da: 800.0 MB
RAM: 398.83 MB
RAM: 589.05 MB
RAM: 1409.11 MB
```

Chunk layouts displayed thanks to the Jupyter notebook visualization (images in the original comment): one before stacking, one after stacking.

A workaround could have been to save the data already stacked, but "MultiIndex cannot yet be serialized to netCDF".
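
One thing I might try instead (an untested sketch on my side, using xarray's reset_index/set_index): drop the MultiIndex down to plain coordinates before writing, then rebuild it after reading.

```python
# Untested sketch: the MultiIndex itself cannot be written to netCDF, but its
# levels can, as plain coordinates along the stacked dimension.
stacked = da.stack(px=("x", "y"))
stacked.reset_index("px").to_netcdf("stacked.nc")

# Rebuild the MultiIndex after reading the file back.
restored = xr.open_dataarray("stacked.nc").set_index(px=("x", "y"))
```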

Maybe there is another workaround?

(Sorry for the long post)
