issue_comments: 408894639

html_url: https://github.com/pydata/xarray/issues/2329#issuecomment-408894639
issue_url: https://api.github.com/repos/pydata/xarray/issues/2329
id: 408894639
node_id: MDEyOklzc3VlQ29tbWVudDQwODg5NDYzOQ==
user: 12278765
created_at: 2018-07-30T15:01:27Z
updated_at: 2018-07-30T15:10:43Z
author_association: NONE

@rabernat Thanks for your answer.

I have one big NetCDF of ~500 GB. What I have changed (a sketch of this setup follows the list):

- Run in a Jupyter notebook with distributed to get the dashboard.
- Change the chunks to {'lat': 90, 'lon': 90}, which should be around 1 GB per chunk.
- Chunk from the beginning with ds = xr.open_dataset('my_netcdf.nc', chunks=chunks).
- About the LZ4 compression: I did some tests with a 1.5 GB extract and the writing time was just 2% slower than uncompressed.
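For reference, a minimal sketch of that workflow. The file name, chunk sizes, and LZ4 choice come from the list above; the output path 'my_store.zarr' and applying the compressor to every variable via ds.data_vars are assumptions.

```python
# Sketch of the setup described above (paths and encoding loop are assumptions).
import xarray as xr
from dask.distributed import Client
from numcodecs import Blosc

client = Client()  # local distributed cluster; provides the dashboard

chunks = {'lat': 90, 'lon': 90}  # roughly 1 GB per chunk for this dataset
ds = xr.open_dataset('my_netcdf.nc', chunks=chunks)  # chunk from the beginning

# LZ4 compression via Blosc; in the 1.5 GB test this was ~2% slower than uncompressed
compressor = Blosc(cname='lz4')
encoding = {var: {'compressor': compressor} for var in ds.data_vars}

ds.to_zarr('my_store.zarr', encoding=encoding)
```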

Now when I run to_zarr(), it creates a zarr store (~40 kB) and all the workers start reading from disk, but they don't write anything.

The Dask dashboard looks like this: [dashboard screenshot omitted]

After a while I get warnings:

distributed.worker - WARNING - Memory use is high but worker has no data to store to disk. Perhaps some other process is leaking memory? Process memory: 1.55 GB -- Worker memory limit: 2.08 GB
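The 2.08 GB in the warning is the per-worker memory limit. The comment does not show how the cluster was started, but a purely hypothetical configuration like the one below would produce a limit of that order.

```python
# Hypothetical cluster setup; n_workers, threads_per_worker, and memory_limit
# are assumptions chosen only to match the ~2 GB limit quoted in the warning.
from dask.distributed import Client

client = Client(n_workers=4, threads_per_worker=1, memory_limit='2GB')
```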

Is this the expected behaviour? I was somehow expecting that each worker would read a chunk and then write it to zarr in a streaming fashion, but this does not seem to be the case.
