issue_comments: 748554375


html_url: https://github.com/pydata/xarray/pull/4461#issuecomment-748554375
issue_url: https://api.github.com/repos/pydata/xarray/issues/4461
id: 748554375
node_id: MDEyOklzc3VlQ29tbWVudDc0ODU1NDM3NQ==
user: 7799184
created_at: 2020-12-20T02:35:40Z
updated_at: 2020-12-20T09:10:27Z
author_association: CONTRIBUTOR

@rabernat, awesome! I was stunned by the difference -- I guess the async loading of coordinate data is the big win, right?

@rsignell-usgs one other thing that can greatly speed up loading of metadata / coordinates is ensuring coordinate variables are stored in a single chunk. For this particular dataset, the chunk size of the time coordinate is 672, yielding 339 chunks, which can take a while to load from remote bucket stores. If you rewrite the time coordinate, setting dset.time.encoding["chunks"] = (227904,), you should see a very large performance increase. One thing we have been doing for zarr archives that are appended along time is defining the time coordinate with a very large chunk size (e.g., dset.time.encoding["chunks"] = (10000000,)) when we first write the store. This ensures the time coordinate will still fit in a single chunk after appending over the time dimension, and it does not affect the chunking of the actual data variables.
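For concreteness, here is a minimal sketch of both approaches (writing a new store with an oversized time-coordinate chunk, and re-chunking the time coordinate of an existing store). All paths, variable names, and sizes below are illustrative, not taken from the dataset discussed above:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small synthetic dataset standing in for a real archive.
times = pd.date_range("2000-01-01", periods=1000, freq="D")
ds = xr.Dataset(
    {"temp": (("time",), np.random.rand(times.size))},
    coords={"time": times},
)

# 1. When first writing a store that will be appended along time, give the
#    time coordinate one very large chunk so it never gets split by appends.
#    This does not change the chunking of the data variables.
first = ds.isel(time=slice(0, 500))
first.time.encoding["chunks"] = (10_000_000,)
first.to_zarr("archive.zarr", mode="w", consolidated=True)

# Later appends along time keep the coordinate in that single chunk.
rest = ds.isel(time=slice(500, None))
rest.to_zarr("archive.zarr", append_dim="time", consolidated=True)

# 2. For an existing store whose time coordinate is spread over many chunks,
#    override the chunk encoding and rewrite (here to a new store).
old = xr.open_zarr("archive.zarr", consolidated=True)
old.time.encoding["chunks"] = (old.time.size,)  # one chunk covering the whole axis
old.to_zarr("rechunked.zarr", mode="w", consolidated=True)
```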

One thing we have been having performance issues with is loading coordinates / metadata from zarr archives that have too many chunks (millions), even when metadata is consolidated and coordinates are stored in a single chunk. There is an open issue in dask about this.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: 709187212