html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/4461#issuecomment-748554375,https://api.github.com/repos/pydata/xarray/issues/4461,748554375,MDEyOklzc3VlQ29tbWVudDc0ODU1NDM3NQ==,7799184,2020-12-20T02:35:40Z,2020-12-20T09:10:27Z,CONTRIBUTOR,"> @rabernat , awesome! I was stunned by the difference -- I guess the async loading of coordinate data is the big win, right?
@rsignell-usgs one other thing that can greatly speed up loading of metadata / coordinates is ensuring coordinate variables are stored in a single chunk. For this particular dataset, the chunk size for the `time` coordinate is 672, yielding 339 chunks, which can take a while to load from remote bucket stores. If you rewrite the `time` coordinate setting `dset.time.encoding[""chunks""] = (227904,)`, you should see a very large performance increase. For zarr archives that are appended along the time dimension, one thing we have been doing is defining the time coordinate with a very large chunk size (e.g., `dset.time.encoding[""chunks""] = (10000000,)`) when we first write the store. This ensures the time coordinate still fits in a single chunk after appending along the time dimension, and it does not affect the chunking of the actual data variables.
We have also been seeing performance issues when loading coordinates / metadata from zarr archives that have too many chunks (millions), even when metadata is consolidated and coordinates are stored in a single chunk. There is an [open issue](https://github.com/dask/dask/issues/6363) in dask about this.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,709187212