issue_comments: 748554375


html_url: https://github.com/pydata/xarray/pull/4461#issuecomment-748554375
issue_url: https://api.github.com/repos/pydata/xarray/issues/4461
id: 748554375
node_id: MDEyOklzc3VlQ29tbWVudDc0ODU1NDM3NQ==
user: 7799184
created_at: 2020-12-20T02:35:40Z
updated_at: 2020-12-20T09:10:27Z
author_association: CONTRIBUTOR

@rabernat, awesome! I was stunned by the difference -- I guess the async loading of coordinate data is the big win, right?

@rsignell-usgs one other thing that can greatly speed up loading of metadata / coordinates is ensuring coordinate variables are stored in a single chunk. For this particular dataset, the chunk size of the time coordinate is 672, yielding 339 chunks, which can take a while to load from remote bucket stores. If you rewrite the time coordinate, setting dset.time.encoding["chunks"] = (227904,), you should see a very large performance increase. One thing we have been doing for zarr archives that are appended along time is defining the time coordinate with a very large chunk size (e.g., dset.time.encoding["chunks"] = (10000000,)) when we first write the store. This ensures the time coordinate will still fit in a single chunk after appending over the time dimension, and it does not affect the chunking of the actual data variables.
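For concreteness, here is a minimal sketch of both approaches (writing a new store with an oversized time-coordinate chunk, and re-chunking the time coordinate of an existing store). All paths, variable names, and sizes below are illustrative, not taken from the dataset discussed above:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small synthetic dataset standing in for a real archive.
times = pd.date_range("2000-01-01", periods=1000, freq="D")
ds = xr.Dataset(
    {"temp": (("time",), np.random.rand(times.size))},
    coords={"time": times},
)

# 1. When first writing a store that will be appended along time, give the
#    time coordinate one very large chunk so it never gets split by appends.
#    This does not change the chunking of the data variables.
first = ds.isel(time=slice(0, 500))
first.time.encoding["chunks"] = (10_000_000,)
first.to_zarr("archive.zarr", mode="w", consolidated=True)

# Later appends along time keep the coordinate in that single chunk.
rest = ds.isel(time=slice(500, None))
rest.to_zarr("archive.zarr", append_dim="time", consolidated=True)

# 2. For an existing store whose time coordinate is spread over many chunks,
#    override the chunk encoding and rewrite (here to a new store).
old = xr.open_zarr("archive.zarr", consolidated=True)
old.time.encoding["chunks"] = (old.time.size,)  # one chunk covering the whole axis
old.to_zarr("rechunked.zarr", mode="w", consolidated=True)
```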

One thing we have been having performance issues with is loading coordinates / metadata from zarr archives that have too many chunks (millions), even when metadata is consolidated and coordinates are stored in a single chunk. There is an open issue in dask about this.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: 709187212