issue_comments: 1137821786

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/6633#issuecomment-1137821786 https://api.github.com/repos/pydata/xarray/issues/6633 1137821786 IC_kwDOAMm_X85D0cha 1197350 2022-05-25T20:34:30Z 2022-05-25T20:34:59Z MEMBER

Here is an example that really highlights the performance cost of always loading dimension coordinates:

```python
import xarray as xr
import zarr

store = zarr.storage.FSStore("s3://mur-sst/zarr/", anon=True)
%time list(zarr.open_consolidated(store))  # -> Wall time: 86.4 ms
%time ds = xr.open_dataset(store, engine='zarr')  # -> Wall time: 17.1 s
```

%prun confirms that Xarray is spending most of its time just loading data for the time axis, which you can reproduce at the zarr level as:

```python
zgroup = zarr.open_consolidated(store)
%time _ = zgroup['time'][:]  # -> Wall time: 14.7 s
```

Obviously this example is pretty extreme, and there are things that could be done to optimize it. But it shows what eagerly loading dimension coordinates can cost. If I don't care about label-based indexing for this dataset, I would rather have my 17 s back!

:+1: to "indexes={} (empty dictionary) to explicitly skip creating indexes".

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  1247010680