html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/6633#issuecomment-1137851771,https://api.github.com/repos/pydata/xarray/issues/6633,1137851771,IC_kwDOAMm_X85D0j17,1197350,2022-05-25T21:10:44Z,2022-05-25T21:10:44Z,MEMBER,"Yes, it is definitely a pathological example. 💣 But the fact remains that in many cases we just want to discover a dataset's contents as quickly as possible and avoid the cost of loading coordinates and creating indexes.","{""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1247010680
https://github.com/pydata/xarray/issues/6633#issuecomment-1137821786,https://api.github.com/repos/pydata/xarray/issues/6633,1137821786,IC_kwDOAMm_X85D0cha,1197350,2022-05-25T20:34:30Z,2022-05-25T20:34:59Z,MEMBER,"Here is an example that really highlights the performance cost of always loading dimension coordinates:
```python
import zarr
import xarray as xr
store = zarr.storage.FSStore(""s3://mur-sst/zarr/"", anon=True)
%time list(zarr.open_consolidated(store)) # -> Wall time: 86.4 ms
%time ds = xr.open_dataset(store, engine='zarr') # -> Wall time: 17.1 s
```
`%prun` confirms that Xarray spends most of its time just loading the data for the `time` coordinate, which you can reproduce at the zarr level:
```python
zgroup = zarr.open_consolidated(store)
%time _ = zgroup['time'][:] # -> Wall time: 14.7 s
```
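If you want to see where the time goes yourself, something like this should surface the hotspot (a sketch, assuming an IPython session with `store` defined as above):
```python
# Profile the open call and list the 10 most expensive entries;
# expect the cumulative time to be dominated by reads of the `time` data.
%prun -l 10 xr.open_dataset(store, engine='zarr')
```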
Obviously this example is pretty extreme, and there are things that could be done to optimize it (one workaround is sketched below). But it really highlights the cost of eagerly loading dimension coordinates: if I don't care about label-based indexing for this dataset, I would rather have my 17s back!
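One workaround available today, assuming you can live without the `time` coordinate entirely, is to drop it before Xarray ever loads it:
```python
# `drop_variables` discards the variable before decoding and index
# creation, so the expensive read of the `time` data never happens.
# The `time` dimension survives, but without labels, so only
# positional (isel) indexing is possible along it.
ds = xr.open_dataset(store, engine='zarr', drop_variables=['time'])
```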
:+1: to ""`indexes={}` (empty dictionary) to explicitly skip creating indexes"".
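For concreteness, here is how that proposed option might look (hypothetical sketch; `indexes` is not an accepted `open_dataset` argument today):
```python
# Proposed: an empty dict means 'create no indexes at all'.
ds = xr.open_dataset(store, engine='zarr', indexes={})
ds.isel(time=0)              # positional indexing still works
# ds.sel(time='2002-06-01')  # would fail: no index to support label lookup
```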
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1247010680