
issue_comments


2 rows where issue = 1247010680 and user = 1197350 sorted by updated_at descending


id: 1137851771
html_url: https://github.com/pydata/xarray/issues/6633#issuecomment-1137851771
issue_url: https://api.github.com/repos/pydata/xarray/issues/6633
node_id: IC_kwDOAMm_X85D0j17
user: rabernat (1197350)
created_at: 2022-05-25T21:10:44Z
updated_at: 2022-05-25T21:10:44Z
author_association: MEMBER
body:

Yes it is definitely a pathological example. 💣 But the fact remains that there are many cases where we just want to discover dataset contents as quickly as possible and want to avoid the cost of loading coordinates and creating indexes.
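The deferred-discovery idea being argued for here can be sketched in plain Python: defer building the label-to-position map until the first label-based lookup, so that simply opening the dataset costs nothing. `LazyIndex` and `load_time_coord` are hypothetical names for illustration, not xarray API.

```python
class LazyIndex:
    """Sketch (hypothetical helper): build a label -> position map only on first lookup."""

    def __init__(self, load_coord):
        self._load_coord = load_coord  # callable that reads coordinate values (the expensive step)
        self._index = None             # not built until someone does label-based selection

    def sel(self, label):
        if self._index is None:
            # pay the coordinate-loading cost here, not at open time
            self._index = {value: pos for pos, value in enumerate(self._load_coord())}
        return self._index[label]


calls = []

def load_time_coord():
    calls.append(1)  # record that the expensive read happened
    return ["2002-06-01", "2002-06-02", "2002-06-03"]


index = LazyIndex(load_time_coord)
assert calls == []                   # "opening" was free: nothing loaded yet
assert index.sel("2002-06-02") == 1  # first lookup triggers the single load
assert calls == [1]
```

Repeated lookups reuse the cached map, so the loading cost is paid at most once, and never for workloads that only inspect dataset contents.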

reactions:
{
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Opening dataset without loading any indexes? (1247010680)
id: 1137821786
html_url: https://github.com/pydata/xarray/issues/6633#issuecomment-1137821786
issue_url: https://api.github.com/repos/pydata/xarray/issues/6633
node_id: IC_kwDOAMm_X85D0cha
user: rabernat (1197350)
created_at: 2022-05-25T20:34:30Z
updated_at: 2022-05-25T20:34:59Z
author_association: MEMBER
body:

Here is an example that really highlights the performance cost of always loading dimension coordinates:

```python
import zarr
import xarray as xr

store = zarr.storage.FSStore("s3://mur-sst/zarr/", anon=True)
%time list(zarr.open_consolidated(store))         # -> Wall time: 86.4 ms
%time ds = xr.open_dataset(store, engine='zarr')  # -> Wall time: 17.1 s
```

%prun confirms that Xarray is spending most of its time just loading data for the time axis, which you can reproduce at the zarr level as:

```python
zgroup = zarr.open_consolidated(store)
%time _ = zgroup['time'][:]  # -> Wall time: 14.7 s
```
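Outside IPython, the same two measurements can be sketched with the standard library alone: `time.perf_counter` stands in for `%time` and `cProfile`/`pstats` for `%prun`. `load_time_axis` here is a hypothetical local stand-in for the `zgroup['time'][:]` read, so the snippet runs without S3 access.

```python
import cProfile
import io
import pstats
import time

def load_time_axis():
    # stand-in for zgroup['time'][:] -- simulates an expensive coordinate read
    time.sleep(0.01)
    return list(range(100))

# equivalent of %time: wall-clock the call
t0 = time.perf_counter()
values = load_time_axis()
print(f"loaded {len(values)} values in {time.perf_counter() - t0:.3f}s")

# equivalent of %prun: profile the call and rank by cumulative time
profiler = cProfile.Profile()
profiler.enable()
load_time_axis()
profiler.disable()
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

With the real store, the profile output is dominated by the coordinate read, which is what the comment's `%prun` run showed.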

Obviously this example is pretty extreme. There are things that could be done to optimize it, etc. But it really highlights the costs of eagerly loading dimension coordinates. If I don't care about label-based indexing for this dataset, I would rather have my 17s back!

👍 to "indexes={} (empty dictionary) to explicitly skip creating indexes".
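Under the proposal being endorsed here, the call would look something like the following. This is hypothetical: the `indexes=` keyword was only a suggestion under discussion in this issue at the time, not an existing `xr.open_dataset` parameter.

```python
# hypothetical API from the proposal discussed in this issue
ds = xr.open_dataset(store, engine="zarr", indexes={})  # empty dict: create no indexes
```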

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Opening dataset without loading any indexes? (1247010680)

Table schema:

```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
```
Powered by Datasette · About: xarray-datasette