
issue_comments


2 rows where author_association = "MEMBER", issue = 1083621690, and user = 1217238, sorted by updated_at descending




Comment 1011450955 (node_id IC_kwDOAMm_X848SYRL) · shoyer (user 1217238) · MEMBER
https://github.com/pydata/xarray/issues/6084#issuecomment-1011450955
Created: 2022-01-12T21:05:59Z · Updated: 2022-01-12T21:05:59Z

> E.g., I think skipping this line would save some of the users in my original post a lot of time.

I don't think that line adds any measurable overhead. It's just telling dask to delay computation of a single function.

For sure this would be worth elaborating on in the Xarray docs! I wrote a little bit about this in the docs for Xarray-Beam: see "One recommended pattern" in https://xarray-beam.readthedocs.io/en/latest/read-write.html#writing-data-to-zarr

Reactions: none
Issue: Initialise zarr metadata without computing dask graph (1083621690)
Comment 998357641 (node_id IC_kwDOAMm_X847gbqJ) · shoyer (user 1217238) · MEMBER
https://github.com/pydata/xarray/issues/6084#issuecomment-998357641
Created: 2021-12-21T00:00:49Z · Updated: 2021-12-21T00:00:49Z

The challenge is that Xarray needs some way to represent the "schema" for the desired entire dataset. I'm very open to alternatives, but so far, the most convenient way to do this has been to load Dask arrays into an xarray.Dataset.

It's worth noting that any dask arrays with the desired chunking scheme will do -- you don't need to use the same dask arrays that you want to compute. When I do this sort of thing, I will often use xarray.zeros_like() to create low overhead versions of dask arrays, e.g., in this example from Xarray-Beam: https://github.com/google/xarray-beam/blob/0.2.0/examples/era5_climatology.py#L61-L68

Reactions: 2 (+1 × 2)
Issue: Initialise zarr metadata without computing dask graph (1083621690)
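The `xarray.zeros_like()` trick above can be shown in a few lines. This is a minimal sketch with invented names: the point is that `zeros_like` preserves the dims, dtypes, and chunking scheme of a lazy dataset while replacing the expensive graph with a near-free one, which is all the "schema" needs.

```python
import dask.array as da
import xarray as xr

# Stand-in for an expensive lazy pipeline (illustrative only).
expensive = xr.Dataset(
    {"result": (("time",), da.arange(1_000, chunks=100).astype("float64") ** 2)}
)

# zeros_like keeps dims, dtypes and chunking, but its graph is trivial,
# so it makes a cheap template to pass to to_zarr(compute=False).
template = xr.zeros_like(expensive)

assert template.result.dtype == expensive.result.dtype
assert template.result.chunks == expensive.result.chunks
```

Any dask-backed dataset with the right structure works as the template; the values are never computed when writing metadata.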


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
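The filtered view at the top of this page can be reproduced directly against this schema. A sketch using Python's stdlib `sqlite3` with an in-memory database and a single illustrative row (the `REFERENCES` clauses are omitted since the `users` and `issues` tables are not shown here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed copy of the schema above; foreign keys omitted for brevity.
conn.executescript("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY,
   [node_id] TEXT, [user] INTEGER, [created_at] TEXT, [updated_at] TEXT,
   [author_association] TEXT, [body] TEXT, [reactions] TEXT,
   [performed_via_github_app] TEXT, [issue] INTEGER
);
""")
conn.execute(
    "INSERT INTO issue_comments (id, [user], author_association, issue, updated_at)"
    " VALUES (1011450955, 1217238, 'MEMBER', 1083621690, '2022-01-12T21:05:59Z')"
)

# The query behind this page: filter on association, issue and user,
# newest update first.
rows = conn.execute(
    "SELECT id FROM issue_comments"
    " WHERE author_association = ? AND issue = ? AND [user] = ?"
    " ORDER BY updated_at DESC",
    ("MEMBER", 1083621690, 1217238),
).fetchall()
```

With the single row inserted above, `rows` contains just `(1011450955,)`.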
Powered by Datasette · Queries took 250.155ms · About: xarray-datasette