

issue_comments


4 rows where issue = 1083621690 sorted by updated_at descending





user 3

  • shoyer 2
  • dcherian 1
  • dougiesquire 1

author_association 2

  • MEMBER 3
  • NONE 1

issue 1

  • Initialise zarr metadata without computing dask graph · 4
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1011450955 https://github.com/pydata/xarray/issues/6084#issuecomment-1011450955 https://api.github.com/repos/pydata/xarray/issues/6084 IC_kwDOAMm_X848SYRL shoyer 1217238 2022-01-12T21:05:59Z 2022-01-12T21:05:59Z MEMBER

> E.g., I think skipping this line would save some of the users in my original post a lot of time.

I don't think that line adds any measurable overhead. It's just telling dask to delay computation of a single function.
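Purely to illustrate what "telling dask to delay computation of a single function" means (the linked source line is not reproduced here), here is a minimal sketch using dask.delayed. The function finalize is a hypothetical stand-in, not xarray's actual code; the point is that wrapping the call only records one lazy task, and nothing runs until compute() is called.

```python
import dask

calls = []


def finalize(store_name):
    # Hypothetical stand-in for a cheap finalisation step; records each call.
    calls.append(store_name)
    return store_name


# dask.delayed only wraps the call in a lazy task; finalize has not run yet.
lazy = dask.delayed(finalize)("example.zarr")
assert calls == []

# The wrapped function executes only when the Delayed object is computed.
result = lazy.compute()
print(result, calls)
```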

For sure this would be worth elaborating on in the Xarray docs! I wrote a little bit about this in the docs for Xarray-Beam: see "One recommended pattern" in https://xarray-beam.readthedocs.io/en/latest/read-write.html#writing-data-to-zarr

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Initialise zarr metadata without computing dask graph 1083621690
1011430150 https://github.com/pydata/xarray/issues/6084#issuecomment-1011430150 https://api.github.com/repos/pydata/xarray/issues/6084 IC_kwDOAMm_X848STMG dougiesquire 42455466 2022-01-12T20:35:44Z 2022-01-12T20:35:44Z NONE

Thanks @shoyer. I understand the need for the schema, but is there a need to actually generate the dask graph when all the user wants to do is initialise an empty zarr store? E.g., I think skipping this line would save some of the users in my original post a lot of time.

Regardless, your suggestion to just create a low-overhead version of the array being initialised is probably better/cleaner than adding a specific option or method. Would it be worth adding the xarray.zeros_like(ds) recommendation to the docs?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Initialise zarr metadata without computing dask graph 1083621690
1000628813 https://github.com/pydata/xarray/issues/6084#issuecomment-1000628813 https://api.github.com/repos/pydata/xarray/issues/6084 IC_kwDOAMm_X847pGJN dcherian 2448579 2021-12-24T03:17:44Z 2021-12-24T03:17:44Z MEMBER

What metadata is being determined by computing the whole array?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Initialise zarr metadata without computing dask graph 1083621690
998357641 https://github.com/pydata/xarray/issues/6084#issuecomment-998357641 https://api.github.com/repos/pydata/xarray/issues/6084 IC_kwDOAMm_X847gbqJ shoyer 1217238 2021-12-21T00:00:49Z 2021-12-21T00:00:49Z MEMBER

The challenge is that Xarray needs some way to represent the "schema" for the entire desired dataset. I'm very open to alternatives, but so far, the most convenient way to do this has been to load Dask arrays into an xarray.Dataset.
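As a rough illustration of a Dask-backed Dataset acting as a "schema", the sketch below builds one from scratch with dask.array.zeros, so no source data or expensive graph is involved. The variable names, grid sizes and 31-step chunking are hypothetical. Dims, shapes, dtypes and chunks can all be inspected without computing anything.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Hypothetical target: 365 daily steps on a 180 x 360 grid, chunked in 31-step blocks.
shape = (365, 180, 360)
chunks = (31, 180, 360)

schema = xr.Dataset(
    {
        "temperature": (("time", "lat", "lon"), da.zeros(shape, chunks=chunks, dtype=np.float32)),
        "precip": (("time", "lat", "lon"), da.zeros(shape, chunks=chunks, dtype=np.float32)),
    },
    coords={"lat": np.linspace(-89.5, 89.5, 180), "lon": np.linspace(0.5, 359.5, 360)},
)

# Everything needed to lay out a store is available as metadata, with nothing computed.
for name, var in schema.data_vars.items():
    print(name, var.dims, var.shape, var.dtype, var.chunks[0])
```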

It's worth noting that any dask arrays with the desired chunking scheme will do; you don't need to use the same dask arrays that you want to compute. When I do this sort of thing, I will often use xarray.zeros_like() to create low-overhead versions of dask arrays, e.g., in this example from Xarray-Beam: https://github.com/google/xarray-beam/blob/0.2.0/examples/era5_climatology.py#L61-L68
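To see why xarray.zeros_like() is low overhead, one can compare task-graph sizes. In this sketch (the "expensive" pipeline is hypothetical), the template keeps the same chunking as the original, but its graph contains only constant-fill tasks rather than the tasks of the original computation.

```python
import dask.array as da
import xarray as xr

# Hypothetical expensive pipeline: random data followed by a cumulative sum.
raw = da.random.random((1000, 100), chunks=(100, 100))
ds = xr.Dataset({"x": (("time", "space"), (raw + 1).cumsum(axis=0))})

# zeros_like keeps dims, shape, dtype and chunking but builds a fresh graph
# of constant-filled chunks that does not reference ds's tasks.
template = xr.zeros_like(ds)

n_expensive = len(ds["x"].data.__dask_graph__())
n_template = len(template["x"].data.__dask_graph__())
print("original graph tasks:", n_expensive, "template graph tasks:", n_template)
```

With real pipelines the gap is usually much larger, since the original graph carries every upstream operation.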

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Initialise zarr metadata without computing dask graph 1083621690


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);