issue_comments

4 rows where author_association = "NONE" and issue = 712189206 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
702307334 https://github.com/pydata/xarray/issues/4475#issuecomment-702307334 https://api.github.com/repos/pydata/xarray/issues/4475 MDEyOklzc3VlQ29tbWVudDcwMjMwNzMzNA== heerad 2560426 2020-10-01T18:07:55Z 2020-10-01T18:07:55Z NONE

Sounds good, I'll do this in the meantime. I'm still quite interested in save_mfdataset handling these lower-level details, if possible. The ideal case would be loading with open_mfdataset, defining some operations lazily, then piping the result directly to save_mfdataset.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preprocess function for save_mfdataset 712189206
702265883 https://github.com/pydata/xarray/issues/4475#issuecomment-702265883 https://api.github.com/repos/pydata/xarray/issues/4475 MDEyOklzc3VlQ29tbWVudDcwMjI2NTg4Mw== heerad 2560426 2020-10-01T16:52:59Z 2020-10-01T16:52:59Z NONE

Multiple threads (the default), because it's recommended "for numeric code that releases the GIL (like NumPy, Pandas, Scikit-Learn, Numba, …)" according to the dask docs.

I guess I could do multi-threaded for the compute part (everything up to the definition of ds), then multi-process for the write part, but doesn't that then require me to load everything into memory before writing?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preprocess function for save_mfdataset 712189206
702178407 https://github.com/pydata/xarray/issues/4475#issuecomment-702178407 https://api.github.com/repos/pydata/xarray/issues/4475 MDEyOklzc3VlQ29tbWVudDcwMjE3ODQwNw== heerad 2560426 2020-10-01T14:34:28Z 2020-10-01T14:34:28Z NONE

Thank you, this works for me. However, it's quite slow and seems to scale faster than linearly as the length of datasets increases (the number of groups in the groupby).

Could it be connected to https://github.com/pydata/xarray/issues/2912#issuecomment-485497398, where they suggest using save_mfdataset instead of to_netcdf? If so, there's a stronger case for supporting delayed objects in save_mfdataset, as you said.

Appreciate the help!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preprocess function for save_mfdataset 712189206
701676076 https://github.com/pydata/xarray/issues/4475#issuecomment-701676076 https://api.github.com/repos/pydata/xarray/issues/4475 MDEyOklzc3VlQ29tbWVudDcwMTY3NjA3Ng== heerad 2560426 2020-09-30T22:17:24Z 2020-09-30T22:17:24Z NONE

Unfortunately that doesn't work:

TypeError: save_mfdataset only supports writing Dataset objects, received type <class 'dask.delayed.Delayed'>

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preprocess function for save_mfdataset 712189206
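
The comments above describe splitting a lazily computed dataset into groups and writing one file per group. The following is a minimal sketch of that pattern, not the solution proposed in the thread. It assumes xarray, dask, and a netCDF backend are installed; the toy dataset, the variable name `a`, the grouping by year, and the output paths are all hypothetical. It passes dask-backed Dataset objects (not `dask.delayed` objects, which raise the TypeError quoted above) to `save_mfdataset` with `compute=False`, so the writes are deferred until `compute()` is called.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy data: two years of daily values, chunked so it is dask-backed.
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
ds = xr.Dataset(
    {"a": ("time", np.arange(time.size, dtype="float64"))},
    coords={"time": time},
).chunk({"time": 100})

# A lazy operation: nothing is computed yet.
ds = ds * 2

# Split into one Dataset per year; each piece is still dask-backed.
years, datasets = zip(*ds.groupby("time.year"))

# One output file per group, written to a temporary directory.
outdir = tempfile.mkdtemp()
paths = [os.path.join(outdir, f"{year}.nc") for year in years]

# compute=False returns a single dask.delayed.Delayed object;
# the data is written when compute() is called.
delayed_write = xr.save_mfdataset(datasets, paths, compute=False)
delayed_write.compute()
```

Whether this scales better than a loop over `to_netcdf` for many groups is not something the sketch demonstrates; it only shows the call shape.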


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
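
The filter described at the top of this page (author_association = "NONE", issue = 712189206, sorted by updated_at descending) can be reproduced against the schema above. This is a minimal sketch using Python's standard sqlite3 module with an in-memory database. The helper name `comments_for_issue` is hypothetical, the four ids and timestamps are taken from the rows shown above, and the fifth row (id 1, association "MEMBER") is a made-up placeholder added only to show the filter excluding something.

```python
import sqlite3

# Schema copied from the page.
SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
"""


def comments_for_issue(conn, issue_id, association="NONE"):
    """Return (id, updated_at) pairs for one issue, most recently updated first."""
    sql = (
        "SELECT id, updated_at FROM issue_comments "
        "WHERE author_association = ? AND issue = ? "
        "ORDER BY updated_at DESC"
    )
    return conn.execute(sql, (association, issue_id)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# (id, updated_at, author_association); the last row is a made-up placeholder.
sample = [
    (701676076, "2020-09-30T22:17:24Z", "NONE"),
    (702178407, "2020-10-01T14:34:28Z", "NONE"),
    (702265883, "2020-10-01T16:52:59Z", "NONE"),
    (702307334, "2020-10-01T18:07:55Z", "NONE"),
    (1, "2020-10-02T00:00:00Z", "MEMBER"),
]
conn.executemany(
    "INSERT INTO issue_comments (id, user, updated_at, author_association, issue) "
    "VALUES (?, 2560426, ?, ?, 712189206)",
    sample,
)

rows = comments_for_issue(conn, 712189206)
```

Because the timestamps are stored as ISO 8601 text in a single format, ordering by the text column also orders them chronologically.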
Powered by Datasette · Queries took 12.87ms · About: xarray-datasette