issue_comments

4 rows where issue = 712189206 and user = 2448579 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
702276824 https://github.com/pydata/xarray/issues/4475#issuecomment-702276824 https://api.github.com/repos/pydata/xarray/issues/4475 MDEyOklzc3VlQ29tbWVudDcwMjI3NjgyNA== dcherian 2448579 2020-10-01T17:13:16Z 2020-10-01T17:13:16Z MEMBER

> doesn't that then require me to load everything into memory before writing?

I think so.

I would try multiple processes and see if that is fast enough for what you want to do. Otherwise, write to zarr: those writes are parallelized, and it is a lot easier than dealing with HDF5.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preprocess function for save_mfdataset 712189206
702226256 https://github.com/pydata/xarray/issues/4475#issuecomment-702226256 https://api.github.com/repos/pydata/xarray/issues/4475 MDEyOklzc3VlQ29tbWVudDcwMjIyNjI1Ng== dcherian 2448579 2020-10-01T15:46:45Z 2020-10-01T15:46:45Z MEMBER

Are you using multiple threads or multiple processes? IIUC you should be using multiple processes for max writing efficiency.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preprocess function for save_mfdataset 712189206
701688956 https://github.com/pydata/xarray/issues/4475#issuecomment-701688956 https://api.github.com/repos/pydata/xarray/issues/4475 MDEyOklzc3VlQ29tbWVudDcwMTY4ODk1Ng== dcherian 2448579 2020-09-30T22:55:28Z 2020-09-30T22:55:28Z MEMBER

You could write to netCDF in your_function and avoid save_mfdataset altogether...

I guess this is a good argument for adding a preprocess kwarg.
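
A minimal sketch of that idea, assuming your_function is extended to take an output path. The path argument, the synthetic datasets, and the mean-removal step are illustrative assumptions, not details from the thread. Each dataset is processed and written in the same call, so save_mfdataset is never involved.

```python
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr


def your_function(ds: xr.Dataset, path: Path) -> Path:
    """Hypothetical per-dataset step: preprocess, then write straight to netCDF."""
    processed = ds - ds.mean("time")  # placeholder preprocessing (assumption)
    processed.to_netcdf(path)
    return path


# Small synthetic datasets standing in for the real inputs (assumption).
datasets = [
    xr.Dataset(
        {"temp": ("time", np.arange(4.0) + i)},
        coords={"time": np.arange(4)},
    )
    for i in range(3)
]

out_dir = Path(tempfile.mkdtemp())
paths = [out_dir / f"part_{i}.nc" for i in range(len(datasets))]

# Only one processed dataset is held at a time; each is written immediately.
written = [your_function(ds, path) for ds, path in zip(datasets, paths)]
```

Because only one dataset is processed per call, this loop can later be handed to whatever parallel runner fits the workload.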

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preprocess function for save_mfdataset 712189206
701577652 https://github.com/pydata/xarray/issues/4475#issuecomment-701577652 https://api.github.com/repos/pydata/xarray/issues/4475 MDEyOklzc3VlQ29tbWVudDcwMTU3NzY1Mg== dcherian 2448579 2020-09-30T18:51:25Z 2020-09-30T18:51:25Z MEMBER

You could use dask.delayed here:

    new_datasets = [dask.delayed(your_function)(dset) for dset in datasets]
    xr.save_mfdataset(new_datasets, paths)

I think this will work, but I've never used save_mfdataset. This is how preprocess is implemented with open_mfdataset btw.
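
A hedged variant, in case save_mfdataset does not accept delayed objects: wrap the write itself in dask.delayed, so each task processes and writes one file. The helper process_and_write, the doubling step, and the synthetic data are assumptions made for illustration. The synchronous scheduler is used only to keep the demo deterministic.

```python
import tempfile
from pathlib import Path

import dask
import numpy as np
import xarray as xr


def your_function(ds: xr.Dataset) -> xr.Dataset:
    """Placeholder preprocessing step (assumption)."""
    return ds * 2


def process_and_write(ds: xr.Dataset, path: str) -> str:
    """Hypothetical helper: preprocess one dataset and write it to one file."""
    your_function(ds).to_netcdf(path)
    return path


# Synthetic inputs standing in for the real datasets (assumption).
datasets = [
    xr.Dataset({"temp": ("time", np.arange(4.0) + i)}) for i in range(3)
]
out_dir = Path(tempfile.mkdtemp())
paths = [str(out_dir / f"part_{i}.nc") for i in range(len(datasets))]

# Nothing runs yet: each entry is a lazy task covering processing and writing.
tasks = [
    dask.delayed(process_and_write)(ds, path)
    for ds, path in zip(datasets, paths)
]

# Execute the tasks. "synchronous" runs them one at a time; another dask
# scheduler can be selected here once the pattern works for the real data.
results = dask.compute(*tasks, scheduler="synchronous")
```

Each task returns only the path it wrote, so the processed datasets are not collected in memory at the end.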

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preprocess function for save_mfdataset 712189206

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
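
As a rough illustration of how the filter in the page header maps onto this schema, the sketch below builds a trimmed-down copy of the table in an in-memory SQLite database, loads the four comment ids and timestamps listed above, and runs an equivalent query. The exact SQL that Datasette generates is not shown on this page, so the query text is an assumption, and only the columns needed for the filter are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed-down version of the schema above: only the columns used by the filter.
conn.executescript(
    """
    CREATE TABLE [issue_comments] (
       [id] INTEGER PRIMARY KEY,
       [user] INTEGER,
       [updated_at] TEXT,
       [issue] INTEGER
    );
    CREATE INDEX [idx_issue_comments_issue]
        ON [issue_comments] ([issue]);
    CREATE INDEX [idx_issue_comments_user]
        ON [issue_comments] ([user]);
    """
)

# The four rows from the listing, inserted oldest first on purpose.
rows = [
    (701577652, 2448579, "2020-09-30T18:51:25Z", 712189206),
    (701688956, 2448579, "2020-09-30T22:55:28Z", 712189206),
    (702226256, 2448579, "2020-10-01T15:46:45Z", 712189206),
    (702276824, 2448579, "2020-10-01T17:13:16Z", 712189206),
]
conn.executemany(
    "INSERT INTO [issue_comments] ([id], [user], [updated_at], [issue]) "
    "VALUES (?, ?, ?, ?)",
    rows,
)

# Assumed equivalent of "issue = 712189206 and user = 2448579
# sorted by updated_at descending". ISO 8601 strings sort chronologically.
query = """
    SELECT [id], [updated_at]
    FROM [issue_comments]
    WHERE [issue] = ? AND [user] = ?
    ORDER BY [updated_at] DESC
"""
result = conn.execute(query, (712189206, 2448579)).fetchall()
ids = [row[0] for row in result]
```

The resulting ids come back newest first, matching the order of the rows in the listing.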