issue_comments


2 rows where author_association = "MEMBER", issue = 504497403, and user = 6213168, sorted by updated_at descending

Row 1

id: 540474492
html_url: https://github.com/pydata/xarray/issues/3386#issuecomment-540474492
issue_url: https://api.github.com/repos/pydata/xarray/issues/3386
node_id: MDEyOklzc3VlQ29tbWVudDU0MDQ3NDQ5Mg==
user: crusaderky (6213168)
created_at: 2019-10-10T09:05:21Z
updated_at: 2019-10-10T09:05:21Z
author_association: MEMBER
body:

@sipposip if your dask graph is resolved straight after the load from disk, you can try disabling the dask optimizer to see whether you can squeeze some milliseconds out of load(). You can look up the setting syntax in the dask documentation.

reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
performed_via_github_app:
issue: add option to open_mfdataset for not using dask (504497403)
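
The comment above leaves the exact syntax to the dask documentation. As a minimal sketch of one way to do it: dask.compute() accepts an optimize_graph flag, and xarray's Dataset.load() forwards extra keyword arguments to dask.compute, so the optimization pass can be skipped per load. The file glob below is hypothetical.

```python
import xarray as xr

# Hypothetical input files, for illustration only.
ds = xr.open_mfdataset("data/*.nc")

# Dataset.load() forwards its keyword arguments to dask.compute(),
# which accepts optimize_graph=...; passing False skips dask's
# graph-optimization pass for this one load.
ds.load(optimize_graph=False)
```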

Row 2

id: 539907822
html_url: https://github.com/pydata/xarray/issues/3386#issuecomment-539907822
issue_url: https://api.github.com/repos/pydata/xarray/issues/3386
node_id: MDEyOklzc3VlQ29tbWVudDUzOTkwNzgyMg==
user: crusaderky (6213168)
created_at: 2019-10-09T08:58:21Z
updated_at: 2019-10-09T08:58:21Z
author_association: MEMBER
body:

@sipposip xarray doesn't use netCDF4.MFDataset; it uses netCDF4.Dataset, wrapped by dask arrays which are then concatenated.

Opening each file separately with open_dataset and then concatenating them with xr.concat does not work, as this loads the data into memory.

This is by design: NetCDF/HDF5 lazy loading means that data is loaded into a numpy.ndarray on the first operation performed upon it, and concatenation is such an operation.

I'm aware that threads within threads, threads within processes, and processes within threads cause a world of pain in the form of random deadlocks - I've been there myself. You can completely disable dask threads process-wide:

```python
import dask

dask.config.set(scheduler="synchronous")
...
ds.load()
```

or as a context manager:

```python
with dask.config.set(scheduler="synchronous"):
    ds.load()
```

or for a single operation:

```python
ds.load(scheduler="synchronous")
```

Does this address your issue?

reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
performed_via_github_app:
issue: add option to open_mfdataset for not using dask (504497403)
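
To make the lazy-loading point above concrete, here is a minimal sketch; the file names and the concat dimension are hypothetical. Opening without dask gives lazily indexed arrays, and the first operation on them, including xr.concat, reads them into memory.

```python
import xarray as xr

# Hypothetical file names, for illustration only.
paths = ["part1.nc", "part2.nc"]

# Without chunks=..., open_dataset returns lazily indexed
# (non-dask) arrays backed by the files on disk.
parts = [xr.open_dataset(p) for p in paths]

# concat is the first operation performed on those arrays,
# so this step reads everything into numpy.ndarrays in memory.
combined = xr.concat(parts, dim="time")
```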

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
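
For reference, the row filter at the top of this page (author_association = "MEMBER", issue = 504497403, user = 6213168, newest first) maps directly onto this schema. A minimal sketch using Python's sqlite3, assuming a hypothetical github.db file containing this table:

```python
import sqlite3

# Hypothetical database file; this page is served by Datasette
# from a SQLite database with the schema shown above.
conn = sqlite3.connect("github.db")

# Reproduce this page's filter: MEMBER comments by user 6213168
# on issue 504497403, sorted by updated_at descending.
rows = conn.execute(
    """
    SELECT id, updated_at, body
    FROM issue_comments
    WHERE author_association = 'MEMBER'
      AND issue = 504497403
      AND [user] = 6213168
    ORDER BY updated_at DESC
    """
).fetchall()

for comment_id, updated_at, body in rows:
    print(comment_id, updated_at, body[:60])
```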