issue_comments

4 rows where author_association = "MEMBER" and issue = 504497403 sorted by updated_at descending

user 3

  • crusaderky 2
  • shoyer 1
  • dcherian 1

issue 1

  • add option to open_mfdataset for not using dask · 4

author_association 1

  • MEMBER · 4
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
540474492 https://github.com/pydata/xarray/issues/3386#issuecomment-540474492 https://api.github.com/repos/pydata/xarray/issues/3386 MDEyOklzc3VlQ29tbWVudDU0MDQ3NDQ5Mg== crusaderky 6213168 2019-10-10T09:05:21Z 2019-10-10T09:05:21Z MEMBER

@sipposip if your dask graph is computed straight after loading from disk, you can try disabling the dask optimizer to see whether that squeezes a few milliseconds out of load(). The exact setting syntax is in the dask documentation.
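Here is a minimal sketch of that idea. It assumes a recent xarray and dask, where `Dataset.compute` (like `Dataset.load`) forwards extra keyword arguments to `dask.compute`, and `optimize_graph=False` skips dask's graph optimization pass. The in-memory dataset and the variable name `t2m` are hypothetical stand-ins for data opened from disk. Any timing difference will depend on the shape of the real graph.

```python
import time

import dask.array as da
import xarray as xr

# Hypothetical stand-in for a lazily loaded, dask-backed dataset
ds = xr.Dataset({"t2m": (("time", "x"), da.ones((4, 10), chunks=(1, 10)))})


def timed(fn):
    """Run fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Default: dask optimizes the graph before executing it
optimized, t_optimized = timed(lambda: ds.compute())
# Extra kwargs are forwarded to dask.compute; skip the optimization pass
unoptimized, t_unoptimized = timed(lambda: ds.compute(optimize_graph=False))

print(f"optimized:   {t_optimized * 1e3:.2f} ms")
print(f"unoptimized: {t_unoptimized * 1e3:.2f} ms")
```

The same keyword can be passed to `ds.load(...)`, which loads in place instead of returning a new dataset.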

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add option to open_mfdataset for not using dask 504497403
540208420 https://github.com/pydata/xarray/issues/3386#issuecomment-540208420 https://api.github.com/repos/pydata/xarray/issues/3386 MDEyOklzc3VlQ29tbWVudDU0MDIwODQyMA== shoyer 1217238 2019-10-09T21:28:48Z 2019-10-09T21:28:48Z MEMBER

netCDF4.MFDataset works on a much more restricted set of netCDF files than xarray.open_mfdataset. I'm not surprised it's a little faster, but I'm not sure it's worth the maintenance burden of supporting this separate code path. Making a fully featured version of open_mfdataset without dask would be challenging.

Can you simply add more threads in TensorFlow/Keras for loading the data? My other suggestion is to pre-shuffle the data on disk, so you don't need random access inside your training loop.
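Here is a rough sketch of the pre-shuffling suggestion. It assumes the samples lie along a single dimension, called `sample` here (a hypothetical name). The data is permuted once with a fixed seed, the shuffled copy is written back to disk, and the training loop then reads contiguous slices. The file name `shuffled.nc` and the batch size are illustrative placeholders.

```python
import numpy as np
import xarray as xr

# Hypothetical training data: 100 samples of 8 features each
features = np.random.default_rng(0).normal(size=(100, 8))
ds = xr.Dataset(
    {"features": (("sample", "feature"), features)},
    coords={"sample": np.arange(100)},
)

# Shuffle once, up front, with a reproducible permutation
perm = np.random.default_rng(seed=42).permutation(ds.sizes["sample"])
shuffled = ds.isel(sample=perm)

# In practice, persist the shuffled copy so the training loop can read
# contiguous blocks instead of doing random access, e.g.:
# shuffled.to_netcdf("shuffled.nc")

# Training loop side: plain contiguous slices, no random indexing
batch_size = 32
batches = [
    shuffled.isel(sample=slice(start, start + batch_size))
    for start in range(0, shuffled.sizes["sample"], batch_size)
]
```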

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add option to open_mfdataset for not using dask 504497403
540033550 https://github.com/pydata/xarray/issues/3386#issuecomment-540033550 https://api.github.com/repos/pydata/xarray/issues/3386 MDEyOklzc3VlQ29tbWVudDU0MDAzMzU1MA== dcherian 2448579 2019-10-09T14:43:29Z 2019-10-09T14:43:29Z MEMBER

It would be useful to see what a single file looks like and what the combined dataset looks like. open_mfdataset can sometimes require some tuning to get good performance.
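Here is a minimal sketch of that comparison. It assumes xarray with dask and a netCDF backend (netCDF4 or scipy) installed. The code writes three small hypothetical files to a temporary directory and prints one file next to the combined result. It also passes `open_mfdataset` options that the xarray documentation describes for speeding up multi-file opening (`data_vars="minimal"`, `coords="minimal"`, `compat="override"`). Whether those options suit a real set of files depends on how consistent the files are.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Write three small hypothetical files that split one dataset along "time"
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(3):
    part = xr.Dataset(
        {"t2m": (("time", "x"), np.full((2, 5), float(i)))},
        coords={"time": [2 * i, 2 * i + 1], "x": np.arange(5)},
    )
    path = os.path.join(tmpdir, f"part_{i}.nc")
    part.to_netcdf(path)
    paths.append(path)

# What a single file looks like
single = xr.open_dataset(paths[0])
print(single)

# What the combined dataset looks like: only variables and coordinates that
# contain "time" are concatenated; everything else is taken from the first file
combined = xr.open_mfdataset(
    paths,
    combine="nested",
    concat_dim="time",
    data_vars="minimal",
    coords="minimal",
    compat="override",
)
print(combined)
print(combined.t2m.chunks)  # dask chunk layout of the combined variable
```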

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add option to open_mfdataset for not using dask 504497403
539907822 https://github.com/pydata/xarray/issues/3386#issuecomment-539907822 https://api.github.com/repos/pydata/xarray/issues/3386 MDEyOklzc3VlQ29tbWVudDUzOTkwNzgyMg== crusaderky 6213168 2019-10-09T08:58:21Z 2019-10-09T08:58:21Z MEMBER

@sipposip xarray doesn't use netCDF4.MFDataset. It opens each file with netCDF4.Dataset, wraps each one in a dask array, and then concatenates those dask arrays.

> Opening each file separately with open_dataset, and then concatenating them with xr.concat does not work, as this loads the data into memory.

This is by design, for the reason above. With NetCDF/HDF5 lazy loading, the data is loaded into a numpy.ndarray on the first operation performed on it, and concatenation counts as such an operation.
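Here is a small sketch of the difference. It assumes dask and a netCDF backend are installed; the file names and the variable `t2m` are hypothetical. Passing `chunks={}` to `open_dataset` makes each file dask-backed, so `xr.concat` builds a lazy graph instead of reading the data. That is roughly what open_mfdataset does internally, as described above.

```python
import os
import tempfile

import dask.array as da
import numpy as np
import xarray as xr

# Two small hypothetical files split along "time"
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(2):
    path = os.path.join(tmpdir, f"part_{i}.nc")
    xr.Dataset(
        {"t2m": (("time", "x"), np.full((3, 4), float(i)))},
        coords={"time": np.arange(3 * i, 3 * i + 3)},
    ).to_netcdf(path)
    paths.append(path)

# Without chunks: concat operates on the backend arrays directly (no dask)
eager = xr.concat([xr.open_dataset(p) for p in paths], dim="time")

# With chunks={}: each file is wrapped in a dask array, so concat stays lazy
lazy = xr.concat([xr.open_dataset(p, chunks={}) for p in paths], dim="time")
print(type(lazy.t2m.data))  # a dask array; t2m has not been read yet

# The actual read of t2m happens here
result = lazy.compute()
```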

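I'm aware that nesting threads within threads, threads within processes, or processes within threads can cause a world of pain in the form of random deadlocks; I've been there myself. You can completely disable dask threads process-wide with

```python
import dask

# Process-wide: all later dask computations run on the calling thread
dask.config.set(scheduler="synchronous")
...
ds.load()
```

or as a context manager

```python
# Only computations inside the block use the synchronous scheduler
with dask.config.set(scheduler="synchronous"):
    ds.load()
```

or for the single operation:

```python
# The scheduler keyword is forwarded to dask.compute for this call only
ds.load(scheduler="synchronous")
```

Does this address your issue?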
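One way to confirm that the synchronous scheduler really keeps the work on the calling thread is a tiny `dask.delayed` task that reports which thread ran it. This is an illustrative check, not part of xarray, and the function name `which_thread` is hypothetical.

```python
import threading

import dask


@dask.delayed
def which_thread():
    # Return the identifier of the thread that executes this task
    return threading.get_ident()


caller = threading.get_ident()

# Inside the context manager, compute() runs tasks on the calling thread
with dask.config.set(scheduler="synchronous"):
    (sync_ident,) = dask.compute(which_thread())

# The threaded scheduler may hand the task to a worker thread instead
(threaded_ident,) = dask.compute(which_thread(), scheduler="threads")

print("synchronous ran on caller:", sync_ident == caller)
print("threaded ran on caller:   ", threaded_ident == caller)
```

Once the `with` block exits, the previous scheduler setting is restored.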

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add option to open_mfdataset for not using dask 504497403

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
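The sketch below reproduces this page's query against the schema above: `author_association = "MEMBER"` and `issue = 504497403`, sorted by `updated_at` descending. It uses Python's built-in `sqlite3` module with an in-memory database that holds only the four rows listed here, and fills in only the columns the query needs. The `users` table is not included, so the per-user count (mirroring the "user" facet) reports numeric user ids rather than logins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
)
""")

# (id, user, updated_at, author_association, issue) for the four rows above
rows = [
    (539907822, 6213168, "2019-10-09T08:58:21Z", "MEMBER", 504497403),
    (540033550, 2448579, "2019-10-09T14:43:29Z", "MEMBER", 504497403),
    (540208420, 1217238, "2019-10-09T21:28:48Z", "MEMBER", 504497403),
    (540474492, 6213168, "2019-10-10T09:05:21Z", "MEMBER", 504497403),
]
conn.executemany(
    "INSERT INTO issue_comments (id, user, updated_at, author_association, issue) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# The page's filter and sort order
result = conn.execute(
    "SELECT id, user, updated_at FROM issue_comments "
    "WHERE author_association = ? AND issue = ? "
    "ORDER BY updated_at DESC",
    ("MEMBER", 504497403),
).fetchall()

# Comments per user id, like the "user" facet
facet = conn.execute(
    "SELECT user, COUNT(*) FROM issue_comments WHERE issue = ? "
    "GROUP BY user ORDER BY COUNT(*) DESC, user",
    (504497403,),
).fetchall()
```

Sorting the TEXT column works here because all timestamps use the same ISO 8601 format, so lexicographic order matches chronological order.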
Powered by Datasette · About: xarray-datasette