issue_comments


1 row where issue = 327064908 and user = 6404167 sorted by updated_at descending

id: 392666250
html_url: https://github.com/pydata/xarray/issues/2190#issuecomment-392666250
issue_url: https://api.github.com/repos/pydata/xarray/issues/2190
node_id: MDEyOklzc3VlQ29tbWVudDM5MjY2NjI1MA==
user: Karel-van-de-Plassche (6404167)
created_at: 2018-05-29T06:27:52Z
updated_at: 2018-05-29T06:35:02Z
author_association: CONTRIBUTOR

@shoyer Thanks for your answer. Too bad. Maybe this could be documented in the 'dask' chapter? Or maybe even raise a warning when using open_dataset with lock=False on a netCDF4 file?

Unfortunately, there seems to be some conflicting information floating around, which is hard for a non-expert like me to sort out. It might of course just be that xarray doesn't support this (yet). I think MPI-style opening is a whole different beast, right? For example:

  • python-netcdf4 supports parallel reads in threads: https://github.com/Unidata/netcdf4-python/issues/536
  • python-netcdf4 MPI parallel write/read: https://github.com/Unidata/netcdf4-python/blob/master/examples/mpi_example.py and http://unidata.github.io/netcdf4-python/#section13
  • Using h5py directly (not supported by xarray, I think): http://docs.h5py.org/en/latest/mpi.html
  • This dask issue seems to suggest multiple reads are fine: https://github.com/dask/dask/issues/3074#issuecomment-359030028
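As a stdlib-only illustration of the distinction those links draw (a sketch of the general pattern with a made-up file layout, not xarray's or netCDF4's actual code): a single shared file handle needs a lock to serialize seeks and reads, while per-reader handles can read in parallel with no lock at all.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical file of four 1 KiB chunks, standing in for a netCDF variable.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
chunks = [bytes([i]) * 1024 for i in range(4)]
with open(path, "wb") as f:
    f.write(b"".join(chunks))

# One shared handle: concurrent seek+read would interleave, so a lock must
# serialize access -- the cost that the default lock=True pays for safety.
shared = open(path, "rb")
lock = threading.Lock()

def read_locked(i):
    with lock:
        shared.seek(i * 1024)
        return shared.read(1024)

# Per-thread handles: each reader keeps its own file position, so no lock
# is needed (analogous to every worker opening the file for itself).
def read_own_handle(i):
    with open(path, "rb") as f:
        f.seek(i * 1024)
        return f.read(1024)

with ThreadPoolExecutor(max_workers=4) as pool:
    assert list(pool.map(read_locked, range(4))) == chunks
    assert list(pool.map(read_own_handle, range(4))) == chunks
shared.close()
```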

You might have better luck using dask.distributed with multiple processes, but then you'll run into other bottlenecks from data transfer between workers.
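To make that data-transfer caveat concrete (a stdlib sketch with a made-up chunk size, not dask's actual machinery): with process-based workers, every chunk a worker reads must be serialized and copied back to the client process, whereas threads would just share a reference in memory.

```python
import pickle

# Hypothetical 4 MiB chunk that a worker process would read from disk.
chunk = bytes(4 * 2**20)

# Crossing a process boundary means serializing the result and copying it
# to the parent; the wire payload carries at least the full data volume again.
payload = pickle.dumps(chunk, protocol=pickle.HIGHEST_PROTOCOL)
print(len(payload) >= len(chunk))
```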

I'll do some more experiments, thanks for this suggestion. I am not bound to netCDF4 (although I need the compression, so no netCDF3, unfortunately), so would moving to Zarr help improve IO performance? I'd really like to keep using xarray, thanks for this awesome library! Even with the disk IO performance hit, it's still more than worth it.
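On the Zarr point, the relevant difference is the on-disk layout (a minimal stdlib sketch of the idea, not the zarr library's API): each chunk is compressed into its own object, so readers can fetch and decompress chunks independently and in parallel, instead of contending for one monolithic compressed file.

```python
import os
import tempfile
import zlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dataset split into four compressed chunks, one file each --
# the Zarr-like layout that makes parallel compressed reads straightforward.
store = tempfile.mkdtemp()
chunks = [bytes([i]) * 1024 for i in range(4)]
for i, c in enumerate(chunks):
    with open(os.path.join(store, f"chunk.{i}"), "wb") as f:
        f.write(zlib.compress(c))

def read_chunk(i):
    # Each chunk decompresses independently: no shared handle, no lock.
    with open(os.path.join(store, f"chunk.{i}"), "rb") as f:
        return zlib.decompress(f.read())

with ThreadPoolExecutor(max_workers=4) as pool:
    assert list(pool.map(read_chunk, range(4))) == chunks
```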

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Parallel non-locked read using dask.Client crashes (327064908)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 15.7ms · About: xarray-datasette