issue_comments
5 rows where issue = 327064908 sorted by updated_at descending
issue: Parallel non-locked read using dask.Client crashes (327064908)
Comments, newest first:

454162108 · max-sixty (5635139) · MEMBER · created 2019-01-14T21:09:03Z · updated 2019-01-14T21:09:03Z
https://github.com/pydata/xarray/issues/2190#issuecomment-454162108

In an effort to reduce the issue backlog, I'll close this, but please reopen if you disagree.
392672562 · shoyer (1217238) · MEMBER · created 2018-05-29T06:59:32Z · updated 2018-05-29T06:59:32Z
https://github.com/pydata/xarray/issues/2190#issuecomment-392672562

Indeed, HDF5 supports parallel IO, but only with MPI. Unfortunately that doesn't work with Dask, at least not yet.

Zarr is certainly worth a try for performance. The motivation for Zarr (rather than HDF5) was performance with distributed reads/writes, especially with cloud storage.

On Mon, May 28, 2018 at 11:27 PM Karel van de Plassche notifications@github.com wrote: […]
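To make the Zarr suggestion above concrete, here is a minimal sketch of converting a netCDF file to Zarr and reading it back through dask. The file names, variable name, and chunk size are hypothetical, not from the thread; the relevant property is that Zarr stores each chunk as a separate object, so chunked reads don't funnel through a single in-process HDF5 lock.

```python
import xarray as xr

# Hypothetical input: a netCDF4 file with a "time" dimension.
ds = xr.open_dataset("data.nc", chunks={"time": 100})

# One-time conversion; each dask chunk becomes a separate Zarr chunk on disk.
ds.to_zarr("data.zarr")

# open_zarr returns dask-backed variables; chunk reads can proceed in
# parallel without the global HDF5 lock.
ds_z = xr.open_zarr("data.zarr")
print(ds_z["temperature"].mean("time").compute())
```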
392666250 · Karel-van-de-Plassche (6404167) · CONTRIBUTOR · created 2018-05-29T06:27:52Z · updated 2018-05-29T06:35:02Z
https://github.com/pydata/xarray/issues/2190#issuecomment-392666250

@shoyer Thanks for your answer. Too bad. Maybe this could be documented in the 'dask' chapter? Or maybe even raise a warning when using open_dataset with […]

Unfortunately there seems to be some conflicting information floating around, which is hard to spot for a non-expert like me. It might of course just be that xarray doesn't support it (yet). I think MPI-style opening is a whole different beast, right? For example: […]

I'll do some more experiments, thanks for this suggestion. I am not bound to netCDF4 (although I need the compression, so no netCDF3 unfortunately), so would moving to Zarr help improve IO performance?

I'd really like to keep using xarray, thanks for this awesome library! Even with the disk IO performance hit, it's still more than worth using.
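Since compression is the stated blocker for leaving netCDF4: Zarr compresses chunks by default, and xarray's to_zarr accepts a per-variable compressor through encoding. A minimal sketch, assuming the zarr v2 API and hypothetical file names and codec choices:

```python
import xarray as xr
import zarr  # zarr v2, where codecs are re-exported from numcodecs

ds = xr.open_dataset("data.nc", chunks={"time": 100})

# Explicit per-variable compressor; zstd via Blosc is one reasonable choice.
compressor = zarr.Blosc(cname="zstd", clevel=3, shuffle=zarr.Blosc.SHUFFLE)
encoding = {name: {"compressor": compressor} for name in ds.data_vars}

ds.to_zarr("data_compressed.zarr", encoding=encoding)
```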
392649160 · shoyer (1217238) · MEMBER · created 2018-05-29T04:24:58Z · updated 2018-05-29T04:24:58Z
https://github.com/pydata/xarray/issues/2190#issuecomment-392649160

Maybe there's some place we could document this more clearly?
392647556 · shoyer (1217238) · MEMBER · created 2018-05-29T04:11:55Z · updated 2018-05-29T04:11:55Z
https://github.com/pydata/xarray/issues/2190#issuecomment-392647556

Unfortunately, HDF5 doesn't support reading or writing files (even different files) in parallel within the same process, which is why xarray by default adds a lock around all read/write operations on netCDF4/HDF5 files. So I'm afraid this is expected behavior. You might have better luck using dask-distributed with multiple processes, but then you'll encounter other bottlenecks with data transfer.
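For the dask-distributed route mentioned above, a minimal sketch follows. The worker counts, file name, variable, and chunking are assumptions for illustration; the point is that process-based workers each hold their own HDF5 library state, so reads stop contending for one in-process lock, at the price of serializing results between processes.

```python
import xarray as xr
from dask.distributed import Client

if __name__ == "__main__":
    # Process-based workers (not threads): each worker process has its own
    # HDF5 state. Data returned to the client is serialized across process
    # boundaries, which is the data-transfer bottleneck noted above.
    client = Client(n_workers=4, threads_per_worker=1, processes=True)

    # Once a Client exists, dask-backed xarray operations use it by default.
    ds = xr.open_dataset("data.nc", chunks={"time": 100})
    print(ds["temperature"].mean().compute())
```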
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);