issue_comments
5 rows where issue = 304201107, sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue
---|---|---|---|---|---|---|---|---|---|---|---
373806224 | https://github.com/pydata/xarray/issues/1981#issuecomment-373806224 | https://api.github.com/repos/pydata/xarray/issues/1981 | MDEyOklzc3VlQ29tbWVudDM3MzgwNjIyNA== | jmunroe 6181563 | 2018-03-16T18:34:19Z | 2018-03-16T18:34:19Z | CONTRIBUTOR | distributed | { "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | use dask to open datasets in parallel 304201107
373802503 | https://github.com/pydata/xarray/issues/1981#issuecomment-373802503 | https://api.github.com/repos/pydata/xarray/issues/1981 | MDEyOklzc3VlQ29tbWVudDM3MzgwMjUwMw== | jhamman 2443309 | 2018-03-16T18:21:20Z | 2018-03-16T18:21:20Z | MEMBER | @jmunroe - this is good to know. Have you been using the default scheduler (multiprocessing for dask.bag) or the distributed scheduler? | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | use dask to open datasets in parallel 304201107
373794415 | https://github.com/pydata/xarray/issues/1981#issuecomment-373794415 | https://api.github.com/repos/pydata/xarray/issues/1981 | MDEyOklzc3VlQ29tbWVudDM3Mzc5NDQxNQ== | jmunroe 6181563 | 2018-03-16T17:53:44Z | 2018-03-16T17:53:44Z | CONTRIBUTOR | For what it's worth, this is exactly the workflow I use (https://github.com/OceansAus/cosima-cookbook) when opening a large number of netCDF files: […] and then […] and it appears to work well. Code snippets from cosima-cookbook/cosima_cookbook/netcdf_index.py | { "total_count": 3, "+1": 2, "-1": 0, "laugh": 0, "hooray": 1, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | use dask to open datasets in parallel 304201107
372316094 | https://github.com/pydata/xarray/issues/1981#issuecomment-372316094 | https://api.github.com/repos/pydata/xarray/issues/1981 | MDEyOklzc3VlQ29tbWVudDM3MjMxNjA5NA== | jhamman 2443309 | 2018-03-12T13:51:07Z | 2018-03-12T13:51:07Z | MEMBER | @shoyer - we can sidestep the global HDF lock if we use multiprocessing (or the distributed scheduler as you mentioned) and the […] | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | use dask to open datasets in parallel 304201107
372195137 | https://github.com/pydata/xarray/issues/1981#issuecomment-372195137 | https://api.github.com/repos/pydata/xarray/issues/1981 | MDEyOklzc3VlQ29tbWVudDM3MjE5NTEzNw== | shoyer 1217238 | 2018-03-12T05:09:16Z | 2018-03-12T05:09:16Z | MEMBER | I think this is definitely worth exploring and could potentially be a large win. One potential challenge is global locking with HDF5. If opening many datasets is slow because much data needs to get read with HDF5, then multiple threads will not help -- you'll need to use multiple processes, e.g., with dask-distributed. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | use dask to open datasets in parallel 304201107
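The code snippets referenced in jmunroe's comment (373794415) from cosima-cookbook/cosima_cookbook/netcdf_index.py were not captured in this export. As a rough illustration only, here is a minimal sketch of the dask.bag pattern that comment describes, opening many netCDF files in parallel and then combining them; the `output_*.nc` glob and the `time` concat dimension are assumptions, not the cosima-cookbook code itself:

```python
import glob

import dask.bag as db
import xarray as xr

# Hypothetical list of many netCDF files to open.
paths = sorted(glob.glob("output_*.nc"))

# Open every file as an xarray.Dataset in parallel. dask.bag uses the
# multiprocessing scheduler by default, which is what jhamman's question
# about schedulers refers to.
datasets = db.from_sequence(paths).map(xr.open_dataset).compute()

# Combine the per-file datasets; a shared "time" record dimension is assumed.
combined = xr.concat(datasets, dim="time")
```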
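On the HDF5 locking point raised by shoyer and jhamman: the global HDF5 lock is held per process, so process-based schedulers avoid serializing reads the way threads do. A hedged sketch of that approach with a current xarray, where open_mfdataset gained a `parallel=` keyword in later releases; the worker sizing and the `output_*.nc` pattern are placeholders:

```python
import xarray as xr
from dask.distributed import Client

# Worker *processes* (not threads): each process has its own HDF5 library
# state, so the global HDF5 lock no longer serializes all file opens/reads.
client = Client(n_workers=4, threads_per_worker=1)  # hypothetical sizing

# parallel=True opens each file in a separate dask.delayed task on the
# cluster before combining; "output_*.nc" is a placeholder pattern.
ds = xr.open_mfdataset("output_*.nc", parallel=True, combine="by_coords")
print(ds)
```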
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);