issue_comments
3 rows where issue = 197939448 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
269573421 | https://github.com/pydata/xarray/issues/1189#issuecomment-269573421 | https://api.github.com/repos/pydata/xarray/issues/1189 | MDEyOklzc3VlQ29tbWVudDI2OTU3MzQyMQ== | mrocklin 306380 | 2016-12-29T02:36:08Z | 2016-12-29T02:36:08Z | MEMBER | Dask.distributed now creates a forkserver at startup. This seems to be working well so far. It nicely balances having a well-defined environment and fast startup time. How much inter-worker data transfer would you expect? It might be worth running through a few classic algorithms with it instead of the threaded scheduler and looking at performance changes. The diagnostic pages would be a nice bonus here and might help to highlight some performance issues. If anyone is interested in this, the thing to do is
And then operate as normal. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Document using a spawning multiprocessing pool for multiprocessing with dask 197939448 | |
269573022 | https://github.com/pydata/xarray/issues/1189#issuecomment-269573022 | https://api.github.com/repos/pydata/xarray/issues/1189 | MDEyOklzc3VlQ29tbWVudDI2OTU3MzAyMg== | shoyer 1217238 | 2016-12-29T02:30:16Z | 2016-12-29T02:30:16Z | MEMBER | Actually, I just tested it and it appears that forking also works, as long as you create the pool before opening any files. Otherwise, the netCDF library crashes (https://github.com/pydata/xarray/pull/1128#issuecomment-261841025). A local "distributed" scheduler might indeed also work, but at least when operating on a single machine it makes sense to bring all data into a single process once it's been loaded for multi-threaded data analysis. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Document using a spawning multiprocessing pool for multiprocessing with dask 197939448 | |
269572088 | https://github.com/pydata/xarray/issues/1189#issuecomment-269572088 | https://api.github.com/repos/pydata/xarray/issues/1189 | MDEyOklzc3VlQ29tbWVudDI2OTU3MjA4OA== | mrocklin 306380 | 2016-12-29T02:17:40Z | 2016-12-29T02:17:40Z | MEMBER | Can you remind me of the motivation to use a spawning multiprocessing pool instead of a fork or forkserver solution? For mixed multi-threading/multi-processing, would a local "distributed" scheduler suffice? This would be several single-threaded processes on a single machine. The scheduler would be aware of data locality and avoid inter-node communication when possible. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Document using a spawning multiprocessing pool for multiprocessing with dask 197939448 |
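
The comments above contrast the spawn, fork, and forkserver start methods and note that the pool should exist before any netCDF file is opened. A minimal sketch of that ordering, using only the Python standard library, might look like the following. The helper names `pool_context` and `make_pool` are hypothetical, `math.sqrt` stands in for real per-chunk work, and handing the pool to dask is deliberately left out because the thread does not show how that was done.

```python
import math
import multiprocessing


def pool_context(method="spawn"):
    """Return a multiprocessing context for the given start method.

    "spawn" is available on every platform; "fork" and "forkserver"
    are only available on Unix-like systems.
    """
    return multiprocessing.get_context(method)


def make_pool(method="spawn", processes=2):
    """Create a worker pool whose processes use the chosen start method."""
    return pool_context(method).Pool(processes=processes)


def main():
    # Create the pool first, before this process opens any data files.
    with make_pool("spawn") as pool:
        # Assumption: data files would be opened here, after the pool exists.
        # math.sqrt is a placeholder for the real work done on each chunk.
        results = pool.map(math.sqrt, [1.0, 4.0, 9.0])
    print(results)


if __name__ == "__main__":
    # The guard is required with "spawn" and "forkserver", because worker
    # processes re-import the main module when they start.
    main()
```

Passing `"forkserver"` or `"fork"` to `make_pool` selects the other start methods discussed in the thread, on platforms where they exist.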
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id]) ); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);