github: issue_comments: 3 rows where issue = 197939448 sorted by updated

3 rows where issue = 197939448 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at	author_association	body	reactions	issue
269573421	https://github.com/pydata/xarray/issues/1189#issuecomment-269573421	https://api.github.com/repos/pydata/xarray/issues/1189	MDEyOklzc3VlQ29tbWVudDI2OTU3MzQyMQ==	mrocklin 306380	2016-12-29T02:36:08Z	2016-12-29T02:36:08Z	MEMBER	Dask.distributed now creates a forkserver at startup. This seems to be working well so far. It nicely balances having a well defined environment and fast startup time. How much inter-worker data transfer would you expect? It might be worth running through a few classic algorithms with it instead of the threaded scheduler and looking at performance changes. The diagnostic pages would be a nice bonus here and might help to highlight some performance issues. If anyone is interested in this the thing to do is `$ conda install -c conda-forge dask distributed >>> from dask.distributed import Client >>> c = Client() # sets global scheduler by default` And then operate as normal.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Document using a spawning multiprocessing pool for multiprocessing with dask 197939448
269573022	https://github.com/pydata/xarray/issues/1189#issuecomment-269573022	https://api.github.com/repos/pydata/xarray/issues/1189	MDEyOklzc3VlQ29tbWVudDI2OTU3MzAyMg==	shoyer 1217238	2016-12-29T02:30:16Z	2016-12-29T02:30:16Z	MEMBER	Actually, I just tested it and it appears that forking also works, as long as you create the pool before opening any files. Otherwise, the netCDF library crashes (https://github.com/pydata/xarray/pull/1128#issuecomment-261841025). A local "distributed" scheduler might indeed also work, but at least when operating on a single machine it makes sense to bring all data into a single process once it's been loaded for multi-threaded data analysis.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Document using a spawning multiprocessing pool for multiprocessing with dask 197939448
269572088	https://github.com/pydata/xarray/issues/1189#issuecomment-269572088	https://api.github.com/repos/pydata/xarray/issues/1189	MDEyOklzc3VlQ29tbWVudDI2OTU3MjA4OA==	mrocklin 306380	2016-12-29T02:17:40Z	2016-12-29T02:17:40Z	MEMBER	Can you remind me the motivation to use a spawning multiprocessing pool instead of a fork or forkserver solution? For mixed multi-threading/multi-processing would a local "distributed" scheduler suffice? This would be several single-threaded processes on a single machine. The scheduler would be aware of data locality and avoid inter-node communication when possible.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Document using a spawning multiprocessing pool for multiprocessing with dask 197939448
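
The comments above contrast spawning, forking, and a forkserver, and note that forking only works if the pool is created before any files are opened. A minimal sketch of that ordering with a spawning pool, using only the Python standard library, might look like the following. The helper name `make_spawn_pool` is hypothetical, and handing the pool to dask is left out because the thread does not show that call.

```python
import math
import multiprocessing


def make_spawn_pool(processes=2):
    """Return a process pool whose workers are started with 'spawn'.

    Spawned workers start from a fresh interpreter, so they do not
    inherit open file handles or library state from the parent.
    (Hypothetical helper, not part of xarray or dask.)
    """
    ctx = multiprocessing.get_context("spawn")
    return ctx.Pool(processes=processes)


if __name__ == "__main__":
    # Create the pool first, before opening any data files. With a
    # forking pool this ordering is what the thread reports as required.
    pool = make_spawn_pool()
    try:
        # Files would be opened here, after the pool exists.
        print(pool.map(math.sqrt, [1.0, 4.0, 9.0]))
    finally:
        pool.close()
        pool.join()
```

The `if __name__ == "__main__":` guard matters with the spawn start method, because each worker re-imports the main module on startup.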

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);