issue_comments


5 rows where author_association = "MEMBER" and issue = 214088387 sorted by updated_at descending

id, html_url, issue_url, node_id, user, created_at, updated_at ▲, author_association, body, reactions, performed_via_github_app, issue
286516988 https://github.com/pydata/xarray/issues/1308#issuecomment-286516988 https://api.github.com/repos/pydata/xarray/issues/1308 MDEyOklzc3VlQ29tbWVudDI4NjUxNjk4OA== shoyer 1217238 2017-03-14T18:29:55Z 2017-03-14T18:29:55Z MEMBER

> I wonder if the fact that the data is highly compressed (short types converted to float64 with the scale and offset attributes) can have an influence on dask performance and memory consumption? (especially the latter)

Memory consumption, yes; performance, not so much. Scale/offset (de)compression can be applied super fast, unlike zlib compression, which can be 10x slower than reading from disk.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Using groupby with custom index 214088387
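
To make shoyer's point above concrete, here is a minimal sketch of CF-style scale/offset decoding with made-up values (the actual ERA-Interim scale_factor and add_offset are not given in this issue). Unpacking is a single elementwise multiply-add, so it is cheap compared to zlib decompression, but it turns packed int16 data into float64, roughly four times the memory.

import numpy as np

packed = np.array([120, 340, -560], dtype=np.int16)   # values as stored on disk
scale_factor = 0.01                                    # hypothetical CF attributes
add_offset = 273.15

decoded = packed * scale_factor + add_offset           # promotes int16 to float64
print(decoded.dtype, decoded)                          # float64 [274.35 276.55 267.55]
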
286511848 https://github.com/pydata/xarray/issues/1308#issuecomment-286511848 https://api.github.com/repos/pydata/xarray/issues/1308 MDEyOklzc3VlQ29tbWVudDI4NjUxMTg0OA== fmaussion 10050469 2017-03-14T18:13:18Z 2017-03-14T18:13:18Z MEMBER

I've had some trouble with 6-hourly ERA-Interim data myself recently.

I wonder if the fact that the data is highly compressed (short types converted to float64 with the scale and offset attributes) can have an influence on dask performance and memory consumption? (especially the latter)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Using groupby with custom index 214088387
286502400 https://github.com/pydata/xarray/issues/1308#issuecomment-286502400 https://api.github.com/repos/pydata/xarray/issues/1308 MDEyOklzc3VlQ29tbWVudDI4NjUwMjQwMA== shoyer 1217238 2017-03-14T17:43:13Z 2017-03-14T17:43:13Z MEMBER

We currently do all the groupby handling ourselves, which means that when you group over smaller units the dask graph gets bigger and each of the tasks gets smaller. Given that each chunk in the grouped data is only ~250,000 elements, it's not surprising that things get a bit slower -- that's near the point where Python overhead starts to become significant.

It would be useful to benchmark graph creation and execution separately (especially using dask-distributed's profiling tools) to understand where the slow-down is.

One thing that might help quite a bit in cases like this where the individual groups are small is to rewrite xarray's groupby to do some groupby operations inside dask, rather than in a loop outside of dask. That would allow executing tasks on bigger chunks of arrays at once, which could significantly reduce scheduler overhead.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Using groupby with custom index 214088387
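
The trade-off shoyer describes can be sketched with a synthetic array (not the reporter's dataset): xarray's current groupby dispatches one small task per group, while a whole-chunk reduction, possible here only because the groups are contiguous and equal-sized, keeps dask working on full chunks.

import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2000-01-01", periods=24 * 100, freq="h")   # 100 days, hourly
da = xr.DataArray(
    np.random.rand(time.size, 50, 50),
    dims=("time", "lat", "lon"),
    coords={"time": time},
).chunk({"time": 24 * 10})                    # 10-day dask chunks

# One small task (or a few) per group: 100 separate means stitched back together.
daily_via_groupby = da.groupby("time.dayofyear").mean("time")

# Whole-chunk alternative for this special case: reduce 24 consecutive steps at a
# time inside each chunk, so each task still touches a large block of the array.
daily_via_coarsen = da.coarsen(time=24).mean()
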
286499366 https://github.com/pydata/xarray/issues/1308#issuecomment-286499366 https://api.github.com/repos/pydata/xarray/issues/1308 MDEyOklzc3VlQ29tbWVudDI4NjQ5OTM2Ng== rabernat 1197350 2017-03-14T17:33:36Z 2017-03-14T17:33:36Z MEMBER

Slightly OT observation: Performance issues are increasingly being raised here (see also #1301). Wouldn't it be great if we had a shared space somewhere in the cloud to host these big-ish datasets and run performance benchmarks in a controlled environment?

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Using groupby with custom index 214088387
286482853 https://github.com/pydata/xarray/issues/1308#issuecomment-286482853 https://api.github.com/repos/pydata/xarray/issues/1308 MDEyOklzc3VlQ29tbWVudDI4NjQ4Mjg1Mw== shoyer 1217238 2017-03-14T16:43:27Z 2017-03-14T16:43:27Z MEMBER

Can you share the shape and dask chunking for your data, and also describe how the data is stored? That can make a big difference for performance.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Using groupby with custom index 214088387
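
A hedged sketch of how one might collect the information shoyer asks for; the filename "era_interim.nc", the variable name "t2m", and the chunk size are placeholders, not taken from the issue.

import xarray as xr

ds = xr.open_dataset("era_interim.nc", chunks={"time": 1000})   # hypothetical chunking

print(ds["t2m"].shape)      # array shape
print(ds["t2m"].chunks)     # dask chunk layout (None if the variable is not chunked)
print(ds["t2m"].dtype)      # in-memory dtype after decoding
print(ds["t2m"].encoding)   # on-disk storage: dtype, scale_factor, add_offset, zlib, ...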

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
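
For reference, a small sketch of the query this page renders, run with Python's sqlite3 against the schema above; the database filename "github.db" is an assumption.

import sqlite3

conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, [user], created_at, updated_at, author_association, body
    FROM issue_comments
    WHERE author_association = 'MEMBER' AND issue = 214088387
    ORDER BY updated_at DESC
    """
).fetchall()
print(len(rows))   # 5 rows for this issue, per the description at the top of the page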