issue_comments

3 rows where issue = 214088387 and user = 7300413 sorted by updated_at descending

id: 286779750
html_url: https://github.com/pydata/xarray/issues/1308#issuecomment-286779750
issue_url: https://api.github.com/repos/pydata/xarray/issues/1308
node_id: MDEyOklzc3VlQ29tbWVudDI4Njc3OTc1MA==
user: JoyMonteiro (7300413)
created_at: 2017-03-15T15:32:33Z
updated_at: 2017-03-15T15:32:33Z
author_association: NONE
body:

Not sure if this helps, but I did a %%timeit on both versions. For the daily climatology, the numbers are: CPU times: user 1h 21min 8s, sys: 6h 17min 39s, total: 7h 38min 47s; Wall time: 20min 34s.

For the 6-hourly climatology: CPU times: user 5h 5min 6s, sys: 1d 2h 19min 45s, total: 1d 7h 24min 51s; Wall time: 1h 31min 40s.

It takes around 4x more time, which makes sense because there are 4x more groups. The ratio of user to system time is more or less constant, so nothing untoward seems to be happening between the two runs.

I think it is just good to remember that the time taken scales linearly with the number of groups. I guess this is what @shoyer was talking about when he mentioned that, since the grouping is done within xarray, the dask graph grows, making things slower.
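
As a rough illustration of that scaling, here is a minimal sketch using a small synthetic dataset (the variable names, sizes, and the single data variable are placeholders rather than the actual data discussed here); it only counts the groups produced by the two climatologies:

import numpy as np
import pandas as pd
import xarray as xr

# synthetic stand-in: one year of 6-hourly timestamps (the real record is several years)
times = pd.date_range("2000-01-01", periods=4 * 365, freq="6H")
ds = xr.Dataset({"t": ("time", np.random.rand(times.size))},
                coords={"time": times}).chunk({"time": 100})

# daily climatology: one group per day of year
daily = ds.groupby("time.dayofyear")

# 6-hourly climatology: a custom key combining day of year and hour of day
key = (ds["time.dayofyear"] * 100 + ds["time.hour"]).rename("doy_hour")
sixhourly = ds.groupby(key)

# 365 vs 1460 groups, i.e. roughly the same factor of 4 seen in the wall times above
print(len(daily.groups), len(sixhourly.groups))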

Thanks again!

reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: Using groupby with custom index (214088387)
id: 286509639
html_url: https://github.com/pydata/xarray/issues/1308#issuecomment-286509639
issue_url: https://api.github.com/repos/pydata/xarray/issues/1308
node_id: MDEyOklzc3VlQ29tbWVudDI4NjUwOTYzOQ==
user: JoyMonteiro (7300413)
created_at: 2017-03-14T18:05:54Z
updated_at: 2017-03-14T18:05:54Z
author_association: NONE
body:

@shoyer If I increase the size of the longitude chunk any more, it will be almost like using no chunking at all. I guess this dataset is a corner case. I will try doubling that value and see what happens. I hadn't realised that doing a groupby would also reduce the effective chunk size; thanks for pointing that out.

I'm using dask without distributed as of now; is there still some way to run the benchmark? I would be more than happy to run it.
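
For reference, dask's built-in diagnostics can give task-level numbers without distributed; the following is only a sketch, with a synthetic computation standing in for the actual groupby (and it assumes psutil is installed, which the resource profiler needs):

from dask.diagnostics import Profiler, ProgressBar, ResourceProfiler
import dask.array as da

# stand-in lazy computation; in practice this would be the climatology groupby
work = da.random.random((20000, 2000), chunks=(1000, 2000)).mean(axis=0)

with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof, ProgressBar():
    result = work.compute()

print(len(prof.results), "tasks profiled")  # per-task start/end times
print(rprof.results[:3])                    # sampled (time, memory, cpu) readings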

@rabernat I would definitely favour a cloud based sandbox to try these things out. What would be the stumbling block towards actually setting it up? I have had some recent experience setting up jupyterhub, I can help set that up so that notebooks can be used easily in such an environment.

reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: Using groupby with custom index (214088387)
id: 286497255
html_url: https://github.com/pydata/xarray/issues/1308#issuecomment-286497255
issue_url: https://api.github.com/repos/pydata/xarray/issues/1308
node_id: MDEyOklzc3VlQ29tbWVudDI4NjQ5NzI1NQ==
user: JoyMonteiro (7300413)
created_at: 2017-03-14T17:27:06Z
updated_at: 2017-03-14T17:31:32Z
author_association: NONE
body:

Hello Stephan,

The shape of the full data, if I read it from within xarray, is (time, level, lat, lon), with level=60, lat=41, lon=480. time is 4*365*7 ~ 10000.

I am chunking only along longitude, using lon=100. I previously chunked along time, but that used too much memory (~45 GB out of 128 GB), since the data is split into one file per month and reading a year of data requires reading many files into memory.

Superficially, I would think that both of the above would take similar amounts of time. In fact, calculating a daily climatology also requires grouping the four 6-hourly data points into a single day, which seems more complicated. However, it seems to run faster!
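
For concreteness, here is a sketch of the chunking scheme described above, using a lazily generated stand-in array instead of the real monthly files (the variable name and random values are placeholders; only the shapes and chunk sizes come from the description above):

import dask.array as da
import pandas as pd
import xarray as xr

# stand-in with the shape quoted above: (time, level, lat, lon)
time = pd.date_range("2010-01-01", periods=10000, freq="6H")
data = da.random.random((time.size, 60, 41, 480),
                        chunks=(time.size, 60, 41, 100))  # chunk only along lon

ds = xr.Dataset({"t": (("time", "level", "lat", "lon"), data)},
                coords={"time": time})

# daily climatology: each day-of-year group gathers the four 6-hourly samples
daily_clim = ds.groupby("time.dayofyear").mean("time")
# chunking along time instead (e.g. chunks=(100, 60, 41, 480)) is the
# alternative that used much more memory, as described above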

Thanks, Joy

reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: Using groupby with custom index (214088387)

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
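
The filtered view above (issue = 214088387, user = 7300413, sorted by updated_at descending) can be reproduced against the underlying SQLite database from plain Python; the database filename below is an assumption:

import sqlite3

# "github.db" is an assumed filename for the SQLite file behind this Datasette instance
conn = sqlite3.connect("github.db")
rows = conn.execute(
    "SELECT id, created_at, updated_at, author_association, body "
    "FROM issue_comments "
    "WHERE issue = ? AND [user] = ? "
    "ORDER BY updated_at DESC",
    (214088387, 7300413),
).fetchall()

for comment_id, created_at, updated_at, author_association, body in rows:
    print(comment_id, updated_at, author_association)

conn.close()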