issue_comments

2 rows where author_association = "MEMBER", issue = 344621749, and user = 1217238, sorted by updated_at descending


id: 417413527
html_url: https://github.com/pydata/xarray/issues/2314#issuecomment-417413527
issue_url: https://api.github.com/repos/pydata/xarray/issues/2314
node_id: MDEyOklzc3VlQ29tbWVudDQxNzQxMzUyNw==
user: shoyer (1217238)
created_at: 2018-08-30T18:04:29Z
updated_at: 2018-08-30T18:04:29Z
author_association: MEMBER
body:

I see now that you are using dask-distributed, but I guess there are still too many intermediate outputs here to do a single rechunk operation.

The crude but effective way to solve this problem would be to loop over spatial tiles: use an indexing operation to pull out only a limited extent, run the calculation on each tile, and then reassemble the tiles at the end. To see if this will work, you might try computing a single time series on your merged dataset before calling .chunk(), e.g., merged.isel(x=0, y=0).compute().
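
A minimal sketch of that tile loop (none of this is from the original thread: merged, per_pixel_calc, and the tile size below are illustrative placeholders for the concatenated (time, y, x) stack and the per-pixel calculation):

import numpy as np
import xarray as xr

# Stand-in for the concatenated raster stack; the shape and values are made up.
merged = xr.DataArray(np.random.rand(4, 400, 400), dims=('time', 'y', 'x'))

def per_pixel_calc(tile):
    # placeholder for the real per-time-series calculation
    return tile.mean('time')

TILE = 200  # tile edge length in pixels; pick something that comfortably fits in memory

rows = []
for y0 in range(0, merged.sizes['y'], TILE):
    row = []
    for x0 in range(0, merged.sizes['x'], TILE):
        # indexing operation that pulls out only a limited spatial extent
        tile = merged.isel(y=slice(y0, y0 + TILE), x=slice(x0, x0 + TILE))
        # with a dask-backed array, call .compute() here so only one tile
        # is ever held in memory at a time
        row.append(per_pixel_calc(tile))
    rows.append(row)

# reassemble the processed tiles into a single array at the end
result = xr.concat([xr.concat(r, dim='x') for r in rows], dim='y')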

In theory, I think using chunks in open_rasterio should achieve exactly what you want here (assuming the geotiffs are tiled), but as you note it makes for a giant task graph. To balance this tradeoff, I might try picking a very large initial chunksize, e.g., xr.open_rasterio(x, chunks={'x': 3500, 'y': 3500}). This would effectively split the "rechunk" operation into 9 entirely independent parts.
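
As a concrete illustration of that suggestion (the file names and the final reduction are invented; in current xarray releases this function lives in rioxarray as rioxarray.open_rasterio):

import xarray as xr

# hypothetical yearly GeoTIFFs; one large dask chunk per 3500 x 3500 spatial block
paths = ['scene_2000.tif', 'scene_2001.tif', 'scene_2002.tif']
arrays = [xr.open_rasterio(p, chunks={'x': 3500, 'y': 3500}) for p in paths]

# stack along a new time dimension; the chunk boundaries line up across files,
# so each spatial block can be processed independently of the others
merged = xr.concat(arrays, dim='time')
result = merged.mean('time').compute()  # placeholder computation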

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Chunked processing across multiple raster (geoTIF) files (344621749)
id: 417404832
html_url: https://github.com/pydata/xarray/issues/2314#issuecomment-417404832
issue_url: https://api.github.com/repos/pydata/xarray/issues/2314
node_id: MDEyOklzc3VlQ29tbWVudDQxNzQwNDgzMg==
user: shoyer (1217238)
created_at: 2018-08-30T17:38:40Z
updated_at: 2018-08-30T17:42:00Z
author_association: MEMBER
body:

I think the explicit chunk() call is the source of your woes here. It creates a bunch of tasks to reshard your data, which requires loading the entire array into memory. If you're using dask-distributed, I think the large intermediate outputs would get cached to disk, but this fails if you're using the simpler multithreaded scheduler.
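
For context, a minimal reconstruction of the pattern being described (file names and chunk sizes are invented); the point is that rechunking after concatenation crosses every existing chunk boundary:

import xarray as xr

# invented inputs: lazily opened GeoTIFFs concatenated along a new time dimension
paths = ['scene_2000.tif', 'scene_2001.tif', 'scene_2002.tif']
merged = xr.concat(
    [xr.open_rasterio(p, chunks={'x': 512, 'y': 512}) for p in paths],
    dim='time',
)

# Resharding into per-pixel time series (all 3 time steps in one chunk): each new
# chunk needs a piece of every file's chunks at that location, so dask has to
# build large intermediates; this is the step that exhausts memory on the
# multithreaded scheduler.
rechunked = merged.chunk({'time': 3, 'x': 100, 'y': 100})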

~~If you drop the line that calls .chunk() and manually index your array to pull out a single time-series before calling map_blocks, does that work properly? e.g., something like merged.isel(x=0, y=0).data.map_blocks(myfunction)~~ (nevermind, this is probably not a great idea)

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Chunked processing across multiple raster (geoTIF) files (344621749)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
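
The schema above is enough to reproduce this page's query directly against the underlying SQLite database. A sketch, assuming the database file is named github.db:

import sqlite3

# "github.db" is an assumed file name for the SQLite database behind this table
conn = sqlite3.connect('github.db')

# same filter and ordering as the page above; the WHERE clause on issue can use
# the idx_issue_comments_issue index
rows = conn.execute(
    """
    SELECT id, user, created_at, updated_at, body
    FROM issue_comments
    WHERE issue = ? AND user = ? AND author_association = 'MEMBER'
    ORDER BY updated_at DESC
    """,
    (344621749, 1217238),
).fetchall()

for comment_id, user_id, created_at, updated_at, body in rows:
    print(comment_id, user_id, updated_at, body[:60])

conn.close()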