github: issue_comments: 5 rows where author_association = "MEMBER" and issue = 344621749 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at ▲	author_association	body	reactions	issue
1488891109	https://github.com/pydata/xarray/issues/2314#issuecomment-1488891109	https://api.github.com/repos/pydata/xarray/issues/2314	IC_kwDOAMm_X85Yvqzl	dcherian 2448579	2023-03-29T16:01:05Z	2023-03-29T16:01:05Z	MEMBER	We've deleted the internal `rasterio` backend in favor of rioxarray. If this issue is still relevant, please migrate the discussion to the rioxarray repo	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Chunked processing across multiple raster (geoTIF) files 344621749
417413527	https://github.com/pydata/xarray/issues/2314#issuecomment-417413527	https://api.github.com/repos/pydata/xarray/issues/2314	MDEyOklzc3VlQ29tbWVudDQxNzQxMzUyNw==	shoyer 1217238	2018-08-30T18:04:29Z	2018-08-30T18:04:29Z	MEMBER	I see now that you are using dask-distributed, but I guess there are still too many intermediate outputs here to do a single rechunk operation. The crude but effective way to solve this problem would be to loop over spatial tiles using an indexing operation to pull out only a limited extent, compute the calculation on each tile and then reassemble the tiles at the end. To see if this will work, you might try computing a single time-series on your merged dataset before calling `.chunk()`, e.g., `merged.isel(x=0, y=0).compute()`. In theory, I think using `chunks` in `open_rasterio` should achieve exactly what you want here (assuming the geotiffs are tiled), but as you note it makes for a giant task graph. To balance this tradeoff, I might try picking a very large initial chunksize, e.g., `xr.open_rasterio(x, chunks={'x': 3500, 'y': 3500})`. This would effectively split the "rechunk" operation into 9 entirely independent parts.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Chunked processing across multiple raster (geoTIF) files 344621749
417412405	https://github.com/pydata/xarray/issues/2314#issuecomment-417412405	https://api.github.com/repos/pydata/xarray/issues/2314	MDEyOklzc3VlQ29tbWVudDQxNzQxMjQwNQ==	scottyhq 3924836	2018-08-30T18:01:02Z	2018-08-30T18:01:02Z	MEMBER	As @darothen mentioned, the first thing is to check that the geotiffs themselves are tiled (otherwise I'm guessing that `open_rasterio()` will open the entire thing). You can do this with rasterio: `import rasterio`, then `with rasterio.open('image_001.tif') as src: print(src.profile)`. Here is the mentioned example notebook, which works for tiled geotiffs stored on google cloud: https://github.com/scottyhq/pangeo-example-notebooks/tree/binderfy You can use the 'launch binder' button to run it with a pangeo dask-kubernetes cluster, or just read through the landsat8-cog-ndvi.ipynb notebook.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Chunked processing across multiple raster (geoTIF) files 344621749
417404832	https://github.com/pydata/xarray/issues/2314#issuecomment-417404832	https://api.github.com/repos/pydata/xarray/issues/2314	MDEyOklzc3VlQ29tbWVudDQxNzQwNDgzMg==	shoyer 1217238	2018-08-30T17:38:40Z	2018-08-30T17:42:00Z	MEMBER	I think the explicit `chunk()` call is the source of your woes here. That creates a bunch of tasks to reshard your data that require loading the entire array into memory. If you're using dask-distributed, I think the large intermediate outputs would get cached to disk but this fails if you're using the simpler multithreaded scheduler. ~~If you drop the line that calls `.chunk()` and manually index your array to pull out a single time-series before calling `map_blocks`, does that work properly? e.g., something like `merged.isel(x=0, y=0).data.map_blocks(myfunction)`~~ (nevermind, this is probably not a great idea)	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Chunked processing across multiple raster (geoTIF) files 344621749
417135276	https://github.com/pydata/xarray/issues/2314#issuecomment-417135276	https://api.github.com/repos/pydata/xarray/issues/2314	MDEyOklzc3VlQ29tbWVudDQxNzEzNTI3Ng==	jhamman 2443309	2018-08-29T23:04:10Z	2018-08-29T23:04:10Z	MEMBER	pinging @scottyhq and @darothen who have both been exploring similar use cases here. I think you all met at the recent pangeo meeting.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Chunked processing across multiple raster (geoTIF) files 344621749
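
The "loop over spatial tiles" approach suggested in the comments above could be sketched roughly as follows. This is only an illustration: the helper `iter_tiles`, the 10000 x 10000 raster size, and the placeholder `my_function` are assumptions and do not come from the issue, which never states the real image dimensions. With a 3500-pixel tile, that assumed size yields the 9 independent parts mentioned in the comment.

```python
def iter_tiles(ny, nx, tile):
    """Yield (y_slice, x_slice) pairs that together cover an ny-by-nx grid."""
    for y0 in range(0, ny, tile):
        for x0 in range(0, nx, tile):
            # Clip the last tile on each axis to the grid edge.
            yield slice(y0, min(y0 + tile, ny)), slice(x0, min(x0 + tile, nx))


# Hypothetical raster size (assumption; not given in the issue).
ny, nx, tile = 10000, 10000, 3500
tiles = list(iter_tiles(ny, nx, tile))
print(len(tiles))  # 3 tiles per axis, 9 in total

# With an xarray object named `merged`, as in the comments, each tile could
# then be handled on its own and the results reassembled afterwards, e.g.:
#     for ys, xs in tiles:
#         partial = my_function(merged.isel(y=ys, x=xs)).compute()
# `my_function` is a placeholder for the per-pixel time-series calculation.
```

Because each tile is indexed out before any computation, only a limited spatial extent needs to be in memory at a time, which is the point of the suggestion.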

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
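
As a rough sketch of how the filter in the page title maps onto this schema, the snippet below builds the table in an in-memory SQLite database and runs the equivalent query. The ids and timestamps of the three MEMBER rows are taken from the table above; the fourth row (id 1, association `NONE`) is made up purely to show the filter excluding something. Foreign key enforcement is switched off here because the referenced `users` and `issues` tables are not created in this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The referenced [users] and [issues] tables are not created in this sketch,
# so keep foreign key enforcement off (this is also SQLite's usual default).
conn.execute("PRAGMA foreign_keys = OFF")
conn.executescript("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
""")

sample = [
    (417404832, "2018-08-30T17:38:40Z", "2018-08-30T17:42:00Z", "MEMBER", 344621749),
    (1488891109, "2023-03-29T16:01:05Z", "2023-03-29T16:01:05Z", "MEMBER", 344621749),
    (417413527, "2018-08-30T18:04:29Z", "2018-08-30T18:04:29Z", "MEMBER", 344621749),
    # Made-up row, included only so the filter has something to exclude.
    (1, "2018-08-01T00:00:00Z", "2018-08-01T00:00:00Z", "NONE", 344621749),
]
conn.executemany(
    "INSERT INTO issue_comments (id, created_at, updated_at, author_association, issue) "
    "VALUES (?, ?, ?, ?, ?)",
    sample,
)

rows = conn.execute(
    "SELECT id, updated_at FROM issue_comments "
    "WHERE author_association = ? AND issue = ? "
    "ORDER BY updated_at DESC",
    ("MEMBER", 344621749),
).fetchall()
for row in rows:
    print(row)
```

The ISO 8601 timestamps are stored as TEXT, so sorting them as strings gives chronological order.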