github: issue_comments: 6 rows where author_association = "NONE" and issue = 344621749 sorted by updated

6 rows where author_association = "NONE" and issue = 344621749 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at	author_association	body	reactions	issue
1085125053	https://github.com/pydata/xarray/issues/2314#issuecomment-1085125053	https://api.github.com/repos/pydata/xarray/issues/2314	IC_kwDOAMm_X85ArbG9	gjoseph92 3309802	2022-03-31T21:15:59Z	2022-03-31T21:15:59Z	NONE	Just noticed this issue; people needing to do this sort of thing might want to look at stackstac (especially playing with the `chunks=` parameter) or odc-stac for loading the data. The graph will be cleaner than what you'd get from `xr.concat([xr.open_rasterio(...) for ...])`. Regarding the earlier remark that it 'still appears to "over-eagerly" load more than just what is being worked on': FYI, this is basically expected behavior for distributed, see: * https://github.com/dask/distributed/issues/5223 * https://github.com/dask/distributed/issues/5555	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Chunked processing across multiple raster (geoTIF) files 344621749
666422864	https://github.com/pydata/xarray/issues/2314#issuecomment-666422864	https://api.github.com/repos/pydata/xarray/issues/2314	MDEyOklzc3VlQ29tbWVudDY2NjQyMjg2NA==	darothen 4992424	2020-07-30T14:52:50Z	2020-07-30T14:52:50Z	NONE	Hi @shaprann, I haven't re-visited this exact workflow recently, but one really good option (if you can manage the intermediate storage cost) would be to try to use new tools like http://github.com/pangeo-data/rechunker to pre-process and prepare your data archive prior to analysis.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Chunked processing across multiple raster (geoTIF) files 344621749
665976915	https://github.com/pydata/xarray/issues/2314#issuecomment-665976915	https://api.github.com/repos/pydata/xarray/issues/2314	MDEyOklzc3VlQ29tbWVudDY2NTk3NjkxNQ==	shaprann 43274047	2020-07-29T23:12:37Z	2020-07-29T23:12:37Z	NONE	This particular use case is extremely common when working with spatio-temporal data. Can anyone suggest a good workaround for this?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Chunked processing across multiple raster (geoTIF) files 344621749
589284788	https://github.com/pydata/xarray/issues/2314#issuecomment-589284788	https://api.github.com/repos/pydata/xarray/issues/2314	MDEyOklzc3VlQ29tbWVudDU4OTI4NDc4OA==	pblankenau2 13680523	2020-02-20T20:09:31Z	2020-02-20T20:09:31Z	NONE	Has there been any progress on this issue? I am bumping into the same problem.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Chunked processing across multiple raster (geoTIF) files 344621749
417419321	https://github.com/pydata/xarray/issues/2314#issuecomment-417419321	https://api.github.com/repos/pydata/xarray/issues/2314	MDEyOklzc3VlQ29tbWVudDQxNzQxOTMyMQ==	lmadaus 2489879	2018-08-30T18:22:31Z	2018-08-30T18:22:31Z	NONE	Thanks for all the suggestions! An update from when I originally posted this. Aligning with @shoyer, @darothen and @scottyhq 's comments, we've tested the code using cloud-optimized geoTIF files and regular geoTIFs, and it does perform better with the cloud-optimized form, though still appears to "over-eagerly" load more than just what is being worked on. With the cloud-optimized form, performance is much better when we specify the chunking strategy on the initial open_rasterio and it aligns with the chunk sizes. e.g. `rasterlist = [xr.open_rasterio(x, chunks={'x': 256, 'y': 256}) for x in filelist]` vs. `rasterlist = [xr.open_rasterio(x, chunks={'x': None, 'y': None}) for x in filelist]` The result is a larger task graph (and much more time spent developing the task graph) but more cases where we don't run into memory problems. There still appears to be a lot more memory used than I expect, but am actively working on exploring options. We've also noticed better performance using a k8s Dask cluster distributed across multiple "independent" workers as opposed to using a LocalCluster on a single large machine. As in, with the distributed cluster the "myfunction" (fit) operation starts happening on chunks well before the entire dataset is loaded, whereas in the LocalCluster it still tends not to begin until all chunks have been loaded in. Not exactly sure what would cause that... I'm intrigued by @shoyer 's last suggestion of an "intermediate" chunking step. Will test that and potentially the manual iteration over the tiles. Thanks for all the suggestions and thoughts!	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Chunked processing across multiple raster (geoTIF) files 344621749
417175383	https://github.com/pydata/xarray/issues/2314#issuecomment-417175383	https://api.github.com/repos/pydata/xarray/issues/2314	MDEyOklzc3VlQ29tbWVudDQxNzE3NTM4Mw==	darothen 4992424	2018-08-30T03:09:41Z	2018-08-30T03:09:41Z	NONE	Can you provide a `gdalinfo` of one of the GeoTiffs? I'm still working on some documentation for use-cases with cloud-optimized GeoTiffs to supplement @scottyhq's fantastic example notebook. One of the wrinkles I'm tracking down and trying to document is when exactly the GDAL->rasterio->dask->xarray pipeline eagerly loads the entire file versus when it defers reading or reads subsets of files. So far, it seems that if the GeoTiff is appropriately chunked ahead of time (when it's written to disk), things basically work "automagically."	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Chunked processing across multiple raster (geoTIF) files 344621749
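
Several comments above report that performance improves when the dask chunks requested at open time line up with how the GeoTIFF is tiled on disk. A rough way to see why is to count how many internal tiles each chunk has to touch. The sketch below is pure Python and calls no xarray or rasterio API. The 1024 x 1024 raster, the 256 x 256 internal tile size (chosen to mirror the `chunks={'x': 256, 'y': 256}` example), and the premise that every overlapped tile is read in full are all assumptions made for illustration; the real tile size of a file would have to come from its `gdalinfo` output.

```python
def tiles_touched(start, size, tile):
    """Count the internal tiles along one axis that a chunk [start, start + size) overlaps."""
    first = start // tile
    last = (start + size - 1) // tile
    return last - first + 1


def tile_reads_per_axis(image, chunk, tile):
    """Total tile reads along one axis; a tile shared by two chunks counts twice."""
    reads = 0
    for start in range(0, image, chunk):
        size = min(chunk, image - start)  # the last chunk may be shorter
        reads += tiles_touched(start, size, tile)
    return reads


# Hypothetical raster and tile sizes (assumptions, not taken from the issue).
IMAGE, TILE = 1024, 256

# With the same chunking on both axes, the 2-D read count is the
# per-axis count squared.
aligned = tile_reads_per_axis(IMAGE, 256, TILE) ** 2     # chunks match the tiles
misaligned = tile_reads_per_axis(IMAGE, 384, TILE) ** 2  # chunks straddle tile edges

print(aligned, misaligned)
```

Under these assumptions the aligned chunking reads each tile once, while the misaligned chunking re-reads tiles that straddle chunk boundaries. This only models read amplification; it says nothing about the scheduler behavior discussed in the linked dask/distributed issues.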

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
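
The filter described in the page title can be reproduced against this schema with Python's standard `sqlite3` module. The following is a minimal sketch, not the query the site itself runs: it uses an in-memory database, a trimmed copy of the table with only the columns needed, two rows copied from the listing above, and one clearly hypothetical row that exists only to show the `author_association` filter excluding it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed version of the schema above: only the columns the query needs.
conn.execute(
    """
    CREATE TABLE issue_comments (
        id INTEGER PRIMARY KEY,
        user INTEGER,
        updated_at TEXT,
        author_association TEXT,
        issue INTEGER
    )
    """
)
conn.execute("CREATE INDEX idx_issue_comments_issue ON issue_comments (issue)")

rows = [
    (1085125053, 3309802, "2022-03-31T21:15:59Z", "NONE", 344621749),
    (417175383, 4992424, "2018-08-30T03:09:41Z", "NONE", 344621749),
    # Hypothetical row, not from the listing: present only so the filter has
    # something to exclude.
    (1, 0, "2019-01-01T00:00:00Z", "MEMBER", 344621749),
]
conn.executemany("INSERT INTO issue_comments VALUES (?, ?, ?, ?, ?)", rows)

# ISO 8601 timestamps stored as TEXT sort chronologically as plain strings.
matching = conn.execute(
    """
    SELECT id, user, updated_at
    FROM issue_comments
    WHERE author_association = ? AND issue = ?
    ORDER BY updated_at DESC
    """,
    ("NONE", 344621749),
).fetchall()

for row in matching:
    print(row)
```

Passing the filter values as parameters rather than formatting them into the SQL string keeps the query safe to reuse with other issue ids.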