issue_comments
11 rows where issue = 344621749 sorted by updated_at descending
All 11 comments are on issue 344621749, "Chunked processing across multiple raster (geoTIF) files" (issue_url: https://api.github.com/repos/pydata/xarray/issues/2314). Every comment has zero reactions in every category ("total_count", "+1", "-1", "laugh", "hooray", "confused", "heart", "rocket", "eyes"), and none was performed via a GitHub app.
Each record below gives: id (node_id) | user | author_association | created_at | updated_at | html_url, followed by the comment body.
1488891109 (IC_kwDOAMm_X85Yvqzl) | dcherian 2448579 | MEMBER | created 2023-03-29T16:01:05Z | updated 2023-03-29T16:01:05Z | https://github.com/pydata/xarray/issues/2314#issuecomment-1488891109
We've deleted the internal …
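The body above is truncated in the export; given the thread topic, it most likely refers to xarray's built-in rasterio backend, which was removed in favor of the standalone rioxarray package. A minimal sketch of the replacement call (the file name and chunk sizes are illustrative assumptions, not taken from the comment):

```python
import rioxarray

# rioxarray.open_rasterio supersedes the removed xarray.open_rasterio;
# passing chunks=... returns a dask-backed DataArray instead of loading eagerly.
da = rioxarray.open_rasterio("scene.tif", chunks={"x": 256, "y": 256})
```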
1085125053 (IC_kwDOAMm_X85ArbG9) | gjoseph92 3309802 | NONE | created 2022-03-31T21:15:59Z | updated 2022-03-31T21:15:59Z | https://github.com/pydata/xarray/issues/2314#issuecomment-1085125053
Just noticed this issue; people needing to do this sort of thing might want to look at stackstac (especially playing with the …). FYI, this is basically expected behavior for distributed, see:
- https://github.com/dask/distributed/issues/5223
- https://github.com/dask/distributed/issues/5555
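The truncated parenthetical above presumably names a stackstac tuning parameter. A hypothetical sketch of the stackstac approach (the `items` input and the `chunksize` value are assumptions, not taken from the comment):

```python
import stackstac

# `items` is assumed to be a sequence of STAC items pointing at
# cloud-optimized GeoTIFFs; stackstac lazily assembles them into a single
# dask-backed (time, band, y, x) DataArray without reading any pixel data.
stack = stackstac.stack(items, chunksize=2048)

# Reductions then stream tile-by-tile rather than loading everything.
timeseries_mean = stack.mean(dim="time").compute()
```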
666422864 (MDEyOklzc3VlQ29tbWVudDY2NjQyMjg2NA==) | darothen 4992424 | NONE | created 2020-07-30T14:52:50Z | updated 2020-07-30T14:52:50Z | https://github.com/pydata/xarray/issues/2314#issuecomment-666422864
Hi @shaprann, I haven't re-visited this exact workflow recently, but one really good option (if you can manage the intermediate storage cost) would be to try to use new tools like http://github.com/pangeo-data/rechunker to pre-process and prepare your data archive prior to analysis.
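A minimal sketch of the rechunker workflow being suggested (the Zarr paths, chunk shape, and memory budget are illustrative assumptions):

```python
import zarr
from rechunker import rechunk

# Rewrite a time-chunked Zarr archive into analysis-friendly chunks up front,
# paying the intermediate storage cost once instead of at analysis time.
source = zarr.open("source.zarr")
plan = rechunk(
    source,
    target_chunks=(1000, 256, 256),    # (time, y, x) chunks suited to the analysis
    max_mem="1GB",                     # per-worker memory budget for the rewrite
    target_store="rechunked.zarr",
    temp_store="rechunker-temp.zarr",  # this is the intermediate storage cost
)
plan.execute()
```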
665976915 (MDEyOklzc3VlQ29tbWVudDY2NTk3NjkxNQ==) | shaprann 43274047 | NONE | created 2020-07-29T23:12:37Z | updated 2020-07-29T23:12:37Z | https://github.com/pydata/xarray/issues/2314#issuecomment-665976915
This particular use case is extremely common when working with spatio-temporal data. Can anyone suggest a good workaround for this?
589284788 (MDEyOklzc3VlQ29tbWVudDU4OTI4NDc4OA==) | pblankenau2 13680523 | NONE | created 2020-02-20T20:09:31Z | updated 2020-02-20T20:09:31Z | https://github.com/pydata/xarray/issues/2314#issuecomment-589284788
Has there been any progress on this issue? I am bumping into the same problem.
417419321 (MDEyOklzc3VlQ29tbWVudDQxNzQxOTMyMQ==) | lmadaus 2489879 | NONE | created 2018-08-30T18:22:31Z | updated 2018-08-30T18:22:31Z | https://github.com/pydata/xarray/issues/2314#issuecomment-417419321
Thanks for all the suggestions! An update from when I originally posted this. Aligning with @shoyer, @darothen and @scottyhq's comments, we've tested the code using cloud-optimized geoTIF files and regular geoTIFs, and it does perform better with the cloud-optimized form, though it still appears to "over-eagerly" load more than just what is being worked on. With the cloud-optimized form, performance is much better when we specify the chunking strategy on the initial open_rasterio so that it aligns with the files' internal chunk sizes (e.g. the sketch after this comment).
The result is a larger task graph (and much more time spent building the task graph) but fewer cases where we run into memory problems. There still appears to be a lot more memory used than I expect, but I am actively working on exploring options. We've also noticed better performance using a k8s Dask cluster distributed across multiple "independent" workers as opposed to using a LocalCluster on a single large machine. As in, with the distributed cluster the "myfunction" (fit) operation starts happening on chunks well before the entire dataset is loaded, whereas in the LocalCluster it tends not to begin until all chunks have been loaded in. Not exactly sure what would cause that... I'm intrigued by @shoyer's last suggestion of an "intermediate" chunking step. Will test that and potentially the manual iteration over the tiles. Thanks for all the suggestions and thoughts!
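The original example snippet did not survive the export. A hedged reconstruction of the kind of call being described (the file name and the 256-pixel chunk sizes are assumptions, matched to a typical COG tile size; `xarray.open_rasterio` was the relevant API at the time, since superseded by rioxarray):

```python
import xarray as xr

# Ask dask for chunks that line up with the GeoTIFF's internal tiling, so
# each task reads whole tiles instead of slices that span many tiles.
da = xr.open_rasterio("scene.tif", chunks={"band": 1, "y": 256, "x": 256})
```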
417413527 (MDEyOklzc3VlQ29tbWVudDQxNzQxMzUyNw==) | shoyer 1217238 | MEMBER | created 2018-08-30T18:04:29Z | updated 2018-08-30T18:04:29Z | https://github.com/pydata/xarray/issues/2314#issuecomment-417413527
I see now that you are using dask-distributed, but I guess there are still too many intermediate outputs here to do a single rechunk operation. The crude but effective way to solve this problem would be to loop over spatial tiles, using an indexing operation to pull out only a limited extent, compute the calculation on each tile, and then reassemble the tiles at the end (a sketch follows this comment). To see if this will work, you might try computing a single time-series on your merged dataset before calling … In theory, I think using …
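A minimal sketch of the crude-but-effective tile loop described above, assuming a merged dataset `ds` with dims (time, y, x); the tile size, the per-tile computation, and the reassembly call are illustrative assumptions:

```python
import xarray as xr

def process_tile(tile):
    # Stand-in for the real per-tile computation (e.g. a fit along time).
    return tile.mean(dim="time")

TILE = 512  # illustrative tile edge length in pixels
results = []
for y0 in range(0, ds.sizes["y"], TILE):
    for x0 in range(0, ds.sizes["x"], TILE):
        # Pull out only a limited spatial extent...
        sub = ds.isel(y=slice(y0, y0 + TILE), x=slice(x0, x0 + TILE))
        # ...compute it eagerly so memory stays bounded per tile...
        results.append(process_tile(sub).compute())

# ...and reassemble the tiles at the end using their coordinates.
out = xr.combine_by_coords(results)
```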
417412405 (MDEyOklzc3VlQ29tbWVudDQxNzQxMjQwNQ==) | scottyhq 3924836 | MEMBER | created 2018-08-30T18:01:02Z | updated 2018-08-30T18:01:02Z | https://github.com/pydata/xarray/issues/2314#issuecomment-417412405
As @darothen mentioned, the first thing is to check that the geotiffs themselves are tiled (otherwise I'm guessing that open_rasterio() will open the entire thing). You can do this with a quick metadata check; see the sketch after this comment.
Here is the mentioned example notebook, which works for tiled geotiffs stored on google cloud: https://github.com/scottyhq/pangeo-example-notebooks/tree/binderfy You can use the 'launch binder' button to run it with a pangeo dask-kubernetes cluster, or just read through the landsat8-cog-ndvi.ipynb notebook.
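The original snippet was lost in the export; a hypothetical equivalent check using rasterio (the file name is an assumption):

```python
import rasterio

with rasterio.open("scene.tif") as src:
    # A tiled GeoTIFF reports square-ish block shapes such as (256, 256);
    # an untiled (striped) file reports full-width blocks like (1, width).
    print(src.profile.get("tiled"), src.block_shapes)
```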
417404832 (MDEyOklzc3VlQ29tbWVudDQxNzQwNDgzMg==) | shoyer 1217238 | MEMBER | created 2018-08-30T17:38:40Z | updated 2018-08-30T17:42:00Z | https://github.com/pydata/xarray/issues/2314#issuecomment-417404832
I think the explicit … ~~If you drop the line that calls …~~
417175383 (MDEyOklzc3VlQ29tbWVudDQxNzE3NTM4Mw==) | darothen 4992424 | NONE | created 2018-08-30T03:09:41Z | updated 2018-08-30T03:09:41Z | https://github.com/pydata/xarray/issues/2314#issuecomment-417175383
Can you provide a …
417135276 (MDEyOklzc3VlQ29tbWVudDQxNzEzNTI3Ng==) | jhamman 2443309 | MEMBER | created 2018-08-29T23:04:10Z | updated 2018-08-29T23:04:10Z | https://github.com/pydata/xarray/issues/2314#issuecomment-417135276
pinging @scottyhq and @darothen, who have both been exploring similar use cases here. I think you all met at the recent pangeo meeting.
CREATE TABLE [issue_comments] (
    [html_url] TEXT,
    [issue_url] TEXT,
    [id] INTEGER PRIMARY KEY,
    [node_id] TEXT,
    [user] INTEGER REFERENCES [users]([id]),
    [created_at] TEXT,
    [updated_at] TEXT,
    [author_association] TEXT,
    [body] TEXT,
    [reactions] TEXT,
    [performed_via_github_app] TEXT,
    [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
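Given this schema, the listing above corresponds to a simple filtered, sorted query; a sketch using Python's sqlite3 (the database filename is an assumption):

```python
import sqlite3

# Reproduce this page's listing: comments on issue 344621749, newest update first.
conn = sqlite3.connect("github.db")  # hypothetical export filename
rows = conn.execute(
    """
    SELECT id, user, created_at, updated_at, author_association, body
    FROM issue_comments
    WHERE issue = 344621749
    ORDER BY updated_at DESC
    """
).fetchall()
for row in rows:
    print(row)
```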