html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2314#issuecomment-1488891109,https://api.github.com/repos/pydata/xarray/issues/2314,1488891109,IC_kwDOAMm_X85Yvqzl,2448579,2023-03-29T16:01:05Z,2023-03-29T16:01:05Z,MEMBER,"We've deleted the internal `rasterio` backend in favor of [rioxarray](https://corteva.github.io/rioxarray/stable/getting_started/getting_started.html#rioxarray). If this issue is still relevant, please migrate the discussion to the [rioxarray repo](https://github.com/corteva/rioxarray/)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,344621749
https://github.com/pydata/xarray/issues/2314#issuecomment-417413527,https://api.github.com/repos/pydata/xarray/issues/2314,417413527,MDEyOklzc3VlQ29tbWVudDQxNzQxMzUyNw==,1217238,2018-08-30T18:04:29Z,2018-08-30T18:04:29Z,MEMBER,"I see now that you *are* using dask-distributed, but I guess there are still too many intermediate outputs here to do a single rechunk operation.
The crude but effective way to solve this problem would be to loop over spatial tiles using an indexing operation to pull out only a limited extent, compute the calculation on each tile and then reassemble the tiles at the end. To see if this will work, you might try computing a single time-series on your merged dataset before calling `.chunk()`, e.g., `merged.isel(x=0, y=0).compute()`.
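
A minimal sketch of that tile loop, written against a plain NumPy array so it runs on its own. The helper name `process_in_tiles`, the `(time, y, x)` layout, and the stand-in data are assumptions for illustration, not anything from your code; with xarray the analogous indexing would be `merged.isel(x=slice(...), y=slice(...))`.

```python
import numpy as np

def process_in_tiles(stack, func, tile_size):
    # stack: array with dimensions (time, y, x) -- an assumed layout.
    # func: reduces the time axis of one tile, (time, ty, tx) -> (ty, tx).
    nt, ny, nx = stack.shape
    out = np.empty((ny, nx), dtype=float)
    for y0 in range(0, ny, tile_size):
        for x0 in range(0, nx, tile_size):
            # Pull out only a limited spatial extent, keeping all time steps.
            tile = stack[:, y0:y0 + tile_size, x0:x0 + tile_size]
            # Compute on the tile and write it back into its place.
            out[y0:y0 + tile_size, x0:x0 + tile_size] = func(tile)
    return out

# Hypothetical stand-in data: 5 time steps on a 7 x 9 grid.
stack = np.arange(5 * 7 * 9, dtype=float).reshape(5, 7, 9)
result = process_in_tiles(stack, lambda t: t.mean(axis=0), tile_size=4)
```

Only one tile of the full time stack is held in memory at a time, which is the point of the approach; the tile size is a knob to trade memory for the number of iterations.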
In theory, I think using `chunks` in `open_rasterio` should achieve exactly what you want here (assuming the geotiffs are tiled), but as you note it makes for a giant task graph. To balance this tradeoff, I might try picking a very large initial chunksize, e.g., `xr.open_rasterio(x, chunks={'x': 3500, 'y': 3500})`. This would effectively split the ""rechunk"" operation into 9 entirely independent parts.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,344621749
https://github.com/pydata/xarray/issues/2314#issuecomment-417412405,https://api.github.com/repos/pydata/xarray/issues/2314,417412405,MDEyOklzc3VlQ29tbWVudDQxNzQxMjQwNQ==,3924836,2018-08-30T18:01:02Z,2018-08-30T18:01:02Z,MEMBER,"As @darothen mentioned, the first thing is to check that the geotiffs themselves are tiled (otherwise I'm guessing that `open_rasterio()` will open the entire thing). You can do this with:
```python
import rasterio
with rasterio.open('image_001.tif') as src:
    # The profile includes the tiling flag and block sizes, among other metadata.
    print(src.profile)
```
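
When reading that output, the entries I would look at are `tiled`, `blockxsize` and `blockysize`. Here is a rough sketch of pulling them out of a profile-like dict; `describe_tiling` is a hypothetical helper and the example values are made up for illustration, not taken from your files.

```python
def describe_tiling(profile):
    # profile: a plain dict shaped like the output of src.profile.
    # Missing keys are treated as untiled / unknown block size.
    tiled = bool(profile.get('tiled', False))
    block = (profile.get('blockysize'), profile.get('blockxsize'))
    return tiled, block

# Hypothetical profile values, for illustration only.
example = {'driver': 'GTiff', 'width': 10980, 'height': 10980,
           'tiled': True, 'blockxsize': 512, 'blockysize': 512}
tiled, block = describe_tiling(example)
print(tiled, block)
```

If `tiled` comes back false, the file is most likely stored in strips, and reading a small spatial window will probably touch far more data than the window itself.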
Here is the example notebook I mentioned, which works for tiled geotiffs stored on Google Cloud:
https://github.com/scottyhq/pangeo-example-notebooks/tree/binderfy
You can use the 'launch binder' button to run it with a pangeo dask-kubernetes cluster, or just read through the landsat8-cog-ndvi.ipynb notebook.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,344621749
https://github.com/pydata/xarray/issues/2314#issuecomment-417404832,https://api.github.com/repos/pydata/xarray/issues/2314,417404832,MDEyOklzc3VlQ29tbWVudDQxNzQwNDgzMg==,1217238,2018-08-30T17:38:40Z,2018-08-30T17:42:00Z,MEMBER,"I think the explicit `chunk()` call is the source of your woes here. That creates a bunch of tasks to reshard your data that require loading the entire array into memory. If you're using dask-distributed, I think the large intermediate outputs would get cached to disk but this fails if you're using the simpler multithreaded scheduler.
~~If you drop the line that calls `.chunk()` and manually index your array to pull out a single time-series before calling `map_blocks`, does that work properly? e.g., something like `merged.isel(x=0, y=0).data.map_blocks(myfunction)`~~ (nevermind, this is probably not a great idea)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,344621749
https://github.com/pydata/xarray/issues/2314#issuecomment-417135276,https://api.github.com/repos/pydata/xarray/issues/2314,417135276,MDEyOklzc3VlQ29tbWVudDQxNzEzNTI3Ng==,2443309,2018-08-29T23:04:10Z,2018-08-29T23:04:10Z,MEMBER,pinging @scottyhq and @darothen who have both been exploring similar use cases here. I think you all met at the recent pangeo meeting. ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,344621749