html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2314#issuecomment-417413527,https://api.github.com/repos/pydata/xarray/issues/2314,417413527,MDEyOklzc3VlQ29tbWVudDQxNzQxMzUyNw==,1217238,2018-08-30T18:04:29Z,2018-08-30T18:04:29Z,MEMBER,"I see now that you *are* using dask-distributed, but I guess there are still too many intermediate outputs here to do a single rechunk operation.
The crude but effective way to solve this problem would be to loop over spatial tiles, using an indexing operation to pull out only a limited extent, run the calculation on each tile, and then reassemble the tiles at the end. To see if this will work, you might first try computing a single time series on your merged dataset before calling `.chunk()`, e.g., `merged.isel(x=0, y=0).compute()`.
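A minimal sketch of that tile loop, assuming `merged` and `myfunction` are the objects from your snippet, that `myfunction` accepts an xarray object (or that you adapt the call), and that it returns a result with the `x` and `y` dims preserved; the tile size is a hypothetical value:

```python
import xarray as xr

tile = 3500  # hypothetical tile edge; pick whatever fits comfortably in memory
row_results = []
for y0 in range(0, merged.sizes['y'], tile):
    row = []
    for x0 in range(0, merged.sizes['x'], tile):
        # index out one spatial tile, load it, and run the calculation on it
        sub = merged.isel(y=slice(y0, y0 + tile), x=slice(x0, x0 + tile))
        row.append(myfunction(sub.compute()))
    row_results.append(xr.concat(row, dim='x'))
# reassemble the tiles at the end
result = xr.concat(row_results, dim='y')
```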
In theory, I think using `chunks` in `open_rasterio` should achieve exactly what you want here (assuming the GeoTIFFs are tiled), but as you note, it makes for a giant task graph. To balance this tradeoff, I might try picking a very large initial chunk size, e.g., `xr.open_rasterio(x, chunks={'x': 3500, 'y': 3500})`, as sketched below. This would effectively split the ""rechunk"" operation into 9 entirely independent parts.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,344621749
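A minimal sketch of the large-initial-chunk approach from the comment above, assuming one GeoTIFF per time step; `tif_paths` is a hypothetical stand-in for the files in the original snippet:

```python
import xarray as xr

# hypothetical list of the GeoTIFF paths being merged
tif_paths = ['scene_2018_01.tif', 'scene_2018_02.tif']

# one ~3500x3500 chunk per spatial block keeps the task graph small
arrays = [xr.open_rasterio(p, chunks={'x': 3500, 'y': 3500}) for p in tif_paths]
merged = xr.concat(arrays, dim='time')  # no further .chunk() call needed
```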
https://github.com/pydata/xarray/issues/2314#issuecomment-417404832,https://api.github.com/repos/pydata/xarray/issues/2314,417404832,MDEyOklzc3VlQ29tbWVudDQxNzQwNDgzMg==,1217238,2018-08-30T17:38:40Z,2018-08-30T17:42:00Z,MEMBER,"I think the explicit `chunk()` call is the source of your woes here. That creates a bunch of tasks to rechunk your data, which effectively requires loading the entire array into memory. If you're using dask-distributed, I think the large intermediate outputs would get spilled to disk, but this fails if you're using the simpler multithreaded scheduler.
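If you do stay with dask-distributed, here is a minimal sketch of starting a local cluster whose workers can spill to disk; the worker count and memory limit are hypothetical values you would tune to your machine:

```python
from dask.distributed import Client

# workers under the distributed scheduler spill large intermediates to disk;
# the default multithreaded scheduler has no such safety valve
client = Client(n_workers=4, memory_limit='4GB')
```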
~~If you drop the line that calls `.chunk()` and manually index your array to pull out a single time series before calling `map_blocks`, does that work properly? e.g., something like `merged.isel(x=0, y=0).data.map_blocks(myfunction)`~~ (never mind, this is probably not a great idea)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,344621749