
issue_comments: 417413527


html_url: https://github.com/pydata/xarray/issues/2314#issuecomment-417413527
issue_url: https://api.github.com/repos/pydata/xarray/issues/2314
id: 417413527
node_id: MDEyOklzc3VlQ29tbWVudDQxNzQxMzUyNw==
user: 1217238
created_at: 2018-08-30T18:04:29Z
updated_at: 2018-08-30T18:04:29Z
author_association: MEMBER

I see now that you are using dask-distributed, but I guess there are still too many intermediate outputs here to do a single rechunk operation.

The crude but effective way to solve this problem would be to loop over spatial tiles, using an indexing operation to pull out only a limited extent, compute the calculation on each tile, and then reassemble the tiles at the end. To see whether this will work, you might try computing a single time series from your merged dataset before calling `.chunk()`, e.g., `merged.isel(x=0, y=0).compute()`.
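A minimal sketch of that tile loop, using a small synthetic dataset in place of the real merged rasters (the variable name `band_data`, the array sizes, and the tile size are all illustrative assumptions):

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the merged dataset; the real `merged` would
# come from combining many open_rasterio outputs.
merged = xr.Dataset(
    {"band_data": (("y", "x"), np.random.rand(100, 100))}
).chunk({"y": 50, "x": 50})

# Sanity check suggested above: pull out a single point before rechunking.
point = merged.isel(x=0, y=0).compute()

# Crude but effective: index out each spatial tile, compute it eagerly,
# then reassemble the tiles with xr.concat at the end.
tile = 50
rows = []
for y0 in range(0, merged.sizes["y"], tile):
    row = []
    for x0 in range(0, merged.sizes["x"], tile):
        subset = merged.isel(y=slice(y0, y0 + tile), x=slice(x0, x0 + tile))
        row.append(subset.compute())  # the real per-tile calculation goes here
    rows.append(xr.concat(row, dim="x"))
out = xr.concat(rows, dim="y")
```

Because each tile is computed independently, the task graph per `compute()` call stays small at the cost of some duplicated I/O along tile edges.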

In theory, I think using `chunks` in `open_rasterio` should achieve exactly what you want here (assuming the geotiffs are tiled), but as you note it makes for a giant task graph. To balance this tradeoff, I might try picking a very large initial chunksize, e.g., `xr.open_rasterio(x, chunks={'x': 3500, 'y': 3500})`. This would effectively split the "rechunk" operation into 9 entirely independent parts.
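To see where the 9 independent parts come from, here is the chunk arithmetic on a dask array, assuming (hypothetically) rasters of roughly 10500 x 10500 pixels, which is what a 3 x 3 split with 3500-pixel chunks implies:

```python
import dask.array as da

# A lazy array standing in for one opened raster; 3500-pixel chunks on a
# ~10500 x ~10500 raster produce a 3 x 3 grid of independent blocks.
arr = da.zeros((10500, 10500), chunks=(3500, 3500))
print(arr.numblocks)  # grid of blocks along (y, x)
```

Fewer, larger chunks mean fewer tasks in the graph, at the cost of more memory per task.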
