issue_comments: 643512625

html_url: https://github.com/pydata/xarray/issues/4112#issuecomment-643512625
issue_url: https://api.github.com/repos/pydata/xarray/issues/4112
id: 643512625
node_id: MDEyOklzc3VlQ29tbWVudDY0MzUxMjYyNQ==
user: 1217238
created_at: 2020-06-12T22:50:57Z
updated_at: 2020-06-12T22:50:57Z
author_association: MEMBER
issue: 627600168

The problem with chunking the indexers is that dask then has no visibility into the indexing values, which means the graph grows like the square of the number of chunks along an axis instead of proportionally to the number of chunks.
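
For illustration, here is a rough sketch (sizes are made up, and exact graph sizes depend on the dask version) comparing task-graph size when a 1-D indexer is a plain NumPy array versus a chunked dask array:

```python
import numpy as np
import dask.array as da

x = da.ones(100_000, chunks=1_000)                    # 100 input chunks
idx = np.random.randint(0, x.shape[0], size=100_000)

# NumPy indexer: dask can inspect the index values up front and route
# each output chunk only to the input chunks it actually needs.
y = x[idx]

# Chunked dask indexer: the values are opaque until compute time, so a
# task is created for every (indexer chunk, input chunk) pair, and the
# graph grows roughly quadratically in the number of chunks on the axis.
z = x[da.from_array(idx, chunks=1_000)]

print(len(y.__dask_graph__()), len(z.__dask_graph__()))
```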

The real operation that xarray needs here is Variable._getitem_with_mask, i.e., indexing with -1 remapped to a fill value: https://github.com/pydata/xarray/blob/e8bd8665e8fd762031c2d9c87987d21e113e41cc/xarray/core/variable.py#L715

The padded portion of the array is used in indexing only so that the result is aligned for np.where to replace those positions with the fill value; we never actually look at the padded values themselves.
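
A minimal NumPy-only sketch of those semantics for the 1-D case (illustrative only, not xarray's actual implementation; the helper name is made up):

```python
import numpy as np

def getitem_with_mask_1d(data, indexer, fill_value=np.nan):
    # Desired semantics: indexer == -1 means "missing", and those
    # positions receive fill_value rather than selecting data[-1].
    data = np.asarray(data, dtype=float)
    indexer = np.asarray(indexer)
    # Pad with one trailing slot so that index -1 lands on the pad;
    # the gathered value there is never inspected, because np.where
    # overwrites it with fill_value.
    padded = np.concatenate([data, [fill_value]])
    return np.where(indexer == -1, fill_value, padded[indexer])

print(getitem_with_mask_1d([10.0, 20.0, 30.0], [0, -1, 2]))  # [10. nan 30.]
```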

I don't know the best way to handle this. One option might be to rewrite Dask's indexing functionality to "split" output chunks that are much larger than their input chunks into smaller pieces, even if they all come from the same input chunk?
