html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4112#issuecomment-643513541,https://api.github.com/repos/pydata/xarray/issues/4112,643513541,MDEyOklzc3VlQ29tbWVudDY0MzUxMzU0MQ==,2448579,2020-06-12T22:55:12Z,2020-06-12T22:55:12Z,MEMBER,"> One option might be to rewrite Dask's indexing functionality to ""split"" chunks that are much larger than their inputs into smaller pieces, even if they all come from the same input chunk?
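A pure-Python sketch of that splitting idea (hypothetical helper, not dask's actual implementation):

```python
def split_large_chunks(chunks, limit):
    # Hypothetical sketch: break any chunk larger than `limit` into
    # pieces of at most `limit`, even though they all come from the
    # same input chunk.
    out = []
    for size in chunks:
        while size > limit:
            out.append(limit)
            size -= limit
        if size:
            out.append(size)
    return tuple(out)

# the pathological ((1, 1, 1, 112),) case from this issue becomes
# bounded output chunks: (1, 1, 1) followed by 28 chunks of size 4
print(split_large_chunks((1, 1, 1, 112), limit=4))
```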
This is Tom's proposed solution in https://github.com/dask/dask/issues/6270","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,627600168
https://github.com/pydata/xarray/issues/4112#issuecomment-643346497,https://api.github.com/repos/pydata/xarray/issues/4112,643346497,MDEyOklzc3VlQ29tbWVudDY0MzM0NjQ5Nw==,2448579,2020-06-12T15:51:31Z,2020-06-12T15:52:58Z,MEMBER,"Thanks @TomAugspurger
I think an upstream dask solution would be useful.
xarray automatically aligns objects everywhere, and this alignment is what is blowing things up. For this reason I think xarray should explicitly chunk the indexer when aligning. We could use a reasonable chunk size, such as the median chunk size of the DataArray along that axis; this would respect the user's chunk-size choices.
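A minimal sketch of that heuristic (hypothetical helper, not xarray code):

```python
import statistics

def indexer_chunks(indexer_len, axis_chunks):
    # Hypothetical sketch: chunk the indexer using the median chunk
    # size of the array along the aligned axis, so reindexing never
    # produces a chunk much larger than what the user chose.
    target = max(1, int(statistics.median(axis_chunks)))
    full, rem = divmod(indexer_len, target)
    return (target,) * full + ((rem,) if rem else ())

# array chunked as ((1, 1, 1, 1),), indexer of length 4 + 111 = 115
print(indexer_chunks(115, (1, 1, 1, 1)))  # (1,) * 115, no oversized chunk
```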
@shoyer What do you think?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,627600168
https://github.com/pydata/xarray/issues/4112#issuecomment-636334010,https://api.github.com/repos/pydata/xarray/issues/4112,636334010,MDEyOklzc3VlQ29tbWVudDYzNjMzNDAxMA==,2448579,2020-05-30T13:52:33Z,2020-05-30T13:53:31Z,MEMBER,"Great diagnosis, @jbusecke.
Ultimately this comes down to dask indexing:
``` python
import dask.array
arr = dask.array.from_array([0, 1, 2, 3], chunks=(1,))
print(arr.chunks) # ((1, 1, 1, 1),)
# align calls reindex, which indexes with something like this
indexer = [0, 1, 2, 3] + [-1] * 111
print(arr[indexer].chunks) # ((1, 1, 1, 112),)
# maybe something like this is a solution
lazy_indexer = dask.array.from_array(indexer, chunks=arr.chunks[0][0], name=""idx"")
print(arr[lazy_indexer].chunks)  # ((1, 1, 1, ..., 1),): 115 chunks of size 1
```
cc @TomAugspurger; the issue here is that the big size-`112` chunk takes down the cluster in https://github.com/NCAR/intake-esm/issues/225
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,627600168