html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4112#issuecomment-704530619,https://api.github.com/repos/pydata/xarray/issues/4112,704530619,MDEyOklzc3VlQ29tbWVudDcwNDUzMDYxOQ==,14314623,2020-10-06T20:20:34Z,2020-10-06T20:20:34Z,CONTRIBUTOR,"Just tried this with the newest dask version and can confirm that I do not get huge chunks anymore *IF* I specify `dask.config.set({""array.slicing.split_large_chunks"": True})`. I also needed to modify the example to exceed the internal chunk size limitation:
```python
import numpy as np
import xarray as xr
import dask
dask.config.set({""array.slicing.split_large_chunks"": True})
short_time = xr.cftime_range('2000', periods=12)
long_time = xr.cftime_range('2000', periods=120)
data_short = np.random.rand(len(short_time))
data_long = np.random.rand(len(long_time))
n=1000
a = xr.DataArray(data_short, dims=['time'], coords={'time':short_time}).expand_dims(a=n, b=n).chunk({'time':3})
b = xr.DataArray(data_long, dims=['time'], coords={'time':long_time}).expand_dims(a=n, b=n).chunk({'time':3})
a,b = xr.align(a,b, join = 'outer')
```
With the option turned on, I get this for `a`:

with the defaults, I still get one giant chunk.

I'll try this soon in the real-world scenario described above. Just wanted to report back here.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,627600168
https://github.com/pydata/xarray/issues/4112#issuecomment-643513541,https://api.github.com/repos/pydata/xarray/issues/4112,643513541,MDEyOklzc3VlQ29tbWVudDY0MzUxMzU0MQ==,2448579,2020-06-12T22:55:12Z,2020-06-12T22:55:12Z,MEMBER,"> One option might be to rewrite Dask's indexing functionality to ""split"" chunks that are much larger than their inputs into smaller pieces, even if they all come from the same input chunk?
This is Tom's proposed solution in https://github.com/dask/dask/issues/6270","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,627600168
https://github.com/pydata/xarray/issues/4112#issuecomment-643512625,https://api.github.com/repos/pydata/xarray/issues/4112,643512625,MDEyOklzc3VlQ29tbWVudDY0MzUxMjYyNQ==,1217238,2020-06-12T22:50:57Z,2020-06-12T22:50:57Z,MEMBER,"The problem with chunking indexers is that dask then has no visibility into the indexing values, which means the graph grows like the square of the number of chunks along an axis instead of proportionally to the number of chunks.
The real operation that xarray needs here is `Variable._getitem_with_mask`, i.e., indexing with `-1` remapped to a fill value:
https://github.com/pydata/xarray/blob/e8bd8665e8fd762031c2d9c87987d21e113e41cc/xarray/core/variable.py#L715
The padded portion of the array is used in indexing, but only so the result is aligned for `np.where` to replace with the fill value. We actually don't look at those values at all.
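A minimal NumPy sketch of what that masked indexing effectively does (the helper name is illustrative, this is not the actual `_getitem_with_mask` code):
```python
import numpy as np

def getitem_with_mask_sketch(data, indexer, fill_value=np.nan):
    # Sketch only: remap the -1 entries to some valid index so the gather
    # succeeds, then overwrite those positions with the fill value.
    indexer = np.asarray(indexer)
    mask = indexer == -1
    result = data[np.where(mask, 0, indexer)]  # gathered values at masked spots are never used
    return np.where(mask, fill_value, result)

print(getitem_with_mask_sketch(np.array([10.0, 20.0, 30.0]), [0, 2, -1, -1]))
# [10. 30. nan nan]
```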
I don't know the best way to handle this. One option might be to rewrite Dask's indexing functionality to ""split"" chunks that are much larger than their inputs into smaller pieces, even if they all come from the same input chunk?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,627600168
https://github.com/pydata/xarray/issues/4112#issuecomment-643346497,https://api.github.com/repos/pydata/xarray/issues/4112,643346497,MDEyOklzc3VlQ29tbWVudDY0MzM0NjQ5Nw==,2448579,2020-06-12T15:51:31Z,2020-06-12T15:52:58Z,MEMBER,"Thanks @TomAugspurger
I think an upstream dask solution would be useful.
xarray automatically aligns objects everywhere, and this alignment is what is blowing things up. For this reason I think xarray should explicitly chunk the indexer when aligning. We could use a reasonable chunk size, like the median chunk size of the DataArray along that axis, which would respect the user's chunk size choices.
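Roughly something like this (the helper name and the median heuristic are only illustrative, not an existing xarray API):
```python
import numpy as np
import dask.array

def chunked_indexer(indexer, chunks_along_axis):
    # Illustrative helper: chunk the indexer using the median chunk size of
    # the array being reindexed along that axis.
    median_chunk = max(1, int(np.median(chunks_along_axis)))
    return dask.array.from_array(np.asarray(indexer), chunks=median_chunk)

arr = dask.array.from_array(np.arange(4), chunks=(1,))
indexer = [0, 1, 2, 3] + [-1] * 111
print(arr[chunked_indexer(indexer, arr.chunks[0])].chunks)
# 115 chunks of size 1, rather than ((1, 1, 1, 112),)
```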
@shoyer What do you think?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,627600168
https://github.com/pydata/xarray/issues/4112#issuecomment-636808986,https://api.github.com/repos/pydata/xarray/issues/4112,636808986,MDEyOklzc3VlQ29tbWVudDYzNjgwODk4Ng==,1312546,2020-06-01T11:44:23Z,2020-06-01T11:44:23Z,MEMBER,Rechunking the `indexer` array is how I would be explicit about the desired chunk size. Opened https://github.com/dask/dask/issues/6270 to discuss this on the dask side.,"{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,627600168
https://github.com/pydata/xarray/issues/4112#issuecomment-636334010,https://api.github.com/repos/pydata/xarray/issues/4112,636334010,MDEyOklzc3VlQ29tbWVudDYzNjMzNDAxMA==,2448579,2020-05-30T13:52:33Z,2020-05-30T13:53:31Z,MEMBER,"Great diagnosis @jbusecke.
Ultimately this comes down to dask indexing:
``` python
import dask.array
arr = dask.array.from_array([0, 1, 2, 3], chunks=(1,))
print(arr.chunks) # ((1, 1, 1, 1),)
# align calls reindex which indexes with something like this
indexer = [0, 1, 2, 3] + [-1] * 111
print(arr[indexer].chunks) # ((1, 1, 1, 112),)
# maybe something like this is a solution
lazy_indexer = dask.array.from_array(indexer, chunks=arr.chunks[0][0], name=""idx"")
print(arr[lazy_indexer].chunks) # ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),)
```
cc @TomAugspurger, the issue here is that the big `112`-sized chunk takes down the cluster in https://github.com/NCAR/intake-esm/issues/225
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,627600168