html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1995#issuecomment-842556731,https://api.github.com/repos/pydata/xarray/issues/1995,842556731,MDEyOklzc3VlQ29tbWVudDg0MjU1NjczMQ==,35968931,2021-05-17T18:59:18Z,2021-05-17T18:59:18Z,MEMBER,"Has this not been solved by the argument `allow_rechunk`?

@crusaderky isn't this effectively what you were trying to achieve?

```python
import xarray as xr

def mulsum(a, b):
    acc = 0
    for i in range(a.size):
        acc += a[i] * b[i]
    return acc

a = xr.DataArray(data=[1, 2, 3], dims=['x']).chunk({""x"": 1})

b = xr.DataArray(data=[4, 5, 6], dims=['x']).chunk({""x"": 1})

c = xr.apply_ufunc(
    mulsum, a, b,
    input_core_dims=[['x'], ['x']],
    dask='parallelized', output_dtypes=[float],
    dask_gufunc_kwargs={'allow_rechunk': True})

print(c.compute())
```
returns
```
<xarray.DataArray ()>
array(32)
```

I think this has only been possible since the implementation of `xarray.apply_ufunc` was switched from `dask.array.blockwise` to `dask.array.apply_gufunc` in #4060.
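
A minimal sketch of the same idea with a vectorized kernel, assuming a recent xarray with dask installed. As far as I can tell, `allow_rechunk` makes dask merge the chunks along `x` into a single block before calling the kernel, so the function sees the whole core dimension at once rather than reducing chunk by chunk. If so, it should be equivalent to rechunking `x` to one chunk by hand, and `xr.dot` should agree:

```python
import numpy as np
import xarray as xr

def mulsum(a, b):
    # apply_ufunc moves the core dimension 'x' to the last axis
    return np.sum(a * b, axis=-1)

a = xr.DataArray([1.0, 2.0, 3.0], dims=['x']).chunk({'x': 1})
b = xr.DataArray([4.0, 5.0, 6.0], dims=['x']).chunk({'x': 1})

# With allow_rechunk, dask merges the three chunks along 'x' into one block
c = xr.apply_ufunc(
    mulsum, a, b,
    input_core_dims=[['x'], ['x']],
    dask='parallelized',
    output_dtypes=[float],
    dask_gufunc_kwargs={'allow_rechunk': True},
)

# Rechunking 'x' into a single chunk by hand needs no extra flag
d = xr.apply_ufunc(
    mulsum, a.chunk({'x': -1}), b.chunk({'x': -1}),
    input_core_dims=[['x'], ['x']],
    dask='parallelized',
    output_dtypes=[float],
)

# Reference result on in-memory data
ref = xr.dot(a.compute(), b.compute())
print(float(c), float(d), float(ref))  # 32.0 32.0 32.0
```

The practical caveat is memory: each block must now hold the full length of `x`.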

If this is actually doing what I think it's doing then we should document this possibility!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,305757822
https://github.com/pydata/xarray/issues/1995#issuecomment-603493332,https://api.github.com/repos/pydata/xarray/issues/1995,603493332,MDEyOklzc3VlQ29tbWVudDYwMzQ5MzMzMg==,26384082,2020-03-24T20:40:45Z,2020-03-24T20:40:45Z,NONE,"In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the `stale` label; otherwise it will be marked as closed automatically.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,305757822
https://github.com/pydata/xarray/issues/1995#issuecomment-384071053,https://api.github.com/repos/pydata/xarray/issues/1995,384071053,MDEyOklzc3VlQ29tbWVudDM4NDA3MTA1Mw==,6213168,2018-04-24T20:35:21Z,2018-04-24T20:36:00Z,MEMBER,"@shoyer, you don't really need a parameter ``possibly_chunked_core_dims=['x']``; you are already specifying ``output_chunks``, without which apply_ufunc wouldn't know what to do and would crash.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,305757822
https://github.com/pydata/xarray/issues/1995#issuecomment-373870013,https://api.github.com/repos/pydata/xarray/issues/1995,373870013,MDEyOklzc3VlQ29tbWVudDM3Mzg3MDAxMw==,6213168,2018-03-16T23:19:19Z,2018-03-29T09:57:14Z,MEMBER,"[EDIT] drastically simplified chunking algorithm

@shoyer, close, but your version doesn't work with broadcasting.
I think I fixed it, although it won't work correctly if only one of a and b has a dask backend, and I'm not sure how to fix that:

```python
import xarray
import numpy
import dask.array

coefficients = xarray.DataArray(
    dask.array.random.random((106, 99), chunks=(25, 25)),
    dims=['formula', 'time'])
components = xarray.DataArray(
    dask.array.random.random((106, 512 * 1024), chunks=(25, 65536)),
    dims=['formula', 'scenario'])

def mulsum(a, b, dim):
    return xarray.apply_ufunc(
        _mulsum_xarray_kernel, a, b,
        input_core_dims=[[dim], [dim]],
        dask='allowed', output_dtypes=[float])


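def _mulsum_xarray_kernel(a, b):
    # apply_ufunc has moved the core dimension to the last axis of a and b
    if isinstance(a, dask.array.Array) and isinstance(b, dask.array.Array):
        # Broadcast the chunk layouts of a and b, then shrink every chunk of
        # the last (reduced) axis to size 1: each block yields one partial sum
        chunks = dask.array.core.broadcast_chunks(a.chunks, b.chunks)
        chunks = chunks[:-1] + (tuple(1 for _ in chunks[-1]), )

        mapped = dask.array.map_blocks(
            _mulsum_dask_kernel, a, b,
            dtype=float, chunks=chunks)
        # Add up the per-block partial sums along the reduced axis
        return dask.array.sum(mapped, axis=-1)
    else:
        # Otherwise (e.g. NumPy inputs): compute eagerly and drop the
        # size-1 trailing axis so the output has no leftover core dimension
        return _mulsum_dask_kernel(a, b)[..., 0]


def _mulsum_dask_kernel(a, b):
    a = numpy.ascontiguousarray(a)
    b = numpy.ascontiguousarray(b)
    # Contract the last axis, broadcasting all leading axes
    res = numpy.einsum('...i,...i', a, b, optimize='optimal')
    # Keep a size-1 axis in place of the reduced one, matching `chunks` above
    return res[..., numpy.newaxis]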

mulsum(coefficients, components, dim='formula')
```
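
The partial-sum trick above can be checked without dask. The sketch below uses only NumPy and a hypothetical helper `chunked_mulsum` (not part of any library): it splits the reduced axis into blocks the way dask chunks would, applies the per-block einsum while broadcasting the leading axes, concatenates the size-1 partial results and sums them, then compares against the unchunked contraction. Shapes are small stand-ins for the ones in the example.

```python
import numpy as np

def block_kernel(a, b):
    # Contract the last axis, broadcast the leading ones, keep a size-1 axis
    return np.einsum('...i,...i', a, b)[..., np.newaxis]

def chunked_mulsum(a, b, chunk):
    # Hypothetical helper: emulate map_blocks over chunks of the last axis
    n = a.shape[-1]
    partials = [
        block_kernel(a[..., start:start + chunk], b[..., start:start + chunk])
        for start in range(0, n, chunk)
    ]
    # One size-1 slot per chunk along the last axis, then reduce them
    stacked = np.concatenate(partials, axis=-1)
    return stacked.sum(axis=-1)

rng = np.random.default_rng(0)
coefficients = rng.random((4, 1, 10))  # (time, broadcast, formula)
components = rng.random((3, 10))       # (scenario, formula)

result = chunked_mulsum(coefficients, components, chunk=3)
expected = (coefficients * components).sum(axis=-1)
print(result.shape, np.allclose(result, expected))  # (4, 3) True
```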

# Proposal 2
Modify apply_ufunc:
 * remove the check that the input_core_dims must not be chunked
 * add parameter output_chunks

My initial example would become:

```python
def mulsum_kernel(a, b):
    return numpy.einsum('...i,...i', a, b)[..., numpy.newaxis]

c = xarray.apply_ufunc(
    mulsum_kernel, a, b,
    dask='parallelized', 
    input_core_dims=[['x'], ['x']],
    output_dtypes=[float],
    output_core_dims=[['__partial']],
    # proposed parameter: one size-1 partial result per chunk along 'x'
    output_chunks={'__partial': [1 for _ in a.chunks[a.dims.index('x')]]}
).sum('__partial')
```
Although I'm not sure this approach would be unambiguous when there's more than one core dimension...","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,305757822
https://github.com/pydata/xarray/issues/1995#issuecomment-373871784,https://api.github.com/repos/pydata/xarray/issues/1995,373871784,MDEyOklzc3VlQ29tbWVudDM3Mzg3MTc4NA==,1217238,2018-03-16T23:32:07Z,2018-03-16T23:32:07Z,MEMBER,"> Modify apply_ufunc:
> * remove the check that the input_core_dims must not be chunked
> * add parameter output_chunks

My main concern is ensuring that someone does not inadvertently apply a function not designed for multiple chunks to dask arrays. For example, suppose the function being applied is `np.median`.
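
To illustrate the concern with a small NumPy example (data chosen arbitrarily): a sum can be rebuilt from per-chunk sums, but the median of per-chunk medians is generally not the median of the whole array, so applying `np.median` chunk by chunk would silently return a wrong answer.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0, 200.0])
chunks = np.split(x, 2)  # two chunks of three elements each

# Sums combine correctly: the sum of chunk sums is the total
print(sum(chunk.sum() for chunk in chunks), x.sum())  # 310.0 310.0

# Medians do not: the median of chunk medians differs from the true median
print(np.median([np.median(chunk) for chunk in chunks]), np.median(x))  # 51.0 3.5
```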

Some loud flag that makes it very obvious what's going on seems like a good idea, e.g., `possibly_chunked_core_dims=['x']`?

Then we also need some sort of guarantee that chunked core dimensions aren't entirely removed, or else xarray/dask won't know how to stack them back up. I guess we could check that at least as many output core dimensions appear as input core dimensions?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,305757822
https://github.com/pydata/xarray/issues/1995#issuecomment-373579142,https://api.github.com/repos/pydata/xarray/issues/1995,373579142,MDEyOklzc3VlQ29tbWVudDM3MzU3OTE0Mg==,1217238,2018-03-16T01:55:44Z,2018-03-16T01:55:44Z,MEMBER,"Try:
```python
import dask.array
import numpy as np

def mulsum_chunk(a, b):
  return np.einsum('...i,...i', a, b)[..., np.newaxis]

def mulsum(a, b):
  # needs broadcasting/rechunking for a,b
  mapped = dask.array.map_blocks(mulsum_chunk, a, b, dtype=float,
                                 chunks=a.chunks[:-1] + (tuple(1 for _ in a.chunks[-1]),))
  return dask.array.sum(mapped, axis=-1)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,305757822
https://github.com/pydata/xarray/issues/1995#issuecomment-373578226,https://api.github.com/repos/pydata/xarray/issues/1995,373578226,MDEyOklzc3VlQ29tbWVudDM3MzU3ODIyNg==,1217238,2018-03-16T01:50:07Z,2018-03-16T01:50:07Z,MEMBER,"> could you make an example? That was my first thought but I could not figure out how to make the apply_ufunc do it.

OK, thinking a little more about it, this would not work with `dask='parallelized'`, which does not allow chunking over core dimensions. You would have to parallelize the function with dask yourself, e.g., with `dask.array.map_blocks`, but then you could use apply_ufunc with `dask='allowed'`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,305757822
https://github.com/pydata/xarray/issues/1995#issuecomment-373576583,https://api.github.com/repos/pydata/xarray/issues/1995,373576583,MDEyOklzc3VlQ29tbWVudDM3MzU3NjU4Mw==,6213168,2018-03-16T01:40:05Z,2018-03-16T01:40:05Z,MEMBER,"> For this specific problem, I think you could solve it with xarray.apply_ufunc by writing something like a gufunc that keeps the reduced axis as size 1 to apply to each chunk, and afterwards summing up along that dimension.

@shoyer could you make an example? That was my first thought but I could not figure out how to make the apply_ufunc do it.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,305757822
https://github.com/pydata/xarray/issues/1995#issuecomment-373572878,https://api.github.com/repos/pydata/xarray/issues/1995,373572878,MDEyOklzc3VlQ29tbWVudDM3MzU3Mjg3OA==,1217238,2018-03-16T01:16:57Z,2018-03-16T01:16:57Z,MEMBER,"One way to allow chunking across `x` would be to finish up `dask.array.einsum`: https://github.com/dask/dask/issues/732
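
For reference, `dask.array.einsum` is available in current dask. A minimal sketch, assuming a recent dask, where the summed axis is split into several chunks; as far as I can tell, dask handles this with per-chunk partial sums much like the approach discussed in this thread, so no rechunking to a single block is needed:

```python
import numpy as np
import dask.array as da

rng = np.random.default_rng(0)
a_np = rng.random((6, 8))  # dims (y, x)
b_np = rng.random((6, 8))

# Split the contracted axis x into four chunks of two elements
a = da.from_array(a_np, chunks=(3, 2))
b = da.from_array(b_np, chunks=(3, 2))

# Sum a * b over x, keeping y, without merging the chunks along x first
c = da.einsum('yx,yx->y', a, b)
result = c.compute()

expected = (a_np * b_np).sum(axis=-1)
print(result.shape, np.allclose(result, expected))  # (6,) True
```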

I'm reluctant to add `reduce_func` to xarray because it isn't clear to me exactly what the underlying abstraction is. It's something like a gufunc, but does a little bit more. Also, ideally we'd like this to be in dask.array, maybe as part of `dask.array.apply_gufunc` (https://github.com/dask/dask/pull/3109).

For this specific problem, I *think* you could solve it with `xarray.apply_ufunc` by writing something like a gufunc that keeps the reduced axis as size 1 to apply to each chunk, and afterwards summing up along that dimension.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,305757822
https://github.com/pydata/xarray/issues/1995#issuecomment-373569674,https://api.github.com/repos/pydata/xarray/issues/1995,373569674,MDEyOklzc3VlQ29tbWVudDM3MzU2OTY3NA==,6815844,2018-03-16T00:57:21Z,2018-03-16T00:57:21Z,MEMBER,"If `a.dims=('x', 'y', 'z')` and `b.dims=('x', 'y', 'w')`, then we can't use `tensordot`, because `y` has to be multiplied element-wise and kept in the output rather than summed over.
Maybe we can use matmul in some limited cases, but generally not.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,305757822
https://github.com/pydata/xarray/issues/1995#issuecomment-373569090,https://api.github.com/repos/pydata/xarray/issues/1995,373569090,MDEyOklzc3VlQ29tbWVudDM3MzU2OTA5MA==,1217238,2018-03-16T00:53:34Z,2018-03-16T00:53:34Z,MEMBER,"For two inputs, don't we use dask.array.tensordot?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,305757822
https://github.com/pydata/xarray/issues/1995#issuecomment-373568992,https://api.github.com/repos/pydata/xarray/issues/1995,373568992,MDEyOklzc3VlQ29tbWVudDM3MzU2ODk5Mg==,6815844,2018-03-16T00:52:57Z,2018-03-16T00:52:57Z,MEMBER,"I think if `a` and `b` have common dimensions other than `x`, even `xarray.dot()` does not allow chunking along `x` (because it internally uses `apply_ufunc` with `dask='parallelized'`).
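
To make the shared-dimension case concrete, a NumPy-only sketch with arbitrary sizes: summing over `x` while keeping the common dimension `y` is the contraction `xyz,xyw->yzw`. Contracting over `x` alone with `np.tensordot` pairs every position of `y` in `a` with every position of `y` in `b`, and only the diagonal of that result is wanted, so it does roughly `len(y)` times more work than needed:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((5, 3, 4))  # dims (x, y, z)
b = rng.random((5, 3, 2))  # dims (x, y, w)

# Sum over x, keep the shared y plus z and w
wanted = np.einsum('xyz,xyw->yzw', a, b)

# tensordot over x alone gives dims (y, z, y2, w): every y paired with every y2
full = np.tensordot(a, b, axes=([0], [0]))
# Only the y == y2 diagonal is wanted; np.diagonal appends it as the last axis
diag = np.diagonal(full, axis1=0, axis2=2)  # dims (z, w, y)
print(full.shape, np.allclose(diag.transpose(2, 0, 1), wanted))  # (3, 4, 3, 2) True
```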

I think it would be nice if we could have a way to allow chunking along `input_core_dims`, though I can't yet imagine what it should look like.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,305757822
https://github.com/pydata/xarray/issues/1995#issuecomment-373568240,https://api.github.com/repos/pydata/xarray/issues/1995,373568240,MDEyOklzc3VlQ29tbWVudDM3MzU2ODI0MA==,1217238,2018-03-16T00:48:12Z,2018-03-16T00:48:12Z,MEMBER,Have you tried the new `xarray.dot()`? That might be even faster for this case.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,305757822