id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
305757822,MDU6SXNzdWUzMDU3NTc4MjI=,1995,apply_ufunc support for chunks on input_core_dims,6213168,open,0,,,13,2018-03-15T23:50:22Z,2021-05-17T18:59:18Z,,MEMBER,,,,"I am trying to optimize the following function:

```python
c = (a * b).sum('x', skipna=False)
```

where a and b are xarray.DataArrays, both with dimension x and both with a dask backend. I obtained a 5.5x speedup with the following:

```python
@numba.guvectorize(
    ['void(float64[:], float64[:], float64[:])'],
    '(n),(n)->()', nopython=True, cache=True)
def mulsum(a, b, res):
    acc = 0
    for i in range(a.size):
        acc += a[i] * b[i]
    res.flat[0] = acc

c = xarray.apply_ufunc(
    mulsum, a, b,
    input_core_dims=[['x'], ['x']],
    dask='parallelized', output_dtypes=[float])
```

The problem is that this introduces a (quite problematic, in my case) constraint: a and b can't be chunked on dimension x. That constraint is theoretically avoidable as long as the kernel function doesn't need interaction between x[i] and x[j] (so it can't work, e.g., for an interpolator, which would need to rely on dask ghosting).

# Proposal

Add a parameter to apply_ufunc, ``reduce_func=None``. reduce_func is a function that takes two parameters, a and b, each being an output of func; apply_ufunc will invoke it whenever there's chunking on an input_core_dim. For example, my use case above would simply become:

```python
c = xarray.apply_ufunc(
    mulsum, a, b,
    input_core_dims=[['x'], ['x']],
    dask='parallelized', output_dtypes=[float],
    reduce_func=operator.add)
```

So if a and b each have 2 chunks on dimension x, apply_ufunc will internally do:

```python
c1 = mulsum(a1, b1)
c2 = mulsum(a2, b2)
c = operator.add(c1, c2)
```

Note that reduce_func will be invoked exclusively in the presence of dask='parallelized' and when there's chunking on one or more of the input_core_dims. If reduce_func is left as None, apply_ufunc will keep raising an error like it does now.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1995/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
523438384,MDExOlB1bGxSZXF1ZXN0MzQxNDQyMTI4,3537,Numpy 1.18 support,6213168,closed,0,,,13,2019-11-15T12:17:32Z,2019-11-19T14:06:50Z,2019-11-19T14:06:46Z,MEMBER,,0,pydata/xarray/pulls/3537,"Fix mean() and nanmean() for datetime64 arrays on the numpy backend when upgrading from numpy 1.17 to 1.18. All other nan-reductions on datetime64s were broken before and remain broken. mean() on datetime64 with dask was broken before and remains broken.

- [x] Closes #3409
- [x] Passes `black . && mypy . && flake8`
- [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3537/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
297631403,MDExOlB1bGxSZXF1ZXN0MTY5NTEyMjU1,1915,h5netcdf new API support,6213168,closed,0,,,13,2018-02-15T23:15:55Z,2018-05-11T23:49:00Z,2018-05-08T02:25:40Z,MEMBER,,0,pydata/xarray/pulls/1915,"Closes #1536

Support arbitrary compression plugins through the h5netcdf new API.
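A minimal usage sketch of what this enables (illustrative only: the file name, the toy dataset, and the choice of the LZF filter are my own; `compression` is the encoding key exposed through the new API):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"x": ("t", np.arange(1000.0))})

# Route an arbitrary HDF5 compression filter through the h5netcdf
# backend via encoding. LZF ships with h5py but is not available
# through the plain netCDF4 backend.
ds.to_netcdf(
    "compressed.nc",  # hypothetical output path
    engine="h5netcdf",
    encoding={"x": {"compression": "lzf"}},
)
```

Note that reading such a file requires the same filter on the reading side (h5py bundles LZF; the netCDF-C library does not).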
Done:
- public API and docstrings (untested)
- implementation
- unit tests
- What's New
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1915/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull