issues: 305757822
| id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | performed_via_github_app | state_reason | repo | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 305757822 | MDU6SXNzdWUzMDU3NTc4MjI= | 1995 | apply_ufunc support for chunks on input_core_dims | 6213168 | open | 0 |  |  | 13 | 2018-03-15T23:50:22Z | 2021-05-17T18:59:18Z |  | MEMBER |  |  |  |  |  | 13221727 | issue |

reactions:
{
"url": "https://api.github.com/repos/pydata/xarray/issues/1995/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
}

body:

I am trying to optimize the following function:

where a and b are xarray.DataArray objects, both with dimension x and both backed by dask.

I successfully obtained a 5.5x speedup with the following:

The problem is that this introduces a constraint (quite a problematic one, in my case): a and b can't be chunked on dimension x. That constraint is theoretically avoidable as long as the kernel function doesn't need any interaction between x[i] and x[j]. For example, it can't work for an interpolator, which would need to rely on dask ghosting.

Proposal

Add a parameter, reduce_func, to apply_ufunc. For example, my use case above would simply become:

So if I have 2 chunks in a and b on dimension x, apply_ufunc will internally do:

Note that reduce_func will be invoked only when dask='parallelized' is used and one or more of the input_core_dims is chunked. If reduce_func is left as None, apply_ufunc will keep failing as it does now.
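The reduce_func proposal amounts to a map-reduce over the chunks of a core dimension. The sketch below illustrates those semantics with NumPy alone, under stated assumptions: dot_kernel is a hypothetical kernel standing in for the function elided from the issue, apply_chunked_core is a hypothetical helper rather than anything xarray provides, and reduce_func is modelled as a plain binary function such as operator.add.

```python
import functools
import operator

import numpy as np


def dot_kernel(a, b):
    """Hypothetical kernel: reduce the last axis (the core dim x) of a * b.

    Each term pairs a[..., i] with b[..., i] only, so x[i] never interacts
    with x[j].
    """
    return (a * b).sum(axis=-1)


def apply_chunked_core(func, reduce_func, a, b, chunk_size):
    """Hypothetical helper sketching the proposed behaviour (not xarray API).

    Split the core dimension (last axis) into chunks, call func on each
    chunk, then fold the partial results pairwise with reduce_func.
    """
    n = a.shape[-1]
    partials = [
        func(a[..., start:start + chunk_size], b[..., start:start + chunk_size])
        for start in range(0, n, chunk_size)
    ]
    return functools.reduce(reduce_func, partials)


rng = np.random.default_rng(0)
a = rng.random((3, 10))  # 3 independent rows, core dim x of length 10
b = rng.random((3, 10))

# Two chunks of 5 along x: reduce_func(func(a1, b1), func(a2, b2))
c_two_chunks = apply_chunked_core(dot_kernel, operator.add, a, b, chunk_size=5)
# Uneven chunks (4, 4, 2) work as well
c_three_chunks = apply_chunked_core(dot_kernel, operator.add, a, b, chunk_size=4)
c_full = dot_kernel(a, b)

print(np.allclose(c_two_chunks, c_full), np.allclose(c_three_chunks, c_full))  # True True
```

This only works when applying func to the whole of x gives the same result as applying reduce_func to its per-chunk outputs, which holds here because dot_kernel is a plain sum over x.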
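The condition stated in the issue (no interaction between x[i] and x[j]) can be seen with a counter-example. In this NumPy-only sketch, sq_diff_sum is a hypothetical kernel that couples neighboring elements, so naively adding per-chunk results drops the term that straddles the chunk boundary. Extending each chunk by one element (the hypothetical overlap argument of chunked_sum) is a crude, kernel-specific stand-in for the ghosting the issue mentions; dask exposes the general mechanism as dask.array.map_overlap.

```python
import functools
import operator

import numpy as np


def sq_diff_sum(x):
    """Hypothetical kernel that DOES couple x[i] and x[i + 1]."""
    return (np.diff(x, axis=-1) ** 2).sum(axis=-1)


def chunked_sum(func, x, chunk_size, overlap=0):
    """Hypothetical helper: apply func to each chunk along the last axis and
    add up the partial results. Each chunk is extended by `overlap` elements
    on the right, which is enough here only because sq_diff_sum couples
    nothing further apart than neighbors i and i + 1."""
    n = x.shape[-1]
    partials = [
        func(x[..., start:min(start + chunk_size + overlap, n)])
        for start in range(0, n, chunk_size)
    ]
    return functools.reduce(operator.add, partials)


x = np.arange(10.0) ** 2
full = sq_diff_sum(x)
naive = chunked_sum(sq_diff_sum, x, chunk_size=5)  # loses the x[4]-x[5] term
ghosted = chunked_sum(sq_diff_sum, x, chunk_size=5, overlap=1)

print(np.isclose(naive, full), np.isclose(ghosted, full))  # False True
```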