issues: 305757822
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
305757822 | MDU6SXNzdWUzMDU3NTc4MjI= | 1995 | apply_ufunc support for chunks on input_core_dims | 6213168 | open | 0 | | | 13 | 2018-03-15T23:50:22Z | 2021-05-17T18:59:18Z | | MEMBER | | | | I am trying to optimize the following function:
where a and b are xarray.DataArray objects, both with dimension x and both backed by dask. I successfully obtained a 5.5x speedup with the following:
The problem is that this introduces a constraint (quite problematic in my case): a and b can't be chunked on dimension x. In theory this is avoidable as long as the kernel function doesn't need interaction between x[i] and x[j] (so it can't work for an interpolator, for example, which would have to rely on dask ghosting). Proposal: add a parameter to apply_ufunc, reduce_func, so that e.g. my use case above would simply become:
So if I have 2 chunks in a and b on dimension x, apply_ufunc will internally do
Note that reduce_func will be invoked only when dask='parallelized' is set and at least one of the input_core_dims is chunked. If reduce_func is left as None, apply_ufunc will keep crashing like it does now. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1995/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
| | 13221727 | issue |
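
The chunk-then-reduce behaviour proposed in the issue body can be sketched in plain Python. This is an illustration only, not xarray's implementation and not the code from the original issue: `kernel`, `chunked_apply`, and the choice of `operator.add` as the reducing function are hypothetical names and assumptions, and plain lists stand in for dask-backed DataArrays chunked along x.

```python
from functools import reduce
import operator


def kernel(a_chunk, b_chunk):
    """Hypothetical kernel: multiply element-wise, then sum along x."""
    return sum(ai * bi for ai, bi in zip(a_chunk, b_chunk))


def chunked_apply(func, a, b, chunk_size, reduce_func):
    """Apply func to matching chunks of a and b, then fold the partial results.

    Only valid when func needs no interaction between x[i] and x[j]
    across chunk boundaries (so not for an interpolator, for example).
    """
    partials = [
        func(a[start:start + chunk_size], b[start:start + chunk_size])
        for start in range(0, len(a), chunk_size)
    ]
    # Combine the per-chunk outputs pairwise, as a reduce_func would.
    return reduce(reduce_func, partials)


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]

whole = kernel(a, b)                                  # a single chunk on x
split = chunked_apply(kernel, a, b, 2, operator.add)  # two chunks on x
```

With an additive kernel like this one, `split` equals `whole`, which is the property a reducing function would need for the result to be independent of how x is chunked.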