issue_comments: 373870013
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/pydata/xarray/issues/1995#issuecomment-373870013 | https://api.github.com/repos/pydata/xarray/issues/1995 | 373870013 | MDEyOklzc3VlQ29tbWVudDM3Mzg3MDAxMw== | 6213168 | 2018-03-16T23:19:19Z | 2018-03-29T09:57:14Z | MEMBER | [EDIT] Drastically simplified the chunking algorithm. @shoyer, close, but your version doesn't work when broadcasting is involved. I think I fixed it, although it won't work correctly if only one of a or b has a dask backend, and I'm not sure how to fix that: ```python import xarray import numpy import dask.array coefficients = xarray.DataArray( dask.array.random.random((106, 99), chunks=(25, 25)), dims=['formula', 'time']) components = xarray.DataArray( dask.array.random.random((106, 512 * 1024), chunks=(25, 65536)), dims=['formula', 'scenario']) def mulsum(a, b, dim): return xarray.apply_ufunc( _mulsum_xarray_kernel, a, b, input_core_dims=[[dim], [dim]], dask='allowed', output_dtypes=[float]) def _mulsum_xarray_kernel(a, b): if isinstance(a, dask.array.Array) and isinstance(b, dask.array.Array): chunks = dask.array.core.broadcast_chunks(a.chunks, b.chunks) chunks = chunks[:-1] + (tuple(1 for _ in chunks[-1]), )
def _mulsum_dask_kernel(a, b): a = numpy.ascontiguousarray(a) b = numpy.ascontiguousarray(b) res = numpy.einsum('...i,...i', a, b, optimize='optimal') return res[..., numpy.newaxis] mulsum(coefficients, components, dim='formula') ``` Proposal 2: modify apply_ufunc as follows: * remove the check that the input_core_dims must not be chunked * add a parameter output_chunks My initial example would become: ```python def mulsum_kernel(a, b): return numpy.einsum('...i,...i', a, b)[..., numpy.newaxis] c = xarray.apply_ufunc( mulsum_kernel, a, b, dask='parallelized', input_core_dims=[['x'], ['x']], output_dtypes=[float], output_core_dims=[['__partial']], output_chunks={'__partial': [1 for _ in a.chunks[a.dims.index('x')]]} ).sum('__partial') ``` Although I'm not sure this approach would be unambiguous when there's more than one core_dim... |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
305757822 |
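
The partial-sum idea in the comment above (each chunk along the contracted dimension yields a length-1 slice of a temporary `__partial` axis, which is then summed away) can be checked without dask. The following is only a minimal sketch in plain numpy: `chunked_mulsum` and `chunk_size` are hypothetical names introduced for illustration, slicing stands in for dask chunks, and the array shapes are made up rather than taken from the issue.

```python
import numpy


def mulsum_kernel(a, b):
    # Multiply and sum over the last axis, then keep a trailing axis of
    # length 1 so that per-chunk results can be concatenated afterwards.
    return numpy.einsum('...i,...i', a, b)[..., numpy.newaxis]


def chunked_mulsum(a, b, chunk_size):
    # Hypothetical helper: a and b carry the contracted dimension last and
    # have the same length along it. Plain slices emulate dask chunks.
    n = a.shape[-1]
    partials = [
        mulsum_kernel(a[..., start:start + chunk_size],
                      b[..., start:start + chunk_size])
        for start in range(0, n, chunk_size)
    ]
    # One entry per chunk along the temporary "partial" axis.
    partial = numpy.concatenate(partials, axis=-1)
    # Summing the partial axis recovers the full contraction.
    return partial.sum(axis=-1)


rng = numpy.random.default_rng(0)
a = rng.random((99, 1, 106))  # (time, 1, formula), broadcast against b
b = rng.random((1, 7, 106))   # (1, scenario, formula)

result = chunked_mulsum(a, b, chunk_size=25)
expected = numpy.einsum('...i,...i', a, b)
```

Here `result` and `expected` should agree up to floating-point rounding, including the broadcast `time` and `scenario` axes, which is the case the comment says the earlier version did not handle.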