html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/issues/3574#issuecomment-567082163,https://api.github.com/repos/pydata/xarray/issues/3574,567082163,MDEyOklzc3VlQ29tbWVudDU2NzA4MjE2Mw==,941907,2019-12-18T15:32:38Z,2019-12-18T15:32:38Z,NONE,"> `meta = np.ndarray if vectorize is True else None` if the user doesn't explicitly provide `meta`. Yes, sorry, written this way I now see what you meant and that will likely work indeed. ","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,528701910 https://github.com/pydata/xarray/issues/3574#issuecomment-567077240,https://api.github.com/repos/pydata/xarray/issues/3574,567077240,MDEyOklzc3VlQ29tbWVudDU2NzA3NzI0MA==,2448579,2019-12-18T15:21:19Z,2019-12-18T15:21:19Z,MEMBER,Right the xarray solution is to set `meta = np.ndarray if vectorize is True else None` if the user doesn't explicitly provide `meta`. Or am I missing something? ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,528701910 https://github.com/pydata/xarray/issues/3574#issuecomment-566938638,https://api.github.com/repos/pydata/xarray/issues/3574,566938638,MDEyOklzc3VlQ29tbWVudDU2NjkzODYzOA==,941907,2019-12-18T08:55:29Z,2019-12-18T08:55:29Z,NONE,"> `meta` should be passed to `blockwise` through `_apply_blockwise` with default `None` (I think) and `np.ndarray` if `vectorize is True`. You'll have to pass the `vectorize` kwarg down to this level I think. I'm afraid that passing `meta=None` will not help as explained in https://github.com/dask/dask/issues/5642 and seen around [this line](https://github.com/dask/dask/blob/3960c6518318f2417658c2fc47cd5b5ece726f8b/dask/array/blockwise.py#L230) because in that case `compute_meta` will be called which might fail with a `np.vectorize`-wrapped function. 
I believe a better solution would be to address https://github.com/dask/dask/issues/5642 so that `meta` isn't computed when we already provide an output `dtype`. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,528701910 https://github.com/pydata/xarray/issues/3574#issuecomment-566640524,https://api.github.com/repos/pydata/xarray/issues/3574,566640524,MDEyOklzc3VlQ29tbWVudDU2NjY0MDUyNA==,2448579,2019-12-17T16:29:35Z,2019-12-17T16:29:35Z,MEMBER,"`meta` should be passed to `blockwise` through `_apply_blockwise` with default `None` (I think) and `np.ndarray` if `vectorize is True`. You'll have to pass the `vectorize` kwarg down to this level I think. https://github.com/pydata/xarray/blob/6ad59b93f814b48053b1a9eea61d7c43517105cb/xarray/core/computation.py#L579-L593","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,528701910 https://github.com/pydata/xarray/issues/3574#issuecomment-566637471,https://api.github.com/repos/pydata/xarray/issues/3574,566637471,MDEyOklzc3VlQ29tbWVudDU2NjYzNzQ3MQ==,14314623,2019-12-17T16:22:35Z,2019-12-17T16:22:35Z,CONTRIBUTOR,"I can give it a shot if you could point me to the appropriate place, since I have never messed with the dask internals of xarray. 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,528701910 https://github.com/pydata/xarray/issues/3574#issuecomment-565194778,https://api.github.com/repos/pydata/xarray/issues/3574,565194778,MDEyOklzc3VlQ29tbWVudDU2NTE5NDc3OA==,2448579,2019-12-12T21:28:39Z,2019-12-12T21:28:39Z,MEMBER,@shoyer's option 1 should be a relatively simple xarray PR is one of you is up for it.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,528701910 https://github.com/pydata/xarray/issues/3574#issuecomment-565186199,https://api.github.com/repos/pydata/xarray/issues/3574,565186199,MDEyOklzc3VlQ29tbWVudDU2NTE4NjE5OQ==,941907,2019-12-12T21:04:33Z,2019-12-12T21:04:33Z,NONE,"> The problem is that Dask, as of version 2.0, calls functions applied to dask arrays with size zero inputs, to figure out the output array type, e.g., is the output a dense numpy.ndarray or a sparse array? Yes, now I recall that this was the issue, yeah. It doesn't even depend on your actual data really. Possible option 3. is to address https://github.com/dask/dask/issues/5642 directly (haven't found time to do a PR yet). 
Essentially, from the code described in that issue, I have the feeling that if a `dtype` is passed (as `apply_ufunc` does), then `meta` should not need to be calculated.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,528701910 https://github.com/pydata/xarray/issues/3574#issuecomment-565107345,https://api.github.com/repos/pydata/xarray/issues/3574,565107345,MDEyOklzc3VlQ29tbWVudDU2NTEwNzM0NQ==,1217238,2019-12-12T17:33:43Z,2019-12-12T17:33:43Z,MEMBER,"The problem is that Dask, as of version 2.0, calls functions applied to dask arrays with size zero inputs, to figure out the output array type, e.g., is the output a dense numpy.ndarray or a sparse array? Unfortunately, `numpy.vectorize` doesn't know how large of a size 0 array to make, because it doesn't have anything like the `output_sizes` argument. For xarray, we have a couple of options: 1. we can safely assume that if the applied function is a `np.vectorize`, then it should pass `meta=np.ndarray` into the relevant dask functions (e.g., `dask.array.blockwise`). This should avoid the need to evaluate with size 0 arrays. 2. we could add an `output_sizes` argument to `np.vectorize` either upstream in NumPy or into a wrapper in Xarray. 
(1) is probably easiest here.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,528701910 https://github.com/pydata/xarray/issues/3574#issuecomment-565057853,https://api.github.com/repos/pydata/xarray/issues/3574,565057853,MDEyOklzc3VlQ29tbWVudDU2NTA1Nzg1Mw==,14314623,2019-12-12T15:35:10Z,2019-12-12T15:35:10Z,CONTRIBUTOR,"This is the chunk setup ![image](https://user-images.githubusercontent.com/14314623/70725792-053c5400-1ccb-11ea-9833-07fc61061c99.png) Might this be a problem resulting from `numpy.vectorize`?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,528701910 https://github.com/pydata/xarray/issues/3574#issuecomment-564934693,https://api.github.com/repos/pydata/xarray/issues/3574,564934693,MDEyOklzc3VlQ29tbWVudDU2NDkzNDY5Mw==,941907,2019-12-12T09:57:18Z,2019-12-12T09:57:28Z,NONE,Sounds similar. But I'm not sure why you get the 0d issue when even your chunks don't (from a quick reading) seem to have a 0 size in any of the dimensions. Could you please show us what is the resulting chunk setup?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,528701910 https://github.com/pydata/xarray/issues/3574#issuecomment-564843368,https://api.github.com/repos/pydata/xarray/issues/3574,564843368,MDEyOklzc3VlQ29tbWVudDU2NDg0MzM2OA==,14314623,2019-12-12T04:22:02Z,2019-12-12T05:32:14Z,CONTRIBUTOR,"I am having a similar problem. This impacts some of my [frequently used code to compute correlations](https://github.com/jbusecke/xarrayutils/blob/7b09a2bdc70f035e290e75419c2d025b7267adf4/xarrayutils/utils.py#L52). 
Here is a simplified example that used to work with older dependencies:
```
import xarray as xr
import numpy as np
from scipy.stats import linregress

def _ufunc(aa, bb):
    out = linregress(aa, bb)
    return np.array([out.slope, out.intercept])

def wrapper(a, b, dim='time'):
    return xr.apply_ufunc(
        _ufunc, a, b,
        input_core_dims=[[dim], [dim]],
        output_core_dims=[[""parameter""]],
        vectorize=True,
        dask=""parallelized"",
        output_dtypes=[a.dtype],
        output_sizes={""parameter"": 2},
    )
```
This works when passing numpy arrays:
```
a = xr.DataArray(np.random.rand(3, 13, 5), dims=['x', 'time', 'y'])
b = xr.DataArray(np.random.rand(3, 5, 13), dims=['x', 'y', 'time'])
wrapper(a, b)
```
array([[[ 0.09958247, 0.36831431], [-0.54445474, 0.66997513], [-0.22894182, 0.65433402], [ 0.38536482, 0.20656073], [ 0.25083224, 0.46955618]], [[-0.21684891, 0.55521932], [ 0.51621616, 0.20869272], [-0.1502755 , 0.55526262], [-0.25452988, 0.60823538], [-0.20571622, 0.56950115]], [[-0.22810421, 0.50423622], [ 0.33002345, 0.36121484], [ 0.37744774, 0.33081058], [-0.10825559, 0.53772493], [-0.12576656, 0.51722167]]]) Dimensions without coordinates: x, y, parameter
But when I convert both arrays to dask arrays, I get the same error as @smartass101.
```
wrapper(a.chunk({'x': 2, 'time': -1}), b.chunk({'x': 2, 'time': -1}))
```
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) in 1 a = xr.DataArray(np.random.rand(3, 13, 5), dims=['x', 'time', 'y']) 2 b = xr.DataArray(np.random.rand(3, 5, 13), dims=['x','y', 'time']) ----> 3 wrapper(a.chunk({'x':2, 'time':-1}),b.chunk({'x':2, 'time':-1})) in wrapper(a, b, dim) 16 dask=""parallelized"", 17 output_dtypes=[a.dtype], ---> 18 output_sizes={""parameter"": 2},) ~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/xarray/core/computation.py in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, *args) 1042 join=join, 1043 exclude_dims=exclude_dims, -> 1044 keep_attrs=keep_attrs 1045 ) 1046 elif any(isinstance(a, Variable) for a in args): ~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/xarray/core/computation.py in apply_dataarray_vfunc(func, signature, join, exclude_dims, keep_attrs, *args) 232 233 data_vars = [getattr(a, ""variable"", a) for a in args] --> 234 result_var = func(*data_vars) 235 236 if signature.num_outputs > 1: ~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/xarray/core/computation.py in apply_variable_ufunc(func, signature, exclude_dims, dask, output_dtypes, output_sizes, keep_attrs, *args) 601 ""apply_ufunc: {}"".format(dask) 602 ) --> 603 result_data = func(*input_data) 604 605 if signature.num_outputs == 1: ~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/xarray/core/computation.py in func(*arrays) 591 signature, 592 output_dtypes, --> 593 output_sizes, 594 ) 595 ~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/xarray/core/computation.py in _apply_blockwise(func, args, input_dims, output_dims, signature, output_dtypes, output_sizes) 721 dtype=dtype, 722 concatenate=True, --> 723 new_axes=output_sizes 724 ) 725 ~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/dask/array/blockwise.py in 
blockwise(func, out_ind, name, token, dtype, adjust_chunks, new_axes, align_arrays, concatenate, meta, *args, **kwargs) 231 from .utils import compute_meta 232 --> 233 meta = compute_meta(func, dtype, *args[::2], **kwargs) 234 if meta is not None: 235 return Array(graph, out, chunks, meta=meta) ~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/dask/array/utils.py in compute_meta(func, _dtype, *args, **kwargs) 119 # with np.vectorize, such as dask.array.routines._isnonzero_vec(). 120 if isinstance(func, np.vectorize): --> 121 meta = func(*args_meta) 122 else: 123 try: ~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/numpy/lib/function_base.py in __call__(self, *args, **kwargs) 2089 vargs.extend([kwargs[_n] for _n in names]) 2090 -> 2091 return self._vectorize_call(func=func, args=vargs) 2092 2093 def _get_ufunc_and_otypes(self, func, args): ~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/numpy/lib/function_base.py in _vectorize_call(self, func, args) 2155 """"""Vectorized call to `func` over positional `args`."""""" 2156 if self.signature is not None: -> 2157 res = self._vectorize_call_with_signature(func, args) 2158 elif not args: 2159 res = func() ~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/numpy/lib/function_base.py in _vectorize_call_with_signature(self, func, args) 2229 for dims in output_core_dims 2230 for dim in dims): -> 2231 raise ValueError('cannot call `vectorize` with a signature ' 2232 'including new output dimensions on size 0 ' 2233 'inputs') ValueError: cannot call `vectorize` with a signature including new output dimensions on size 0 inputs
This used to work like a charm... However, I was sloppy in testing this functionality (a good reminder to always write tests immediately 🙄 ), and I was not able to determine a combination of dependencies that would work. I am still experimenting and will report back. Could this behaviour be a bug introduced in dask at some point (as indicated by @smartass101 above)? cc'ing @dcherian @shoyer @mrocklin EDIT: I can confirm that it seems to be a dask issue. If I restrict my dask version to `<2.0`, my tests (very similar to the above example) work.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,528701910 https://github.com/pydata/xarray/issues/3574#issuecomment-558616375,https://api.github.com/repos/pydata/xarray/issues/3574,558616375,MDEyOklzc3VlQ29tbWVudDU1ODYxNjM3NQ==,941907,2019-11-26T12:56:47Z,2019-11-26T12:56:47Z,NONE,"Another approach would be to bypass `compute_meta` in `dask.blockwise` if `dtype` is provided, which seems to be hinted at here: https://github.com/dask/dask/blob/3960c6518318f2417658c2fc47cd5b5ece726f8b/dask/array/blockwise.py#L234 Perhaps this is an oversight in `dask`. What do you think?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,528701910