issue_comments

12 rows where issue = 528701910 sorted by updated_at descending

user 4

  • smartass101 5
  • dcherian 3
  • jbusecke 3
  • shoyer 1

author_association 3

  • NONE 5
  • MEMBER 4
  • CONTRIBUTOR 3

issue 1

  • apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta · 12 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
567082163 https://github.com/pydata/xarray/issues/3574#issuecomment-567082163 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU2NzA4MjE2Mw== smartass101 941907 2019-12-18T15:32:38Z 2019-12-18T15:32:38Z NONE

> meta = np.ndarray if vectorize is True else None if the user doesn't explicitly provide meta.

Yes, sorry, written this way I now see what you meant and that will likely work indeed.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910
567077240 https://github.com/pydata/xarray/issues/3574#issuecomment-567077240 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU2NzA3NzI0MA== dcherian 2448579 2019-12-18T15:21:19Z 2019-12-18T15:21:19Z MEMBER

Right, the xarray solution is to set meta = np.ndarray if vectorize is True else None, whenever the user doesn't explicitly provide meta. Or am I missing something?
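
The rule stated above can be written as a tiny helper. This is only a sketch: resolve_meta is a hypothetical name, not a function in xarray, and it assumes that np.ndarray is an acceptable meta for any function wrapped in np.vectorize.

```python
import numpy as np


def resolve_meta(vectorize, meta=None):
    """Pick the meta to hand to dask (hypothetical helper, not xarray API)."""
    # An explicitly provided meta always wins.
    if meta is not None:
        return meta
    # Assumption: a np.vectorize-wrapped function returns plain NumPy arrays.
    if vectorize:
        return np.ndarray
    # Otherwise keep None so that dask infers meta on its own.
    return None


print(resolve_meta(vectorize=True))
print(resolve_meta(vectorize=False))
```

Keeping None as the fallback leaves the non-vectorized path unchanged, so only the np.vectorize case skips the meta inference.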

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910
566938638 https://github.com/pydata/xarray/issues/3574#issuecomment-566938638 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU2NjkzODYzOA== smartass101 941907 2019-12-18T08:55:29Z 2019-12-18T08:55:29Z NONE

> meta should be passed to blockwise through _apply_blockwise with default None (I think) and np.ndarray if vectorize is True. You'll have to pass the vectorize kwarg down to this level I think.

I'm afraid that passing meta=None will not help, as explained in https://github.com/dask/dask/issues/5642 and seen around this line, because in that case compute_meta will be called, which might fail with a np.vectorize-wrapped function. I believe a better solution would be to address https://github.com/dask/dask/issues/5642 so that meta isn't computed when we already provide an output dtype.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910
566640524 https://github.com/pydata/xarray/issues/3574#issuecomment-566640524 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU2NjY0MDUyNA== dcherian 2448579 2019-12-17T16:29:35Z 2019-12-17T16:29:35Z MEMBER

meta should be passed to blockwise through _apply_blockwise with default None (I think) and np.ndarray if vectorize is True. You'll have to pass the vectorize kwarg down to this level I think.

https://github.com/pydata/xarray/blob/6ad59b93f814b48053b1a9eea61d7c43517105cb/xarray/core/computation.py#L579-L593

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910
566637471 https://github.com/pydata/xarray/issues/3574#issuecomment-566637471 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU2NjYzNzQ3MQ== jbusecke 14314623 2019-12-17T16:22:35Z 2019-12-17T16:22:35Z CONTRIBUTOR

I can give it a shot if you could point me to the appropriate place, since I have never messed with the dask internals of xarray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910
565194778 https://github.com/pydata/xarray/issues/3574#issuecomment-565194778 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU2NTE5NDc3OA== dcherian 2448579 2019-12-12T21:28:39Z 2019-12-12T21:28:39Z MEMBER

@shoyer's option 1 should be a relatively simple xarray PR if one of you is up for it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910
565186199 https://github.com/pydata/xarray/issues/3574#issuecomment-565186199 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU2NTE4NjE5OQ== smartass101 941907 2019-12-12T21:04:33Z 2019-12-12T21:04:33Z NONE

> The problem is that Dask, as of version 2.0, calls functions applied to dask arrays with size zero inputs, to figure out the output array type, e.g., is the output a dense numpy.ndarray or a sparse array?

Yes, now I recall that this was the issue, yeah. It doesn't even depend on your actual data really.

Possible option 3. is to address https://github.com/dask/dask/issues/5642 directly (haven't found time to do a PR yet). Essentially from the code described in that issue I have the feeling that if a dtype is passed (as apply_ufunc does), then meta should not need to be calculated.
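
A rough sketch of that idea, assuming a dense NumPy output: when the dtype and the number of output dimensions are already known, a zero-size placeholder can be built directly, without ever calling the user function. meta_from_dtype is a hypothetical helper for illustration, not part of dask.

```python
import numpy as np


def meta_from_dtype(dtype, ndim):
    """Build a zero-size placeholder array without calling any user function."""
    # One axis of length 0 per output dimension, with the requested dtype.
    return np.empty((0,) * ndim, dtype=dtype)


meta = meta_from_dtype(np.float64, ndim=3)
print(type(meta).__name__, meta.shape, meta.dtype)  # ndarray (0, 0, 0) float64
```

The trade-off is that this cannot detect non-NumPy outputs such as sparse arrays, which is what the size zero call is meant to discover.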

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910
565107345 https://github.com/pydata/xarray/issues/3574#issuecomment-565107345 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU2NTEwNzM0NQ== shoyer 1217238 2019-12-12T17:33:43Z 2019-12-12T17:33:43Z MEMBER

The problem is that Dask, as of version 2.0, calls functions applied to dask arrays with size zero inputs, to figure out the output array type, e.g., is the output a dense numpy.ndarray or a sparse array?

Unfortunately, numpy.vectorize doesn't know how large a size 0 array to make, because it doesn't have anything like the output_sizes argument.
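
The failure can be reproduced with NumPy alone. The sketch below is a minimal illustration, not the code from the report: two_sums is a made-up stand-in function, and the (0, 0) arrays only imitate the size 0 inputs that dask passes when it probes for meta.

```python
import numpy as np


def two_sums(a, b):
    # Stand-in for a real computation: two numbers per pair of 1-D inputs.
    return np.array([a.sum(), b.sum()])


# "(n),(n)->(p)" introduces a new output dimension p that no input carries.
vfunc = np.vectorize(two_sums, signature="(n),(n)->(p)", otypes=[float])

# Regular inputs work: three rows of length 5 give a (3, 2) result.
ok = vfunc(np.ones((3, 5)), np.ones((3, 5)))
print(ok.shape)

# With a size 0 loop dimension the function is never called, so the
# length of p cannot be inferred and NumPy raises a ValueError.
empty = np.empty((0, 0))
try:
    vfunc(empty, empty)
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)
```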

For xarray, we have a couple of options:

1. we can safely assume that if the applied function is a np.vectorize, then it should pass meta=np.ndarray into the relevant dask functions (e.g., dask.array.blockwise). This should avoid the need to evaluate with size 0 arrays.
2. we could add an output_sizes argument to np.vectorize either upstream in NumPy or into a wrapper in Xarray.

(1) is probably easiest here.
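
For comparison, the wrapper variant of the second option could look roughly like the sketch below. Everything here is an assumption made for illustration: vectorize_with_output_sizes is a hypothetical name, it handles a single output only, it expects a signature without whitespace, and it assumes every output core dimension is new and listed in output_sizes.

```python
import numpy as np


def vectorize_with_output_sizes(func, signature, otype, output_sizes):
    """Hypothetical single-output wrapper around np.vectorize."""
    vfunc = np.vectorize(func, signature=signature, otypes=[otype])
    in_sig, out_sig = signature.split("->")
    # Count core dimensions per input, e.g. "(n),(n)" -> [1, 1].
    in_core_ndims = [
        len([d for d in part.strip("()").split(",") if d])
        for part in in_sig.split("),(")
    ]
    out_dims = [d for d in out_sig.strip("()").split(",") if d]

    def wrapped(*args):
        arrays = [np.asarray(a) for a in args]
        # Loop dimensions are what remains after removing the core dims.
        loop_shapes = [a.shape[: a.ndim - k] for a, k in zip(arrays, in_core_ndims)]
        loop_shape = np.broadcast_shapes(*loop_shapes)
        if 0 in loop_shape:
            # Nothing to loop over: build the empty result from output_sizes.
            core_shape = tuple(output_sizes[d] for d in out_dims)
            return np.empty(loop_shape + core_shape, dtype=otype)
        return vfunc(*arrays)

    return wrapped


f = vectorize_with_output_sizes(
    lambda a, b: np.array([a.sum(), b.sum()]), "(n),(n)->(p)", float, {"p": 2}
)
full = f(np.ones((3, 5)), np.ones((3, 5)))
empty_result = f(np.empty((0, 5)), np.empty((0, 5)))
print(full.shape, empty_result.shape)  # (3, 2) (0, 2)
```

This needs noticeably more machinery than passing meta=np.ndarray, which is consistent with (1) being the easier route.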

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910
565057853 https://github.com/pydata/xarray/issues/3574#issuecomment-565057853 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU2NTA1Nzg1Mw== jbusecke 14314623 2019-12-12T15:35:10Z 2019-12-12T15:35:10Z CONTRIBUTOR

This is the chunk setup

Might this be a problem resulting from numpy.vectorize?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910
564934693 https://github.com/pydata/xarray/issues/3574#issuecomment-564934693 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU2NDkzNDY5Mw== smartass101 941907 2019-12-12T09:57:18Z 2019-12-12T09:57:28Z NONE

Sounds similar. But I'm not sure why you get the 0d issue when even your chunks don't (from a quick reading) seem to have a 0 size in any of the dimensions. Could you please show us what is the resulting chunk setup?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910
564843368 https://github.com/pydata/xarray/issues/3574#issuecomment-564843368 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU2NDg0MzM2OA== jbusecke 14314623 2019-12-12T04:22:02Z 2019-12-12T05:32:14Z CONTRIBUTOR

I am having a similar problem. This impacts some of my frequently used code to compute correlations.

Here is a simplified example that used to work with older dependencies:

```
import xarray as xr
import numpy as np
from scipy.stats import linregress


def _ufunc(aa, bb):
    out = linregress(aa, bb)
    return np.array([out.slope, out.intercept])


def wrapper(a, b, dim='time'):
    return xr.apply_ufunc(
        _ufunc, a, b,
        input_core_dims=[[dim], [dim]],
        output_core_dims=[["parameter"]],
        vectorize=True,
        dask="parallelized",
        output_dtypes=[a.dtype],
        output_sizes={"parameter": 2},)
```

This works when passing numpy arrays:

a = xr.DataArray(np.random.rand(3, 13, 5), dims=['x', 'time', 'y'])
b = xr.DataArray(np.random.rand(3, 5, 13), dims=['x','y', 'time'])
wrapper(a,b)

<xarray.DataArray (x: 3, y: 5, parameter: 2)>
array([[[ 0.09958247,  0.36831431],
        [-0.54445474,  0.66997513],
        [-0.22894182,  0.65433402],
        [ 0.38536482,  0.20656073],
        [ 0.25083224,  0.46955618]],

       [[-0.21684891,  0.55521932],
        [ 0.51621616,  0.20869272],
        [-0.1502755 ,  0.55526262],
        [-0.25452988,  0.60823538],
        [-0.20571622,  0.56950115]],

       [[-0.22810421,  0.50423622],
        [ 0.33002345,  0.36121484],
        [ 0.37744774,  0.33081058],
        [-0.10825559,  0.53772493],
        [-0.12576656,  0.51722167]]])
Dimensions without coordinates: x, y, parameter

But when I convert both arrays to dask arrays, I get the same error as @smartass101.

wrapper(a.chunk({'x':2, 'time':-1}),b.chunk({'x':2, 'time':-1}))

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-303b400356e2> in <module>
      1 a = xr.DataArray(np.random.rand(3, 13, 5), dims=['x', 'time', 'y'])
      2 b = xr.DataArray(np.random.rand(3, 5, 13), dims=['x','y', 'time'])
----> 3 wrapper(a.chunk({'x':2, 'time':-1}),b.chunk({'x':2, 'time':-1}))

<ipython-input-1-4094fd485c95> in wrapper(a, b, dim)
     16         dask="parallelized",
     17         output_dtypes=[a.dtype],
---> 18         output_sizes={"parameter": 2},)

~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/xarray/core/computation.py in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, *args)
   1042             join=join,
   1043             exclude_dims=exclude_dims,
-> 1044             keep_attrs=keep_attrs
   1045         )
   1046     elif any(isinstance(a, Variable) for a in args):

~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/xarray/core/computation.py in apply_dataarray_vfunc(func, signature, join, exclude_dims, keep_attrs, *args)
    232
    233     data_vars = [getattr(a, "variable", a) for a in args]
--> 234     result_var = func(*data_vars)
    235
    236     if signature.num_outputs > 1:

~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/xarray/core/computation.py in apply_variable_ufunc(func, signature, exclude_dims, dask, output_dtypes, output_sizes, keep_attrs, *args)
    601                 "apply_ufunc: {}".format(dask)
    602             )
--> 603     result_data = func(*input_data)
    604
    605     if signature.num_outputs == 1:

~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/xarray/core/computation.py in func(*arrays)
    591                     signature,
    592                     output_dtypes,
--> 593                     output_sizes,
    594                 )
    595

~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/xarray/core/computation.py in _apply_blockwise(func, args, input_dims, output_dims, signature, output_dtypes, output_sizes)
    721         dtype=dtype,
    722         concatenate=True,
--> 723         new_axes=output_sizes
    724     )
    725

~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/dask/array/blockwise.py in blockwise(func, out_ind, name, token, dtype, adjust_chunks, new_axes, align_arrays, concatenate, meta, *args, **kwargs)
    231         from .utils import compute_meta
    232
--> 233         meta = compute_meta(func, dtype, *args[::2], **kwargs)
    234     if meta is not None:
    235         return Array(graph, out, chunks, meta=meta)

~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/dask/array/utils.py in compute_meta(func, _dtype, *args, **kwargs)
    119         # with np.vectorize, such as dask.array.routines._isnonzero_vec().
    120         if isinstance(func, np.vectorize):
--> 121             meta = func(*args_meta)
    122         else:
    123             try:

~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/numpy/lib/function_base.py in __call__(self, *args, **kwargs)
   2089             vargs.extend([kwargs[_n] for _n in names])
   2090
-> 2091         return self._vectorize_call(func=func, args=vargs)
   2092
   2093     def _get_ufunc_and_otypes(self, func, args):

~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/numpy/lib/function_base.py in _vectorize_call(self, func, args)
   2155         """Vectorized call to `func` over positional `args`."""
   2156         if self.signature is not None:
-> 2157             res = self._vectorize_call_with_signature(func, args)
   2158         elif not args:
   2159             res = func()

~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/numpy/lib/function_base.py in _vectorize_call_with_signature(self, func, args)
   2229                             for dims in output_core_dims
   2230                             for dim in dims):
-> 2231                 raise ValueError('cannot call `vectorize` with a signature '
   2232                                  'including new output dimensions on size 0 '
   2233                                  'inputs')

ValueError: cannot call `vectorize` with a signature including new output dimensions on size 0 inputs

This used to work like a charm. However, I was sloppy in testing this functionality (a good reminder to always write tests immediately 🙄), and I have not been able to determine a combination of dependencies that works. I am still experimenting and will report back.

Could this behaviour be a bug introduced in dask at some point (as indicated by @smartass101 above)? cc'ing @dcherian @shoyer @mrocklin

EDIT: I can confirm that it seems to be a dask issue. If I restrict my dask version to <2.0, my tests (very similar to the above example) work.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910
558616375 https://github.com/pydata/xarray/issues/3574#issuecomment-558616375 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU1ODYxNjM3NQ== smartass101 941907 2019-11-26T12:56:47Z 2019-11-26T12:56:47Z NONE

Another approach would be to bypass compute_meta in dask.array.blockwise if dtype is provided, which seems to be hinted at here:

https://github.com/dask/dask/blob/3960c6518318f2417658c2fc47cd5b5ece726f8b/dask/array/blockwise.py#L234

Perhaps this is an oversight in dask, what do you think?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);