
issue_comments


5 rows where issue = 528701910 and user = 941907 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
567082163 https://github.com/pydata/xarray/issues/3574#issuecomment-567082163 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU2NzA4MjE2Mw== smartass101 941907 2019-12-18T15:32:38Z 2019-12-18T15:32:38Z NONE

meta = np.ndarray if vectorize is True else None if the user doesn't explicitly provide meta.

Yes, sorry, written this way I now see what you meant and that will likely work indeed.
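The quoted suggestion can be sketched as a small helper (hypothetical name `choose_meta`; the real change would live inside xarray's `apply_ufunc` machinery):

```python
import numpy as np

def choose_meta(meta, vectorize):
    # Hypothetical helper mirroring the suggestion above: default meta to
    # np.ndarray when vectorize is used and the user gave no explicit meta,
    # otherwise pass the user's value through untouched.
    if meta is None and vectorize:
        return np.ndarray
    return meta
```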

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910
566938638 https://github.com/pydata/xarray/issues/3574#issuecomment-566938638 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU2NjkzODYzOA== smartass101 941907 2019-12-18T08:55:29Z 2019-12-18T08:55:29Z NONE

meta should be passed to blockwise through _apply_blockwise with default None (I think) and np.ndarray if vectorize is True. You'll have to pass the vectorize kwarg down to this level I think.

I'm afraid that passing meta=None will not help, as explained in https://github.com/dask/dask/issues/5642 and seen around this line: in that case compute_meta will be called, which might fail with a np.vectorize-wrapped function. I believe a better solution would be to address https://github.com/dask/dask/issues/5642 so that meta isn't computed when we already provide an output dtype.
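The failure mode described here is easy to reproduce outside xarray: calling an np.vectorize-wrapped function on a size-0 array, as the meta probe does, raises unless `otypes` is set (a minimal sketch):

```python
import numpy as np

func = np.vectorize(lambda x: x + 1)

# np.vectorize infers the output type by calling the function on the first
# element, so a size-0 input (like dask's meta probe) raises ValueError.
try:
    func(np.empty((0,)))
except ValueError as e:
    print("failed as expected:", e)

# Declaring the output type up front avoids the probe call entirely.
safe = np.vectorize(lambda x: x + 1, otypes=[float])
print(safe(np.empty((0,))))  # empty float array, no error
```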

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910
565186199 https://github.com/pydata/xarray/issues/3574#issuecomment-565186199 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU2NTE4NjE5OQ== smartass101 941907 2019-12-12T21:04:33Z 2019-12-12T21:04:33Z NONE

The problem is that Dask, as of version 2.0, calls functions applied to dask arrays with size zero inputs, to figure out the output array type, e.g., is the output a dense numpy.ndarray or a sparse array?

Yes, now I recall that this was the issue. It doesn't even really depend on your actual data.

Possible option 3. is to address https://github.com/dask/dask/issues/5642 directly (haven't found time to do a PR yet). Essentially from the code described in that issue I have the feeling that if a dtype is passed (as apply_ufunc does), then meta should not need to be calculated.
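The proposed option 3 amounts to constructing meta directly from the already-known dtype and ndim, instead of calling the user function on size-0 inputs. A minimal sketch of that idea (hypothetical helper, not dask's actual code):

```python
import numpy as np

def meta_from_dtype(dtype, ndim):
    # If the output dtype is already known (as with apply_ufunc's
    # output_dtypes), an empty array of that dtype and dimensionality
    # can serve as meta without ever calling the wrapped function.
    return np.empty((0,) * ndim, dtype=dtype)
```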

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910
564934693 https://github.com/pydata/xarray/issues/3574#issuecomment-564934693 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU2NDkzNDY5Mw== smartass101 941907 2019-12-12T09:57:18Z 2019-12-12T09:57:28Z NONE

Sounds similar. But I'm not sure why you get the 0-d issue, since (from a quick reading) your chunks don't seem to have size 0 in any dimension. Could you please show us the resulting chunk setup?
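For reference, the chunk setup of a dask array can be read off its `.chunks` attribute, one tuple of block sizes per dimension (a minimal example, unrelated to the reporter's data):

```python
import dask.array as da

# a 4x6 array split into 2x3 blocks: two blocks along each axis
a = da.ones((4, 6), chunks=(2, 3))
print(a.chunks)  # ((2, 2), (3, 3))
```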

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910
558616375 https://github.com/pydata/xarray/issues/3574#issuecomment-558616375 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU1ODYxNjM3NQ== smartass101 941907 2019-11-26T12:56:47Z 2019-11-26T12:56:47Z NONE

Another approach would be to bypass compute_meta in dask.blockwise if dtype is provided which seems to be hinted at here

https://github.com/dask/dask/blob/3960c6518318f2417658c2fc47cd5b5ece726f8b/dask/array/blockwise.py#L234

Perhaps this is an oversight in dask, what do you think?
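Until compute_meta is bypassed in dask itself, the workaround at the call site is to pass meta explicitly, so dask never probes the function with size-0 inputs. A sketch using `dask.array.map_blocks`, which accepts a `meta` keyword:

```python
import numpy as np
import dask.array as da

def fragile(x):
    # Stands in for an np.vectorize-wrapped function that breaks on the
    # size-0 probe arrays compute_meta would otherwise feed it.
    assert x.size > 0
    return x + 1

a = da.ones((4,), chunks=2)
# With meta given, dask skips calling fragile() on empty inputs; the
# function only ever sees the real, non-empty blocks at compute time.
out = da.map_blocks(fragile, a, dtype=a.dtype,
                    meta=np.array((), dtype=a.dtype))
print(out.compute())
```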

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 3120.784ms · About: xarray-datasette