home / github

Menu
  • GraphQL API
  • Search all tables

issue_comments

Table actions
  • GraphQL API for issue_comments

63 rows where issue = 618081836 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 6

  • kmuehlbauer 37
  • mathause 17
  • shoyer 5
  • dcherian 2
  • keewis 1
  • pep8speaks 1

author_association 2

  • MEMBER 62
  • NONE 1

issue 1

  • ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc` · 63 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
675973700 https://github.com/pydata/xarray/pull/4060#issuecomment-675973700 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY3NTk3MzcwMA== kmuehlbauer 5821660 2020-08-19T08:52:40Z 2020-08-19T08:52:40Z MEMBER

Thanks to all reviewers! Great job!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
675890055 https://github.com/pydata/xarray/pull/4060#issuecomment-675890055 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY3NTg5MDA1NQ== mathause 10194086 2020-08-19T06:57:29Z 2020-08-19T06:57:29Z MEMBER

ok then - let's do this. Thanks a lot @kmuehlbauer

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
631290739 https://github.com/pydata/xarray/pull/4060#issuecomment-631290739 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMTI5MDczOQ== pep8speaks 24736507 2020-05-20T07:28:08Z 2020-08-19T05:41:15Z NONE

Hello @kmuehlbauer! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2020-08-19 05:41:15 UTC
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
675356719 https://github.com/pydata/xarray/pull/4060#issuecomment-675356719 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY3NTM1NjcxOQ== mathause 10194086 2020-08-18T09:02:58Z 2020-08-18T09:02:58Z MEMBER

Nice! Unless @dcherian has any additional comments I'll merge in a few days

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
675269117 https://github.com/pydata/xarray/pull/4060#issuecomment-675269117 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY3NTI2OTExNw== kmuehlbauer 5821660 2020-08-18T05:55:06Z 2020-08-18T05:55:06Z MEMBER

@mathause I've merged latest master into this PR to hopefully get all tests green. The former had some problems with a conda error in MinimumVersions job.

Please let me know, if there is anything for me to do, to get this merged.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
674823266 https://github.com/pydata/xarray/pull/4060#issuecomment-674823266 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY3NDgyMzI2Ng== kmuehlbauer 5821660 2020-08-17T11:21:38Z 2020-08-17T11:21:38Z MEMBER

While having a last review I've found another small glitch. I'll come back the next days to see, if anything needs to be done from reviewers side.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
674805321 https://github.com/pydata/xarray/pull/4060#issuecomment-674805321 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY3NDgwNTMyMQ== keewis 14808389 2020-08-17T10:40:42Z 2020-08-17T10:44:37Z MEMBER

I think all of these can be done in a new PR, we just have to make sure to include them in the next release (which might need to be soon so we regain compatibility with the most recent pandas).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
674806601 https://github.com/pydata/xarray/pull/4060#issuecomment-674806601 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY3NDgwNjYwMQ== kmuehlbauer 5821660 2020-08-17T10:43:26Z 2020-08-17T10:43:26Z MEMBER

Great, than it looks like it's finally done. :smiley:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
674803101 https://github.com/pydata/xarray/pull/4060#issuecomment-674803101 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY3NDgwMzEwMQ== kmuehlbauer 5821660 2020-08-17T10:35:16Z 2020-08-17T10:35:16Z MEMBER

@keewis @mathause Thanks for the review. I've added a checklist in the first post with "open issues" with this PR, which might be solved in a follow up PR. Would be good to know which need to go in here, so I can add this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
674753706 https://github.com/pydata/xarray/pull/4060#issuecomment-674753706 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY3NDc1MzcwNg== kmuehlbauer 5821660 2020-08-17T08:56:30Z 2020-08-17T08:56:30Z MEMBER

@mathause I've merged with master ( + resolved conflicts) and addressed the latest review comments by @dcherian.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
674160208 https://github.com/pydata/xarray/pull/4060#issuecomment-674160208 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY3NDE2MDIwOA== kmuehlbauer 5821660 2020-08-14T16:38:12Z 2020-08-14T16:38:12Z MEMBER

@dcherian Thanks for your latest review comments. I'll try to come up with a solution after merging with latest master (and resolving conflicts).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
669283654 https://github.com/pydata/xarray/pull/4060#issuecomment-669283654 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY2OTI4MzY1NA== mathause 10194086 2020-08-05T16:06:59Z 2020-08-05T16:06:59Z MEMBER

gentle ping @shoyer @dcherian I think this is not too far away

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
660874936 https://github.com/pydata/xarray/pull/4060#issuecomment-660874936 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY2MDg3NDkzNg== mathause 10194086 2020-07-20T08:11:37Z 2020-07-20T08:11:37Z MEMBER

I had another look and seems fine to me. So let's ping @shoyer & @dcherian for a review.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
659197260 https://github.com/pydata/xarray/pull/4060#issuecomment-659197260 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY1OTE5NzI2MA== kmuehlbauer 5821660 2020-07-16T06:52:57Z 2020-07-16T06:52:57Z MEMBER

How should we proceed with this PR?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
651826366 https://github.com/pydata/xarray/pull/4060#issuecomment-651826366 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY1MTgyNjM2Ng== kmuehlbauer 5821660 2020-06-30T14:24:39Z 2020-06-30T14:24:39Z MEMBER

@mathause This should now raise for both type of 'output_sizes' errors, wrong dim name in 'output_sizes' and missing dim name in 'output_sizes'. I'll catch up with any review tomorrow morning CET.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
651802870 https://github.com/pydata/xarray/pull/4060#issuecomment-651802870 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY1MTgwMjg3MA== mathause 10194086 2020-06-30T13:49:05Z 2020-06-30T13:49:05Z MEMBER

Could you use signature.all_output_core_dims ?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
651800312 https://github.com/pydata/xarray/pull/4060#issuecomment-651800312 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY1MTgwMDMxMg== kmuehlbauer 5821660 2020-06-30T13:44:45Z 2020-06-30T13:44:45Z MEMBER

python if key not in signature.output_core_dims:

This does not work, because signature.output_core_dims is a list of tuples. So this would raise in any case. But I've no idea how to properly extract this into a single list/tuple...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
651796020 https://github.com/pydata/xarray/pull/4060#issuecomment-651796020 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY1MTc5NjAyMA== mathause 10194086 2020-06-30T13:37:19Z 2020-06-30T13:37:19Z MEMBER

You could go for something like this:

```python output_sizes_renamed = dict() for key, value in output_sizes.items():

if key not in signature.output_core_dims:
    raise ValueError("dimension name in 'output_sizes' must correspond to output_core_dims")
output_sizes_renamed[signature.dims_map[key]] = value

```

Instead of python output_sizes = { signature.dims_map[key]: value for key, value in output_sizes.items() }

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
651779418 https://github.com/pydata/xarray/pull/4060#issuecomment-651779418 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY1MTc3OTQxOA== kmuehlbauer 5821660 2020-06-30T13:08:42Z 2020-06-30T13:08:42Z MEMBER

@mathause It seems I can't find a nice way of checking/catching errors.

For now, this fails with KeyError inside xarray when output_sizes contains a wrong dimension name and this fails with KeyError inside dask, but with the not helpful dim0 message. Maybe this can be fixed in a follow up PR?

I would also leave combining of output_core_dims and output_sizes for another PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
651704371 https://github.com/pydata/xarray/pull/4060#issuecomment-651704371 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY1MTcwNDM3MQ== mathause 10194086 2020-06-30T10:21:53Z 2020-06-30T10:21:53Z MEMBER

Ok, yes I am happy with erroring ourselves.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
651700823 https://github.com/pydata/xarray/pull/4060#issuecomment-651700823 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY1MTcwMDgyMw== kmuehlbauer 5821660 2020-06-30T10:14:53Z 2020-06-30T10:14:53Z MEMBER

@mathause I've tested your suggestion but I do not know, what better error message you expect from the dask side.

  • xr.apply_ufunc(..., output_core_dims=[["z"]], output_sizes={"z": 1}) signature.to_gufunc_string() will output ()->(dim0) (and output_sizes will be {'dim0': 1}
  • xr.apply_ufunc(..., output_core_dims=[["z"]], output_sizes={"y": 1}) signature.to_gufunc_string() will output ()->(dim0) (and output_sizes will be {'y': 1}

So the second will fail with KeyError: 'dim0'.

I think we can't get proper error messages from dask side, because dask doesn't know anything about the new dimension name ( at least not from signature.to_gufunc_string()). Should we catch the error and add a meaningful output?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
651664318 https://github.com/pydata/xarray/pull/4060#issuecomment-651664318 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY1MTY2NDMxOA== mathause 10194086 2020-06-30T09:06:40Z 2020-06-30T09:06:40Z MEMBER

Yes, I think this got lost in a resolved comment.

output_core_dims=[{"abc": 2}] looks very elegant. But it is passed to _UFuncSignature(input_core_dims, output_core_dims) so you would probably need to take it apart again in the code (or adapt _UFuncSignature to handle this case). Not sure how involved this is. Could potentially be left for a future PR?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
651653334 https://github.com/pydata/xarray/pull/4060#issuecomment-651653334 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY1MTY1MzMzNA== kmuehlbauer 5821660 2020-06-30T08:49:10Z 2020-06-30T08:49:10Z MEMBER

I made one more suggestion to catch the case

python xr.apply_ufunc(..., output_core_dims=[["abc"]], output_sizes={"xyz": 2})

Yes, I've got the email, but did not find it here. Thanks! Would it not be useful to combine output_core_dims and output_sizes, eg.

python xr.apply_ufunc(..., output_core_dims=[{"abc": 2]])

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
651606271 https://github.com/pydata/xarray/pull/4060#issuecomment-651606271 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY1MTYwNjI3MQ== kmuehlbauer 5821660 2020-06-30T07:32:42Z 2020-06-30T07:32:42Z MEMBER

@mathause @shoyer @dcherian @max-sixty

This PR has gone a long way from first enabling multiple outputs to implementation of dask.apply_gufunc. Thanks for all your help, comments and suggestions. As xarray 0.16.0 is handled in the next days (or so I understood from reading the comments here #4031), I'm asking myself, if this can be finalized for 0.16.0?

I appreciate if we could work on any remaining problems to get this PR merged.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
643162218 https://github.com/pydata/xarray/pull/4060#issuecomment-643162218 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY0MzE2MjIxOA== kmuehlbauer 5821660 2020-06-12T09:02:40Z 2020-06-12T09:02:40Z MEMBER

@dcherian @shoyer Only the upstream-dev is failing with the sparse-tests (#4146). This is ready for another review cycle. Please let me know, if there is more to work on this PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
641098064 https://github.com/pydata/xarray/pull/4060#issuecomment-641098064 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY0MTA5ODA2NA== kmuehlbauer 5821660 2020-06-09T07:43:55Z 2020-06-09T07:43:55Z MEMBER

@mathause I reverted the mentioned error catching. I created a dedicated test for testing the dask vectorize without given dtype. Hope that fits your requirements.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
641056080 https://github.com/pydata/xarray/pull/4060#issuecomment-641056080 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY0MTA1NjA4MA== kmuehlbauer 5821660 2020-06-09T06:17:11Z 2020-06-09T06:17:11Z MEMBER

@mathause Sorry for the delay. I have currently no other idea how to circumvent the use of dask_gufunc_kwargs also for the non-dask vectorize. I'm happy to adapt the code to a more convenient and readable solution.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
635337996 https://github.com/pydata/xarray/pull/4060#issuecomment-635337996 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNTMzNzk5Ng== kmuehlbauer 5821660 2020-05-28T13:06:16Z 2020-05-28T13:06:16Z MEMBER

This is ready for another review cycle. :smiley:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
635293190 https://github.com/pydata/xarray/pull/4060#issuecomment-635293190 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNTI5MzE5MA== kmuehlbauer 5821660 2020-05-28T11:55:03Z 2020-05-28T11:55:03Z MEMBER

@mathause Thanks, I had just approached that particular line while debugging, when your message came in :grin:. I'll add an xfail for that dask version first.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
635273244 https://github.com/pydata/xarray/pull/4060#issuecomment-635273244 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNTI3MzI0NA== mathause 10194086 2020-05-28T11:06:15Z 2020-05-28T11:22:18Z MEMBER

Could be "Avoid adjusting gufunc’s meta dtype twice" in https://docs.dask.org/en/latest/changelog.html#id84

Not sure if it is better to xfail or bump the version of dask (which could be done ipython ci/min_deps_check.py ci/requirements/py36-min-all-deps.yml)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
635237621 https://github.com/pydata/xarray/pull/4060#issuecomment-635237621 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNTIzNzYyMQ== kmuehlbauer 5821660 2020-05-28T09:46:32Z 2020-05-28T09:46:32Z MEMBER

@mathause Any idea why the meta test fails in Linux py36-min-all-deps?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
635226708 https://github.com/pydata/xarray/pull/4060#issuecomment-635226708 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNTIyNjcwOA== kmuehlbauer 5821660 2020-05-28T09:24:37Z 2020-05-28T09:24:37Z MEMBER

@mathause I've added your tests from #4022 to this PR (dtype/meta).

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
635218745 https://github.com/pydata/xarray/pull/4060#issuecomment-635218745 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNTIxODc0NQ== kmuehlbauer 5821660 2020-05-28T09:08:39Z 2020-05-28T09:08:39Z MEMBER

OK, vectorization has to be inside apply_variable_ufunc to be applied properly on dask and non-dask configurations.

For the time being I use dask_gufunc_kwargs to distribute vectorize and output_dtypes into apply_variable_ufunc. Maybe we have to decide on more sensible name for that dict.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
635196460 https://github.com/pydata/xarray/pull/4060#issuecomment-635196460 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNTE5NjQ2MA== kmuehlbauer 5821660 2020-05-28T08:25:38Z 2020-05-28T08:25:38Z MEMBER

Currently tests are failing for some test_missing. Any hints appreciated.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
635194699 https://github.com/pydata/xarray/pull/4060#issuecomment-635194699 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNTE5NDY5OQ== kmuehlbauer 5821660 2020-05-28T08:22:13Z 2020-05-28T08:22:13Z MEMBER

Thanks @shoyer and @mathause. I've tried to adapt the PR to your comments.

  • The meta and output_sizes parameters should be and the vectorize and output_dtypes parameters can be given within the dask_gufunc_kwargs (which have priority). This would also be the place for allow_rechunk. Everything is mentioned in the docstring and warnings are issued to use the new dask_gufunc_kwargs. The dask_gufunc_kwargs itself is used to hold all keyword arguments for dask.array.apply_gufunc.

  • vectorization is done inside apply_ufunc only for dask not parallelized. In that case it is deferred to dask.array.apply_gufunc.

  • for chunking problems (core dimension with multiple chunks and non-core dimensions with different chunksizes) we let dask raise the errors.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
634776667 https://github.com/pydata/xarray/pull/4060#issuecomment-634776667 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNDc3NjY2Nw== shoyer 1217238 2020-05-27T16:18:24Z 2020-05-27T16:18:24Z MEMBER
  • un-equal chunking along non-core dimensions

The second is not so trivial. I see three possibilities (1) just error, (2) try dask.array.apply_gufunc if that fails issue a warning and use the old apply_blockwise (3) figure out ourselves if non-core dimensions (called loop dimensions in dask) are not-equally chunked, issue a warning and re-chunk them ourselves. Maybe @shoyer and @dcherian can weight in here.

I am pretty confident that existing behavior of xarray.apply_ufunc with un-equal chunks along non-core dimensions is entirely broken. I am OK with just erroring for now.

{
    "total_count": 3,
    "+1": 2,
    "-1": 0,
    "laugh": 1,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
634774468 https://github.com/pydata/xarray/pull/4060#issuecomment-634774468 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNDc3NDQ2OA== shoyer 1217238 2020-05-27T16:14:33Z 2020-05-27T16:14:33Z MEMBER

Looking at the internals of dask.array.apply_gufunc, it looks like it passes on **kwargs to blockwise. blockwise gets called twice, first with only **kwargs and then with both meta and **kwargs: https://github.com/dask/dask/blob/77628f2d5248cc61cc366cac3e400b6df5c654c1/dask/array/gufunc.py#L422 https://github.com/dask/dask/blob/77628f2d5248cc61cc366cac3e400b6df5c654c1/dask/array/gufunc.py#L439

So it seems we can actually still pass on a meta argument explicitly, at least in the first case.

Here's my suggestion for moving forwarding:

  • If meta was explicitly set (meta is not None), pass it into apply_gufunc in **kwargs
  • Otherwise, omit meta.
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
634587531 https://github.com/pydata/xarray/pull/4060#issuecomment-634587531 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNDU4NzUzMQ== mathause 10194086 2020-05-27T11:03:12Z 2020-05-27T11:03:12Z MEMBER

No need to worry about the second one:

https://github.com/pydata/xarray/blob/e5cc19cd8f8a69e0743f230f5bf51b7778a0ff96/xarray/core/computation.py#L704

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
634583266 https://github.com/pydata/xarray/pull/4060#issuecomment-634583266 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNDU4MzI2Ng== kmuehlbauer 5821660 2020-05-27T10:53:25Z 2020-05-27T10:53:25Z MEMBER

What I was trying to say - this is a change that can break existing code and it is not easy to do a proper deprecation cycle (i.e. only throw a warning). (I think allow_rechunk=True is not a good default.)

Yes, sure, I didn't think about the effects on existing code. We might need to capture both ValueErrors and handle differently. I'll try to come up with a proposal for this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
634572185 https://github.com/pydata/xarray/pull/4060#issuecomment-634572185 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNDU3MjE4NQ== mathause 10194086 2020-05-27T10:28:26Z 2020-05-27T10:28:26Z MEMBER

output_dtypes: yes I think the docs should be updated

What I was trying to say - this is a change that can break existing code and it is not easy to do a proper deprecation cycle (i.e. only throw a warning). (I think allow_rechunk=True is not a good default.)


Ok an easier idea (could be brittle, though):

```python try: res = da.apply_gufunc(...)

catch error that was not raised in earlier versions of xr.apply_ufunc

except ValueError as e: if "with different chunksize present" in str(e): warnings.warn(str(e) + " This will throw an error in the future.") res = da.apply_gufunc(..., allow_rechunk=True) else: raise

``` However, I am only throwing around ideas - this is certainly not up to me. Let's see what the others think. Maybe they have an opinion or a better idea.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
634497758 https://github.com/pydata/xarray/pull/4060#issuecomment-634497758 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNDQ5Nzc1OA== kmuehlbauer 5821660 2020-05-27T08:02:24Z 2020-05-27T08:02:24Z MEMBER

@mathause Thanks for the elaborate review. But, I'm unsure how to move on with this.

I'll reply to some of your comments below.

Note that output_dtypes is used in vectorize so this may not be a candidate for dask_gufunc_kwargs.

Hmm, the docstring indicates that it is only used with dask='parallelized'. But in the code it can be used in any case. This should be clarified.

The second is not so trivial. I see three possibilities (1) just error, (2) try dask.array.apply_gufunc if that fails issue a warning and use the old apply_blockwise (3) figure out ourselves if non-core dimensions (called loop dimensions in dask) are not-equally chunked, issue a warning and re-chunk them ourselves. Maybe @shoyer and @dcherian can weight in here.

When looking at the code of dask.array.apply_gufunc it seems, that any possible re-chunking is done in dask.array.blockwise. I also do not see any special treatment of warning or re-chunking of non-core dimensions in the current _apply_blockwise. So from my reasoning we should be safe to use (4) try dask.array.apply_gufunc since dask.array.apply_gufunc will issue the appropriate ValueError if something is wrong with the chunking. (Only thing is that the ValueError concerning the non-core dimension doesn't mention allow_rechunk). If the ValueError is issued, the user can the just set allow_rechunk=True keyword.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
632557161 https://github.com/pydata/xarray/pull/4060#issuecomment-632557161 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMjU1NzE2MQ== kmuehlbauer 5821660 2020-05-22T08:03:28Z 2020-05-22T08:03:28Z MEMBER

although only two keywords seem to be worth feeding through (keepdims, allow_rechunk).

Just noticed, that output_dtypes and output_sizes could also be fed as gufunc_kwargs. So this would really simplify the calling signature of apply_ufunc if those kwargs could be handled that way.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
632546838 https://github.com/pydata/xarray/pull/4060#issuecomment-632546838 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMjU0NjgzOA== kmuehlbauer 5821660 2020-05-22T07:41:49Z 2020-05-22T07:41:49Z MEMBER

I'll only be able to look at it properly next week.

@mathause I'll leave the PR unchanged and catch up with you next week.

@shoyer @dcherian Thanks for your comments. Please let me know, which tests should be added to check for any possible surprises with this change to apply_gufunc.

The problem is that we don't have any way to detect ahead of time whether the applied function already supports dask arrays (e.g., if it is built-up out of functions from dask.array). If it does, we don't want to set dask='parallelized' but rather let the function handle dask arrays itself.

(Att: no native english speaker here, so bear with me, if something sounds clunky or not exactly matching) Then we would have to keep the dask='forbidden' as default, as well as parallelized and allowed to force the decision to the user. Maybe the keyword settings itself could be a bit more clear. In the allowed-case the function in question has to natively support dask-arrays. So I would use dask='native' in that case. For the parallelized-case this PR proposes to use dask.array.apply_gufunc (generalized ufunc). So either we stick to parallelized or we try to find a better fitting name (eg. dask='gufunc').

For the keywords I think @dcherian 's proposal of something like dask_gufunc_kwargs (or gufunc_kwargs) is useful (would match with dask='gufunc'), although only two keywords seem to be worth feeding through (keepdims, allow_rechunk).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
631785279 https://github.com/pydata/xarray/pull/4060#issuecomment-631785279 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMTc4NTI3OQ== dcherian 2448579 2020-05-20T23:11:41Z 2020-05-20T23:11:41Z MEMBER

good point @mathause. Looks like apply_gufunc tries to make blockwise infer meta and then does its own thing when that fails: https://github.com/dask/dask/blob/3b92efff2e779f59e95e05af9b8f371d56227d02/dask/array/gufunc.py#L417-L440 . I don't understand how this can work in all possible cases.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
631729847 https://github.com/pydata/xarray/pull/4060#issuecomment-631729847 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMTcyOTg0Nw== mathause 10194086 2020-05-20T21:13:03Z 2020-05-20T21:13:03Z MEMBER

I don't see meta listed in the docs which is why it thought it's not needed. But if it is handled in dask.array.apply_gufunc it can of course stay.

I only realised the exact distinction between "allowed" and "parallelized" today - i.e. that "parallelized" is kind of the dask equivalent of np.vectorize. I can suggest something for the docstring (e.g. prefer "allowed" if ``func`` natively handles dask arrays or so)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
631706236 https://github.com/pydata/xarray/pull/4060#issuecomment-631706236 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMTcwNjIzNg== dcherian 2448579 2020-05-20T20:24:32Z 2020-05-20T20:24:32Z MEMBER

If it does, we don't want to set dask='parallelized' but rather let the function handle dask arrays itself.

I think we still need all the current options for the dask kwarg. There can be disastrous consequences so it's good to make users explicitly choose the behaviour they want.

howto properly handle deprecation (eg. meta)

I don't think we should deprecate meta. Not all user functions can deal with zero shaped inputs, so automatically inferring meta need not always work. We've had to add a similar feature for map_blocks (#3575) so I think meta should stay.

keywords of dask.array.apply_gufunc

Shall we add a new dask_gufunc_kwargs and pass that down to appy_gufunc?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
631702113 https://github.com/pydata/xarray/pull/4060#issuecomment-631702113 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMTcwMjExMw== shoyer 1217238 2020-05-20T20:16:09Z 2020-05-20T20:16:09Z MEMBER

Maybe this is too defensive/surprising, and could be relaxed.

You would remove the daks="forbidden" branch and not the dask="parallelized"?

For the functions that don't handle dask arrays gracefully, dask="parallelized" would be the better option?

This is probably another good motivation: defaulting to dask='forbidden' forces users to make an explicit choice about whether or not use dask='parallelized'.

The problem is that we don't have any way to detect ahead of time whether the applied function already supports dask arrays (e.g., if it is built-up out of functions from dask.array). If it does, we don't want to set dask='parallelized' but rather let the function handle dask arrays itself.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
631700340 https://github.com/pydata/xarray/pull/4060#issuecomment-631700340 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMTcwMDM0MA== mathause 10194086 2020-05-20T20:12:25Z 2020-05-20T20:12:25Z MEMBER

Maybe this is too defensive/surprising, and could be relaxed.

You would remove the daks="forbidden" branch and not the dask="parallelized"?

For the functions that don't handle dask arrays gracefully, dask="parallelized" would be the better option?


Very cool - good progress.

  • I guess you'll have to properly deprecate meta, something along the lines of: meta is no longer necessary and has no effect. it will be removed in a future version
  • I think it would be good to pass allow_rechunk.

I'll only be able to look at it properly next week.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
631689889 https://github.com/pydata/xarray/pull/4060#issuecomment-631689889 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMTY4OTg4OQ== shoyer 1217238 2020-05-20T19:50:23Z 2020-05-20T19:50:23Z MEMBER

The original motivation for requiring dask='allowed' is that I was concerned that users would put a function that coerces its arguments into NumPy arrays into apply_ufunc (e.g., like many functions from SciPy), which could have surprisingly bad performance when called on dask arrays due to automatic coercion.

Maybe this is too defensive/surprising, and could be relaxed. We don't really have any guard-rails like this elsewhere in xarray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
631493099 https://github.com/pydata/xarray/pull/4060#issuecomment-631493099 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMTQ5MzA5OQ== kmuehlbauer 5821660 2020-05-20T14:02:07Z 2020-05-20T14:02:07Z MEMBER

@mathause All tests green, good starting point for review. Please notify other people who should have a look at this.

There are still things to discuss: - keywords of dask.array.apply_gufunc - howto properly handle deprecation (eg. meta) - doscstring - adding/revision of tests

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
630840345 https://github.com/pydata/xarray/pull/4060#issuecomment-630840345 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMDg0MDM0NQ== kmuehlbauer 5821660 2020-05-19T14:03:42Z 2020-05-20T13:41:41Z MEMBER

I've given this a try, but this will need some design decisions.

  • currently vectorize is handled in any case if requested, before falling through the if/else. dask.array.apply_gufunc takes vectorize as parameter, so for dask we do not need to apply vectorization. We would need to apply vectorize only for non-dask cases (maybe just before calling the final function).
  • currently dask is only handled, if dask keyword is issued ('allowed' and parallelized). From my perspective the dask keyword is not needed any more. We could just divert to the apply_gufunc when dask backed arrays are detected.

This will really have much impact on the code/tests. I'll come up with a updated PullRequest in short time, but any thoughts /remarks whatsoever are very much appreciated.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
631448161 https://github.com/pydata/xarray/pull/4060#issuecomment-631448161 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMTQ0ODE2MQ== kmuehlbauer 5821660 2020-05-20T12:40:24Z 2020-05-20T12:40:24Z MEMBER

@mathause Only one breaking test which is connected to the meta stuff you handle in #4022. Any suggestions on that topic? I've removed meta completely since it was only needed in dask.array.blockwise but not in dask.array.apply_gufunc.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
631369862 https://github.com/pydata/xarray/pull/4060#issuecomment-631369862 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMTM2OTg2Mg== kmuehlbauer 5821660 2020-05-20T09:50:16Z 2020-05-20T09:50:16Z MEMBER

@mathause Great! Seems like this works better, thanks! Will update the PR after some more tests etc.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
631353813 https://github.com/pydata/xarray/pull/4060#issuecomment-631353813 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMTM1MzgxMw== mathause 10194086 2020-05-20T09:20:54Z 2020-05-20T09:20:54Z MEMBER

Should you keep the dask="allowed" branch? That might solve some of the issues (instead of using allow_rechunk=True) . For example dask has it's own einsum implementation (used in xr.dot) so it may not be necessary to pipe that through dask.array.apply_gufunc (from the docstring: this function is like np.vectorize, but for the blocks of dask arrays). However, I am only speculating.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
631301324 https://github.com/pydata/xarray/pull/4060#issuecomment-631301324 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMTMwMTMyNA== kmuehlbauer 5821660 2020-05-20T07:48:01Z 2020-05-20T07:48:01Z MEMBER

From looking at the tests, it seems that setting allow_rechunk=True would solve many issues. I've no idea about the implications on memory usage (docstring: "Warning: enabling this can increase memory usage significantly"). Might apply_gufunc not be suitable for processing of eg. dot?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
631297367 https://github.com/pydata/xarray/pull/4060#issuecomment-631297367 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMTI5NzM2Nw== kmuehlbauer 5821660 2020-05-20T07:40:24Z 2020-05-20T07:40:24Z MEMBER

@mathause @shoyer

First serve of trying to use dask.array.apply_gufunc in xr.apply_ufunc. I've added a list with problems in the topmost comment of this PR, to not loose track of this. Please enhance that list, if needed.

Most problematic issue now: xr.dot doesn't work well with apply_gufunc with regard to core dimensions and chunking.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
630846826 https://github.com/pydata/xarray/pull/4060#issuecomment-630846826 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMDg0NjgyNg== mathause 10194086 2020-05-19T14:13:38Z 2020-05-19T14:13:38Z MEMBER

Thanks! That would be quite cool!

dask.array.apply_gufunc does some fancy stuff with np.vectorize (to determine the output_dtypes) so I would not vectorize the function ourselves. For the other I don't have a qualified opinion.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
629369850 https://github.com/pydata/xarray/pull/4060#issuecomment-629369850 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYyOTM2OTg1MA== shoyer 1217238 2020-05-15T16:56:04Z 2020-05-15T16:56:04Z MEMBER
  • Would it be possible to replace the call to dask.array.blockwise (for one output variable) with dask.array.apply_gufunc? Do you know why blockwise is used further below and not dask.array.apply_gufunc? I assume it's due to historical reasons but I am not sure.

AFAIK, apply_gufunc wasn't available at the time these functions were introduced. Good chance, that apply_gufunc can be used for handling single output dask too.

Exactly. It would be nice remove the use of blockwise entirely in favor of apply_gufunc.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
629270796 https://github.com/pydata/xarray/pull/4060#issuecomment-629270796 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYyOTI3MDc5Ng== mathause 10194086 2020-05-15T14:36:56Z 2020-05-15T14:36:56Z MEMBER

Ah yes I see (https://github.com/pydata/xarray/issues/1815#issuecomment-440089606). dask.array.apply_gufunc should also be able to handle one output only.

A complete refactor of apply_ufunc would be quite some challenge.

Indeed - I think it could simplify _apply_blockwise (and might make the meta keword obsolete) but it would be good if someone with more experience of dask could weigh in.

@dcherian @shoyer

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
629229610 https://github.com/pydata/xarray/pull/4060#issuecomment-629229610 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYyOTIyOTYxMA== kmuehlbauer 5821660 2020-05-15T13:15:19Z 2020-05-15T13:15:19Z MEMBER

Thanks @mathause for your comments and raising those questions. JFTR, I was taking the road from #1815, so my explicit use-case was the multiple (dask) outputs.

  • It might be good to add a test with a reduction and one with vectorize=True.

I'll try to add some tests for the multiple output using dask.

  • Would it be possible to replace the call to dask.array.blockwise (for one output variable) with dask.array.apply_gufunc? Do you know why blockwise is used further below and not dask.array.apply_gufunc? I assume it's due to historical reasons but I am not sure.

AFAIK, apply_gufunc wasn't available at the time these functions were introduced. Good chance, that apply_gufunc can be used for handling single output dask too.

  • dask.array.apply_gufunc does all sorts of stuff - e.g. infer meta. This could potentially solve #4015 (pull #4022) and simplify the call signature of apply_ufunc?

That's a good question. If you want me to go the long way, please be aware, that I'm a novice in xarray as well as in dask. A complete refactor of apply_ufunc would be quite some challenge.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
629223049 https://github.com/pydata/xarray/pull/4060#issuecomment-629223049 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYyOTIyMzA0OQ== mathause 10194086 2020-05-15T13:02:03Z 2020-05-15T13:02:03Z MEMBER
  • It might be good to add a test with a reduction and one with vectorize=True.
  • Would it be possible to replace the call to dask.array.blockwise (for one output variable) with dask.array.apply_gufunc? Do you know why blockwise is used further below and not dask.array.apply_gufunc? I assume it's due to historical reasons but I am not sure.
  • dask.array.apply_gufunc does all sorts of stuff - e.g. infer meta. This could potentially solve #4015 (pull #4022) and simplify the call signature of apply_ufunc?

https://github.com/dask/dask/blob/3573b2ddca81aeb41a7def6dd4194020f853ab18/dask/array/gufunc.py#L175

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
628690749 https://github.com/pydata/xarray/pull/4060#issuecomment-628690749 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYyODY5MDc0OQ== kmuehlbauer 5821660 2020-05-14T14:57:23Z 2020-05-14T14:57:23Z MEMBER

This is ready for review from my side.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
628533990 https://github.com/pydata/xarray/pull/4060#issuecomment-628533990 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYyODUzMzk5MA== kmuehlbauer 5821660 2020-05-14T10:07:17Z 2020-05-14T10:07:17Z MEMBER

This would need some docstring changing too. But I first want to check, if I've missed anything vital in the implementation.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 15.915ms · About: xarray-datasette