home / github

Menu
  • GraphQL API
  • Search all tables

issue_comments

Table actions
  • GraphQL API for issue_comments

37 rows where author_association = "MEMBER", issue = 618081836 and user = 5821660 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 1

  • kmuehlbauer · 37 ✖

issue 1

  • ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc` · 37 ✖

author_association 1

  • MEMBER · 37 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
675973700 https://github.com/pydata/xarray/pull/4060#issuecomment-675973700 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY3NTk3MzcwMA== kmuehlbauer 5821660 2020-08-19T08:52:40Z 2020-08-19T08:52:40Z MEMBER

Thanks to all reviewers! Great job!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
675269117 https://github.com/pydata/xarray/pull/4060#issuecomment-675269117 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY3NTI2OTExNw== kmuehlbauer 5821660 2020-08-18T05:55:06Z 2020-08-18T05:55:06Z MEMBER

@mathause I've merged latest master into this PR to hopefully get all tests green. The former had some problems with a conda error in MinimumVersions job.

Please let me know, if there is anything for me to do, to get this merged.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
674823266 https://github.com/pydata/xarray/pull/4060#issuecomment-674823266 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY3NDgyMzI2Ng== kmuehlbauer 5821660 2020-08-17T11:21:38Z 2020-08-17T11:21:38Z MEMBER

While having a last review I've found another small glitch. I'll come back the next days to see, if anything needs to be done from reviewers side.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
674806601 https://github.com/pydata/xarray/pull/4060#issuecomment-674806601 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY3NDgwNjYwMQ== kmuehlbauer 5821660 2020-08-17T10:43:26Z 2020-08-17T10:43:26Z MEMBER

Great, than it looks like it's finally done. :smiley:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
674803101 https://github.com/pydata/xarray/pull/4060#issuecomment-674803101 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY3NDgwMzEwMQ== kmuehlbauer 5821660 2020-08-17T10:35:16Z 2020-08-17T10:35:16Z MEMBER

@keewis @mathause Thanks for the review. I've added a checklist in the first post with "open issues" with this PR, which might be solved in a follow up PR. Would be good to know which need to go in here, so I can add this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
674753706 https://github.com/pydata/xarray/pull/4060#issuecomment-674753706 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY3NDc1MzcwNg== kmuehlbauer 5821660 2020-08-17T08:56:30Z 2020-08-17T08:56:30Z MEMBER

@mathause I've merged with master ( + resolved conflicts) and addressed the latest review comments by @dcherian.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
674160208 https://github.com/pydata/xarray/pull/4060#issuecomment-674160208 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY3NDE2MDIwOA== kmuehlbauer 5821660 2020-08-14T16:38:12Z 2020-08-14T16:38:12Z MEMBER

@dcherian Thanks for your latest review comments. I'll try to come up with a solution after merging with latest master (and resolving conflicts).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
659197260 https://github.com/pydata/xarray/pull/4060#issuecomment-659197260 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY1OTE5NzI2MA== kmuehlbauer 5821660 2020-07-16T06:52:57Z 2020-07-16T06:52:57Z MEMBER

How should we proceed with this PR?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
651826366 https://github.com/pydata/xarray/pull/4060#issuecomment-651826366 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY1MTgyNjM2Ng== kmuehlbauer 5821660 2020-06-30T14:24:39Z 2020-06-30T14:24:39Z MEMBER

@mathause This should now raise for both type of 'output_sizes' errors, wrong dim name in 'output_sizes' and missing dim name in 'output_sizes'. I'll catch up with any review tomorrow morning CET.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
651800312 https://github.com/pydata/xarray/pull/4060#issuecomment-651800312 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY1MTgwMDMxMg== kmuehlbauer 5821660 2020-06-30T13:44:45Z 2020-06-30T13:44:45Z MEMBER

python if key not in signature.output_core_dims:

This does not work, because signature.output_core_dims is a list of tuples. So this would raise in any case. But I've no idea how to properly extract this into a single list/tuple...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
651779418 https://github.com/pydata/xarray/pull/4060#issuecomment-651779418 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY1MTc3OTQxOA== kmuehlbauer 5821660 2020-06-30T13:08:42Z 2020-06-30T13:08:42Z MEMBER

@mathause It seems I can't find a nice way of checking/catching errors.

For now, this fails with KeyError inside xarray when output_sizes contains a wrong dimension name and this fails with KeyError inside dask, but with the not helpful dim0 message. Maybe this can be fixed in a follow up PR?

I would also leave combining of output_core_dims and output_sizes for another PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
651700823 https://github.com/pydata/xarray/pull/4060#issuecomment-651700823 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY1MTcwMDgyMw== kmuehlbauer 5821660 2020-06-30T10:14:53Z 2020-06-30T10:14:53Z MEMBER

@mathause I've tested your suggestion but I do not know, what better error message you expect from the dask side.

  • xr.apply_ufunc(..., output_core_dims=[["z"]], output_sizes={"z": 1}) signature.to_gufunc_string() will output ()->(dim0) (and output_sizes will be {'dim0': 1}
  • xr.apply_ufunc(..., output_core_dims=[["z"]], output_sizes={"y": 1}) signature.to_gufunc_string() will output ()->(dim0) (and output_sizes will be {'y': 1}

So the second will fail with KeyError: 'dim0'.

I think we can't get proper error messages from dask side, because dask doesn't know anything about the new dimension name ( at least not from signature.to_gufunc_string()). Should we catch the error and add a meaningful output?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
651653334 https://github.com/pydata/xarray/pull/4060#issuecomment-651653334 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY1MTY1MzMzNA== kmuehlbauer 5821660 2020-06-30T08:49:10Z 2020-06-30T08:49:10Z MEMBER

I made one more suggestion to catch the case

python xr.apply_ufunc(..., output_core_dims=[["abc"]], output_sizes={"xyz": 2})

Yes, I've got the email, but did not find it here. Thanks! Would it not be useful to combine output_core_dims and output_sizes, eg.

python xr.apply_ufunc(..., output_core_dims=[{"abc": 2]])

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
651606271 https://github.com/pydata/xarray/pull/4060#issuecomment-651606271 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY1MTYwNjI3MQ== kmuehlbauer 5821660 2020-06-30T07:32:42Z 2020-06-30T07:32:42Z MEMBER

@mathause @shoyer @dcherian @max-sixty

This PR has gone a long way from first enabling multiple outputs to implementation of dask.apply_gufunc. Thanks for all your help, comments and suggestions. As xarray 0.16.0 is handled in the next days (or so I understood from reading the comments here #4031), I'm asking myself, if this can be finalized for 0.16.0?

I appreciate if we could work on any remaining problems to get this PR merged.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
643162218 https://github.com/pydata/xarray/pull/4060#issuecomment-643162218 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY0MzE2MjIxOA== kmuehlbauer 5821660 2020-06-12T09:02:40Z 2020-06-12T09:02:40Z MEMBER

@dcherian @shoyer Only the upstream-dev is failing with the sparse-tests (#4146). This is ready for another review cycle. Please let me know, if there is more to work on this PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
641098064 https://github.com/pydata/xarray/pull/4060#issuecomment-641098064 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY0MTA5ODA2NA== kmuehlbauer 5821660 2020-06-09T07:43:55Z 2020-06-09T07:43:55Z MEMBER

@mathause I reverted the mentioned error catching. I created a dedicated test for testing the dask vectorize without given dtype. Hope that fits your requirements.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
641056080 https://github.com/pydata/xarray/pull/4060#issuecomment-641056080 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDY0MTA1NjA4MA== kmuehlbauer 5821660 2020-06-09T06:17:11Z 2020-06-09T06:17:11Z MEMBER

@mathause Sorry for the delay. I have currently no other idea how to circumvent the use of dask_gufunc_kwargs also for the non-dask vectorize. I'm happy to adapt the code to a more convenient and readable solution.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
635337996 https://github.com/pydata/xarray/pull/4060#issuecomment-635337996 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNTMzNzk5Ng== kmuehlbauer 5821660 2020-05-28T13:06:16Z 2020-05-28T13:06:16Z MEMBER

This is ready for another review cycle. :smiley:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
635293190 https://github.com/pydata/xarray/pull/4060#issuecomment-635293190 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNTI5MzE5MA== kmuehlbauer 5821660 2020-05-28T11:55:03Z 2020-05-28T11:55:03Z MEMBER

@mathause Thanks, I had just approached that particular line while debugging, when your message came in :grin:. I'll add an xfail for that dask version first.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
635237621 https://github.com/pydata/xarray/pull/4060#issuecomment-635237621 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNTIzNzYyMQ== kmuehlbauer 5821660 2020-05-28T09:46:32Z 2020-05-28T09:46:32Z MEMBER

@mathause Any idea why the meta test fails in Linux py36-min-all-deps?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
635226708 https://github.com/pydata/xarray/pull/4060#issuecomment-635226708 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNTIyNjcwOA== kmuehlbauer 5821660 2020-05-28T09:24:37Z 2020-05-28T09:24:37Z MEMBER

@mathause I've added your tests from #4022 to this PR (dtype/meta).

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
635218745 https://github.com/pydata/xarray/pull/4060#issuecomment-635218745 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNTIxODc0NQ== kmuehlbauer 5821660 2020-05-28T09:08:39Z 2020-05-28T09:08:39Z MEMBER

OK, vectorization has to be inside apply_variable_ufunc to be applied properly on dask and non-dask configurations.

For the time being I use dask_gufunc_kwargs to distribute vectorize and output_dtypes into apply_variable_ufunc. Maybe we have to decide on more sensible name for that dict.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
635196460 https://github.com/pydata/xarray/pull/4060#issuecomment-635196460 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNTE5NjQ2MA== kmuehlbauer 5821660 2020-05-28T08:25:38Z 2020-05-28T08:25:38Z MEMBER

Currently tests are failing for some test_missing. Any hints appreciated.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
635194699 https://github.com/pydata/xarray/pull/4060#issuecomment-635194699 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNTE5NDY5OQ== kmuehlbauer 5821660 2020-05-28T08:22:13Z 2020-05-28T08:22:13Z MEMBER

Thanks @shoyer and @mathause. I've tried to adapt the PR to your comments.

  • The meta and output_sizes parameters should be and the vectorize and output_dtypes parameters can be given within the dask_gufunc_kwargs (which have priority). This would also be the place for allow_rechunk. Everything is mentioned in the docstring and warnings are issued to use the new dask_gufunc_kwargs. The dask_gufunc_kwargs itself is used to hold all keyword arguments for dask.array.apply_gufunc.

  • vectorization is done inside apply_ufunc only for dask not parallelized. In that case it is deferred to dask.array.apply_gufunc.

  • for chunking problems (core dimension with multiple chunks and non-core dimensions with different chunksizes) we let dask raise the errors.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
634583266 https://github.com/pydata/xarray/pull/4060#issuecomment-634583266 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNDU4MzI2Ng== kmuehlbauer 5821660 2020-05-27T10:53:25Z 2020-05-27T10:53:25Z MEMBER

What I was trying to say - this is a change that can break existing code and it is not easy to do a proper deprecation cycle (i.e. only throw a warning). (I think allow_rechunk=True is not a good default.)

Yes, sure, I didn't think about the effects on existing code. We might need to capture both ValueErrors and handle differently. I'll try to come up with a proposal for this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
634497758 https://github.com/pydata/xarray/pull/4060#issuecomment-634497758 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNDQ5Nzc1OA== kmuehlbauer 5821660 2020-05-27T08:02:24Z 2020-05-27T08:02:24Z MEMBER

@mathause Thanks for the elaborate review. But, I'm unsure how to move on with this.

I'll reply to some of your comments below.

Note that output_dtypes is used in vectorize so this may not be a candidate for dask_gufunc_kwargs.

Hmm, the docstring indicates that it is only used with dask='parallelized'. But in the code it can be used in any case. This should be clarified.

The second is not so trivial. I see three possibilities (1) just error, (2) try dask.array.apply_gufunc if that fails issue a warning and use the old apply_blockwise (3) figure out ourselves if non-core dimensions (called loop dimensions in dask) are not-equally chunked, issue a warning and re-chunk them ourselves. Maybe @shoyer and @dcherian can weight in here.

When looking at the code of dask.array.apply_gufunc it seems, that any possible re-chunking is done in dask.array.blockwise. I also do not see any special treatment of warning or re-chunking of non-core dimensions in the current _apply_blockwise. So from my reasoning we should be safe to use (4) try dask.array.apply_gufunc since dask.array.apply_gufunc will issue the appropriate ValueError if something is wrong with the chunking. (Only thing is that the ValueError concerning the non-core dimension doesn't mention allow_rechunk). If the ValueError is issued, the user can the just set allow_rechunk=True keyword.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
632557161 https://github.com/pydata/xarray/pull/4060#issuecomment-632557161 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMjU1NzE2MQ== kmuehlbauer 5821660 2020-05-22T08:03:28Z 2020-05-22T08:03:28Z MEMBER

although only two keywords seem to be worth feeding through (keepdims, allow_rechunk).

Just noticed, that output_dtypes and output_sizes could also be fed as gufunc_kwargs. So this would really simplify the calling signature of apply_ufunc if those kwargs could be handled that way.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
632546838 https://github.com/pydata/xarray/pull/4060#issuecomment-632546838 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMjU0NjgzOA== kmuehlbauer 5821660 2020-05-22T07:41:49Z 2020-05-22T07:41:49Z MEMBER

I'll only be able to look at it properly next week.

@mathause I'll leave the PR unchanged and catch up with you next week.

@shoyer @dcherian Thanks for your comments. Please let me know, which tests should be added to check for any possible surprises with this change to apply_gufunc.

The problem is that we don't have any way to detect ahead of time whether the applied function already supports dask arrays (e.g., if it is built-up out of functions from dask.array). If it does, we don't want to set dask='parallelized' but rather let the function handle dask arrays itself.

(Att: no native english speaker here, so bear with me, if something sounds clunky or not exactly matching) Then we would have to keep the dask='forbidden' as default, as well as parallelized and allowed to force the decision to the user. Maybe the keyword settings itself could be a bit more clear. In the allowed-case the function in question has to natively support dask-arrays. So I would use dask='native' in that case. For the parallelized-case this PR proposes to use dask.array.apply_gufunc (generalized ufunc). So either we stick to parallelized or we try to find a better fitting name (eg. dask='gufunc').

For the keywords I think @dcherian 's proposal of something like dask_gufunc_kwargs (or gufunc_kwargs) is useful (would match with dask='gufunc'), although only two keywords seem to be worth feeding through (keepdims, allow_rechunk).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
631493099 https://github.com/pydata/xarray/pull/4060#issuecomment-631493099 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMTQ5MzA5OQ== kmuehlbauer 5821660 2020-05-20T14:02:07Z 2020-05-20T14:02:07Z MEMBER

@mathause All tests green, good starting point for review. Please notify other people who should have a look at this.

There are still things to discuss: - keywords of dask.array.apply_gufunc - howto properly handle deprecation (eg. meta) - doscstring - adding/revision of tests

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
630840345 https://github.com/pydata/xarray/pull/4060#issuecomment-630840345 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMDg0MDM0NQ== kmuehlbauer 5821660 2020-05-19T14:03:42Z 2020-05-20T13:41:41Z MEMBER

I've given this a try, but this will need some design decisions.

  • currently vectorize is handled in any case if requested, before falling through the if/else. dask.array.apply_gufunc takes vectorize as parameter, so for dask we do not need to apply vectorization. We would need to apply vectorize only for non-dask cases (maybe just before calling the final function).
  • currently dask is only handled, if dask keyword is issued ('allowed' and parallelized). From my perspective the dask keyword is not needed any more. We could just divert to the apply_gufunc when dask backed arrays are detected.

This will really have much impact on the code/tests. I'll come up with a updated PullRequest in short time, but any thoughts /remarks whatsoever are very much appreciated.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
631448161 https://github.com/pydata/xarray/pull/4060#issuecomment-631448161 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMTQ0ODE2MQ== kmuehlbauer 5821660 2020-05-20T12:40:24Z 2020-05-20T12:40:24Z MEMBER

@mathause Only one breaking test which is connected to the meta stuff you handle in #4022. Any suggestions on that topic? I've removed meta completely since it was only needed in dask.array.blockwise but not in dask.array.apply_gufunc.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
631369862 https://github.com/pydata/xarray/pull/4060#issuecomment-631369862 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMTM2OTg2Mg== kmuehlbauer 5821660 2020-05-20T09:50:16Z 2020-05-20T09:50:16Z MEMBER

@mathause Great! Seems like this works better, thanks! Will update the PR after some more tests etc.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
631301324 https://github.com/pydata/xarray/pull/4060#issuecomment-631301324 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMTMwMTMyNA== kmuehlbauer 5821660 2020-05-20T07:48:01Z 2020-05-20T07:48:01Z MEMBER

From looking at the tests, it seems that setting allow_rechunk=True would solve many issues. I've no idea about the implications on memory usage (docstring: "Warning: enabling this can increase memory usage significantly"). Might apply_gufunc not be suitable for processing of eg. dot?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
631297367 https://github.com/pydata/xarray/pull/4060#issuecomment-631297367 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMTI5NzM2Nw== kmuehlbauer 5821660 2020-05-20T07:40:24Z 2020-05-20T07:40:24Z MEMBER

@mathause @shoyer

First serve of trying to use dask.array.apply_gufunc in xr.apply_ufunc. I've added a list with problems in the topmost comment of this PR, to not loose track of this. Please enhance that list, if needed.

Most problematic issue now: xr.dot doesn't work well with apply_gufunc with regard to core dimensions and chunking.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
629229610 https://github.com/pydata/xarray/pull/4060#issuecomment-629229610 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYyOTIyOTYxMA== kmuehlbauer 5821660 2020-05-15T13:15:19Z 2020-05-15T13:15:19Z MEMBER

Thanks @mathause for your comments and raising those questions. JFTR, I was taking the road from #1815, so my explicit use-case was the multiple (dask) outputs.

  • It might be good to add a test with a reduction and one with vectorize=True.

I'll try to add some tests for the multiple output using dask.

  • Would it be possible to replace the call to dask.array.blockwise (for one output variable) with dask.array.apply_gufunc? Do you know why blockwise is used further below and not dask.array.apply_gufunc? I assume it's due to historical reasons but I am not sure.

AFAIK, apply_gufunc wasn't available at the time these functions were introduced. Good chance, that apply_gufunc can be used for handling single output dask too.

  • dask.array.apply_gufunc does all sorts of stuff - e.g. infer meta. This could potentially solve #4015 (pull #4022) and simplify the call signature of apply_ufunc?

That's a good question. If you want me to go the long way, please be aware, that I'm a novice in xarray as well as in dask. A complete refactor of apply_ufunc would be quite some challenge.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
628690749 https://github.com/pydata/xarray/pull/4060#issuecomment-628690749 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYyODY5MDc0OQ== kmuehlbauer 5821660 2020-05-14T14:57:23Z 2020-05-14T14:57:23Z MEMBER

This is ready for review from my side.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
628533990 https://github.com/pydata/xarray/pull/4060#issuecomment-628533990 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYyODUzMzk5MA== kmuehlbauer 5821660 2020-05-14T10:07:17Z 2020-05-14T10:07:17Z MEMBER

This would need some docstring changing too. But I first want to check, if I've missed anything vital in the implementation.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 29.484ms · About: xarray-datasette