issue_comments

5 rows where issue = 618081836 and user = 1217238 sorted by updated_at descending

user 1

  • shoyer · 5 ✖

issue 1

  • ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc` · 5 ✖

author_association 1

  • MEMBER 5
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
634776667 https://github.com/pydata/xarray/pull/4060#issuecomment-634776667 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNDc3NjY2Nw== shoyer 1217238 2020-05-27T16:18:24Z 2020-05-27T16:18:24Z MEMBER
  • un-equal chunking along non-core dimensions

The second is not so trivial. I see three possibilities: (1) just error; (2) try dask.array.apply_gufunc and, if that fails, issue a warning and use the old apply_blockwise; (3) figure out ourselves whether non-core dimensions (called loop dimensions in dask) are unequally chunked, issue a warning, and re-chunk them ourselves. Maybe @shoyer and @dcherian can weigh in here.

I am pretty confident that the existing behavior of xarray.apply_ufunc with un-equal chunks along non-core dimensions is entirely broken. I am OK with just erroring for now.
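
A minimal sketch of what "just erroring" could look like, assuming the chunk sizes of each input have already been collected into a mapping from dimension name to a tuple of block lengths. The helper name `check_loop_dim_chunks` and its signature are hypothetical; they are not part of xarray or dask.

```python
def check_loop_dim_chunks(inputs_chunks, core_dims):
    """Raise if inputs disagree on chunking along any non-core (loop) dimension.

    inputs_chunks: list of dicts mapping dimension name -> tuple of chunk sizes
    core_dims: set of dimension names treated as core dimensions
    """
    seen = {}  # loop dimension -> chunking first encountered for it
    for chunks in inputs_chunks:
        for dim, sizes in chunks.items():
            if dim in core_dims:
                continue  # core dimensions are not compared here
            expected = seen.setdefault(dim, sizes)
            if sizes != expected:
                raise ValueError(
                    f"unequal chunking along non-core dimension {dim!r}: "
                    f"{expected} vs {sizes}"
                )


# Hypothetical chunk layouts for two inputs sharing the loop dimension "x".
a = {"x": (2, 2), "time": (4,)}
b = {"x": (2, 2), "time": (2, 2)}

# Passes: "time" is a core dimension, so only "x" is compared.
check_loop_dim_chunks([a, b], core_dims={"time"})
```

With `core_dims=set()` the same two inputs would raise, because their chunking along "time" differs.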

{
    "total_count": 3,
    "+1": 2,
    "-1": 0,
    "laugh": 1,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
634774468 https://github.com/pydata/xarray/pull/4060#issuecomment-634774468 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzNDc3NDQ2OA== shoyer 1217238 2020-05-27T16:14:33Z 2020-05-27T16:14:33Z MEMBER

Looking at the internals of dask.array.apply_gufunc, it looks like it passes on **kwargs to blockwise. blockwise gets called twice, first with only **kwargs and then with both meta and **kwargs: https://github.com/dask/dask/blob/77628f2d5248cc61cc366cac3e400b6df5c654c1/dask/array/gufunc.py#L422 https://github.com/dask/dask/blob/77628f2d5248cc61cc366cac3e400b6df5c654c1/dask/array/gufunc.py#L439

So it seems we can actually still pass on a meta argument explicitly, at least in the first case.

Here's my suggestion for moving forward:

  • If meta was explicitly set (meta is not None), pass it into apply_gufunc in **kwargs
  • Otherwise, omit meta.
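
The two rules above can be sketched roughly as follows. `build_gufunc_kwargs` is a hypothetical helper, and `fake_apply_gufunc` is a stand-in used so the example runs without dask installed; neither is the actual xarray or dask code.

```python
def build_gufunc_kwargs(meta=None, **kwargs):
    """Return the keyword arguments to forward, including meta only when set."""
    forwarded = dict(kwargs)
    if meta is not None:
        # meta was explicitly set, so pass it along inside **kwargs
        forwarded["meta"] = meta
    # otherwise meta is omitted entirely rather than forwarded as None
    return forwarded


def fake_apply_gufunc(func, signature, *args, **kwargs):
    """Stand-in (assumption): reports which keyword arguments it received."""
    return sorted(kwargs)


received_without = fake_apply_gufunc(
    abs, "()->()", 1, **build_gufunc_kwargs(vectorize=True)
)
received_with = fake_apply_gufunc(
    abs, "()->()", 1, **build_gufunc_kwargs(meta=0.0, vectorize=True)
)
```

Omitting the key, rather than forwarding `meta=None`, leaves the callee free to apply its own default.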
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
631702113 https://github.com/pydata/xarray/pull/4060#issuecomment-631702113 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMTcwMjExMw== shoyer 1217238 2020-05-20T20:16:09Z 2020-05-20T20:16:09Z MEMBER

Maybe this is too defensive/surprising, and could be relaxed.

You would remove the dask="forbidden" branch and not the dask="parallelized" one?

For the functions that don't handle dask arrays gracefully, dask="parallelized" would be the better option?

This is probably another good motivation: defaulting to dask='forbidden' forces users to make an explicit choice about whether or not to use dask='parallelized'.

The problem is that we don't have any way to detect ahead of time whether the applied function already supports dask arrays (e.g., if it is built-up out of functions from dask.array). If it does, we don't want to set dask='parallelized' but rather let the function handle dask arrays itself.
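
As a toy illustration of the trade-off, the three options can be modeled as a small dispatch. `LazyArray` and `apply_with_dask_option` are hypothetical stand-ins written for this sketch; they are not how xarray implements apply_ufunc.

```python
class LazyArray:
    """Toy stand-in for a dask array (assumption for illustration only)."""

    def __init__(self, values):
        self.values = list(values)


def apply_with_dask_option(func, arg, dask="forbidden"):
    if not isinstance(arg, LazyArray):
        return func(arg)  # eager input: the option is irrelevant
    if dask == "forbidden":
        # default: make the caller choose explicitly
        raise ValueError(
            "lazy input encountered; choose dask='allowed' or dask='parallelized'"
        )
    if dask == "allowed":
        # trust func to handle the lazy array itself
        return func(arg)
    if dask == "parallelized":
        # func only understands eager data; this toy applies it to the
        # underlying values and re-wraps the result (real code would defer
        # the work per block instead of computing immediately)
        return LazyArray(func(arg.values))
    raise ValueError(f"unknown dask option: {dask!r}")


def double_eager(values):
    """A function that only knows how to handle plain lists."""
    return [2 * v for v in values]


result = apply_with_dask_option(double_eager, LazyArray([1, 2]), dask="parallelized")
```

Nothing in the dispatch can tell whether `func` would have handled `LazyArray` on its own, which is why the choice is left to the caller.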

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
631689889 https://github.com/pydata/xarray/pull/4060#issuecomment-631689889 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYzMTY4OTg4OQ== shoyer 1217238 2020-05-20T19:50:23Z 2020-05-20T19:50:23Z MEMBER

The original motivation for requiring dask='allowed' is that I was concerned that users would put a function that coerces its arguments into NumPy arrays into apply_ufunc (e.g., like many functions from SciPy), which could have surprisingly bad performance when called on dask arrays due to automatic coercion.

Maybe this is too defensive/surprising, and could be relaxed. We don't really have any guard-rails like this elsewhere in xarray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836
629369850 https://github.com/pydata/xarray/pull/4060#issuecomment-629369850 https://api.github.com/repos/pydata/xarray/issues/4060 MDEyOklzc3VlQ29tbWVudDYyOTM2OTg1MA== shoyer 1217238 2020-05-15T16:56:04Z 2020-05-15T16:56:04Z MEMBER
  • Would it be possible to replace the call to dask.array.blockwise (for one output variable) with dask.array.apply_gufunc? Do you know why blockwise is used further below and not dask.array.apply_gufunc? I assume it's due to historical reasons but I am not sure.

AFAIK, apply_gufunc wasn't available at the time these functions were introduced. There's a good chance that apply_gufunc can be used for handling single-output dask too.

Exactly. It would be nice to remove the use of blockwise entirely in favor of apply_gufunc.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc`  618081836

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 223.424ms · About: xarray-datasette