home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

8 rows where author_association = "CONTRIBUTOR" and issue = 653430454 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 3

  • jthielen 6
  • dopplershift 1
  • rpmanser 1

issue 1

  • Support for duck Dask Arrays · 8 ✖

author_association 1

  • CONTRIBUTOR · 8 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
663121190 https://github.com/pydata/xarray/issues/4208#issuecomment-663121190 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY2MzEyMTE5MA== jthielen 3460034 2020-07-23T17:01:44Z 2020-07-23T17:03:20Z CONTRIBUTOR

In Xarray we implemented the Dask collection spec. https://docs.dask.org/en/latest/custom-collections.html#the-dask-collection-interface

We might want to do that with Pint as well, if they're going to contain Dask things. That way Dask operations like dask.persist, dask.visualize, and dask.compute will work normally.

That's exactly what's been done in Pint (see https://github.com/hgrecco/pint/pull/1129)! @dcherian's points go beyond just that and address what Pint hasn't covered yet through the standard collection interface.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
657152588 https://github.com/pydata/xarray/issues/4208#issuecomment-657152588 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY1NzE1MjU4OA== jthielen 3460034 2020-07-12T00:19:53Z 2020-07-12T00:19:53Z CONTRIBUTOR

Does/should any of this also consider #4212 (CuPy)?

Only indirectly, since this deals with duck Dask arrays (things like Pint that go between xarray and Dask) rather than Dask chunks, which CuPy would be. But, once this, #4212, https://github.com/hgrecco/pint/issues/964, and https://github.com/dask/dask/pull/6393 are all in place, then we can test if xarray( pint( dask( cupy ))) works automatically from it all or not.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
657135521 https://github.com/pydata/xarray/issues/4208#issuecomment-657135521 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY1NzEzNTUyMQ== dopplershift 221526 2020-07-11T21:49:36Z 2020-07-11T21:49:54Z CONTRIBUTOR

Does/should any of this also consider #4212 (CuPy)?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
656250148 https://github.com/pydata/xarray/issues/4208#issuecomment-656250148 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY1NjI1MDE0OA== jthielen 3460034 2020-07-09T17:17:34Z 2020-07-10T17:47:16Z CONTRIBUTOR

Based on @mrocklin's comment in https://github.com/dask/dask/issues/6385, the plan will be to check for duck Dask Arrays with dask.base.is_dask_collection along with xarray's previously used duck array check. This works properly with Pint Quantities as implemented in https://github.com/hgrecco/pint/pull/1129 (returning True if the Pint Quantity contains a Dask Array, and False if it does not).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
656358719 https://github.com/pydata/xarray/issues/4208#issuecomment-656358719 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY1NjM1ODcxOQ== jthielen 3460034 2020-07-09T21:24:42Z 2020-07-09T21:26:37Z CONTRIBUTOR

@rpmanser That sounds like a good plan to me at least, but it would be great if any of the xarray maintainers would be able to chime in. Also, thanks again for being willing to work on this while I try working on https://github.com/dask/dask/issues/4583. The hidden 4th step is of course testing--primarily that this doesn't break existing functionality, but also that it works for duck Dask Arrays other than Dask Arrays themselves (not sure if Pint Quantities in upcoming v0.15 or a mocked class would be better).

Also, thank you @dcherian for pointing out those checks, you found them faster than I did!

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
656349279 https://github.com/pydata/xarray/issues/4208#issuecomment-656349279 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY1NjM0OTI3OQ== rpmanser 19578931 2020-07-09T21:01:07Z 2020-07-09T21:01:07Z CONTRIBUTOR

I can go ahead with putting together a PR for this. Before I do so, I'd like to clarify what is expected.

  • Implement the is_duck_dask_array() function
  • In that implementation, use dask.base.is_dask_collection() and the existing duck array check(s)
  • Replace isinstance(x, dask.array.Array) checks with the new is_dask_duck_array() function

I searched for existing duck array checks in xarray and nothing immediately obvious to me is showing up. It looks like a check for __array_function__ is inappropriate based on discussion in #3917. Could someone point out the proper duck array check?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
656119415 https://github.com/pydata/xarray/issues/4208#issuecomment-656119415 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY1NjExOTQxNQ== jthielen 3460034 2020-07-09T13:13:43Z 2020-07-09T13:13:43Z CONTRIBUTOR

I think there are already enough headaches with __iter__ being always defined and confusing libraries such as pandas (hgrecco/pint#1128). I don't see why pint should be explicitly aware of dask (except in unit tests)? It should only deal with generic NEP18-compatible libraries (numpy, dask, sparse, cupy, etc.).

Since Pint wraps Dask, in order to leverage Dask Array functionality on Pint Quantities, we need to have the Dask collection interface available. In a sense, Pint needs special handling for Dask like xarray Variables do since they both can be upcast types of Dask Array. Implicitly passing through attributes (how Pint handles special methods/attributes of downcast types in general) from the wrapped Dask Array is not sufficient, however, because the finalizers have to rewrap with Quantity (see https://github.com/hgrecco/pint/pull/1129/files#diff-d9924213798d0fc092b8cff13928d747R1947-R1950), hence the explicit awareness of Dask being needed in Pint.

We should ask the dask team to formalize what defines a "dask-array-like", like they already did with dask collections, and implement their definition in xarray.

Done! See https://github.com/dask/dask/issues/6385.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
655817246 https://github.com/pydata/xarray/issues/4208#issuecomment-655817246 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY1NTgxNzI0Ng== jthielen 3460034 2020-07-08T23:57:36Z 2020-07-08T23:59:28Z CONTRIBUTOR

Maybe something like this would work?

def is_duck_dask_array(x): return getattr(x, 'chunks', None) is not None

xarray.DataArray would pass this test (chunks is either None for non-dask arrays or a tuple for dask arrays), so this would be consistent with what we already do.

That would be a straightforward solution to both problems! A Pint Quantity containing a Dask Array passes along the chunks attribute from the Dask Array, and a Pint Quantity containing something else will raise an AttributeError. Unless there are other objections, I'll see what it will take to swap out the existing Dask checks for this in the xarray internals and hopefully get around to a PR (after I get some MetPy stuff done first).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 17.062ms · About: xarray-datasette