issue_comments

10 rows where author_association = "MEMBER" and issue = 653430454 sorted by updated_at descending

user 4

  • mrocklin 4
  • dcherian 3
  • shoyer 2
  • crusaderky 1

issue 1

  • Support for duck Dask Arrays · 10

author_association 1

  • MEMBER · 10
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
663148752 https://github.com/pydata/xarray/issues/4208#issuecomment-663148752 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY2MzE0ODc1Mg== mrocklin 306380 2020-07-23T17:57:55Z 2020-07-23T17:57:55Z MEMBER

Dask collections tokenize quickly. We just use the name I think.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
663135877 https://github.com/pydata/xarray/issues/4208#issuecomment-663135877 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY2MzEzNTg3Nw== dcherian 2448579 2020-07-23T17:31:18Z 2020-07-23T17:31:18Z MEMBER

Re:rechunk, this should be part of the spec I guess. We need this for DataArray.chunk().

xarray does do some automatic rechunking in variable.py. But this comment:

    # chunked data should come out with the same chunks; this makes
    # it feasible to combine shifted and unshifted data
    # TODO: remove this once dask.array automatically aligns chunks

suggests that we could delete that automatic rechunking today.

> This will probably be very fast because you're probably just returning the name of the underlying dask array as well as the unit of the pint array/quantity.

ah yes, we can rely on the underlying array library to optimize this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
663123118 https://github.com/pydata/xarray/issues/4208#issuecomment-663123118 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY2MzEyMzExOA== mrocklin 306380 2020-07-23T17:05:30Z 2020-07-23T17:05:30Z MEMBER

> That's exactly what's been done in Pint (see hgrecco/pint#1129)! @dcherian's points go beyond just that and address what Pint hasn't covered yet through the standard collection interface.

Ah, great. My bad.

> how do we ask a duck dask array to rechunk itself? pint seems to forward the .rechunk call but that isn't formalized anywhere AFAICT.

I think that you would want to make a pint array rechunk method that called down to the dask array rechunk method. My guess is that this might come up in other situations as well.
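
A rough sketch of that forwarding, under stated assumptions: `UnitArray` and `FakeChunked` are hypothetical stand-ins (neither is a Pint or Dask class), and the only thing assumed about the wrapped object is that it exposes `.chunks` and a `.rechunk(...)` method the way a dask array does.

```python
class FakeChunked:
    """Hypothetical stand-in for a 1-D chunked array (not a real dask array)."""

    def __init__(self, size, chunk):
        self.size = size
        # Chunk lengths along the single axis, e.g. size 10 in chunks of 4 -> (4, 4, 2)
        full, rest = divmod(size, chunk)
        self.chunks = ((chunk,) * full + ((rest,) if rest else ()),)

    def rechunk(self, chunk):
        return FakeChunked(self.size, chunk)


class UnitArray:
    """Hypothetical unit-carrying wrapper around a chunked array."""

    def __init__(self, magnitude, units):
        self.magnitude = magnitude
        self.units = units

    @property
    def chunks(self):
        return self.magnitude.chunks

    def rechunk(self, *args, **kwargs):
        # Call down to the wrapped array, then re-attach the units.
        return UnitArray(self.magnitude.rechunk(*args, **kwargs), self.units)


q = UnitArray(FakeChunked(10, 4), "m")
r = q.rechunk(5)
```

The wrapper never inspects the chunking itself, so whatever arguments the underlying library accepts pass straight through.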

> less important: should duck dask arrays cache their token somewhere? dask.array uses .name to do this and xarray uses that to check equality cheaply. We can use tokenize of course. But I'm wondering if it's worth asking duck dask arrays to cache their token as an optimization.

I think that implementing the dask.base.normalize_token method should be fine. This will probably be very fast because you're probably just returning the name of the underlying dask array as well as the unit of the pint array/quantity. I don't think that caching would be necessary here.

It's also possible that we could look at the __dask_layers__ method to get this information. My memory is a bit fuzzy here though.
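
To make the "name plus unit" idea concrete, here is a minimal sketch. It uses the `__dask_tokenize__` hook from Dask's custom-collection protocol rather than registering with `dask.base.normalize_token`, purely so that it runs without dask installed. `UnitArray` and `Named` are hypothetical stand-ins, and the exact token layout is an assumption.

```python
class Named:
    """Hypothetical stand-in for a dask array: only its .name matters here."""

    def __init__(self, name):
        self.name = name


class UnitArray:
    """Hypothetical unit-carrying wrapper."""

    def __init__(self, magnitude, units):
        self.magnitude = magnitude
        self.units = units

    def __dask_tokenize__(self):
        # Cheap: reuse the wrapped array's existing name instead of hashing data.
        return (type(self).__name__, self.magnitude.name, str(self.units))


a = UnitArray(Named("add-123abc"), "m")
b = UnitArray(Named("add-123abc"), "m")
c = UnitArray(Named("add-123abc"), "s")
```

Here `a` and `b` produce the same token while `c` does not, so an equality shortcut based on tokens would not need a separate cache.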

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
663119539 https://github.com/pydata/xarray/issues/4208#issuecomment-663119539 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY2MzExOTUzOQ== mrocklin 306380 2020-07-23T16:58:27Z 2020-07-23T16:58:27Z MEMBER

My guess is that we could steal the xarray.DataArray implementations over to Pint without causing harm.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
663119334 https://github.com/pydata/xarray/issues/4208#issuecomment-663119334 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY2MzExOTMzNA== mrocklin 306380 2020-07-23T16:58:06Z 2020-07-23T16:58:06Z MEMBER

In Xarray we implemented the Dask collection spec. https://docs.dask.org/en/latest/custom-collections.html#the-dask-collection-interface

We might want to do that with Pint as well, if they're going to contain Dask things. That way Dask operations like dask.persist, dask.visualize, and dask.compute will work normally.
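
As an illustration only, a wrapper could forward part of that interface to the object it contains. The sketch below covers just `__dask_graph__`, `__dask_keys__`, and `__dask_postcompute__`; a real collection would also need the remaining hooks (`__dask_optimize__`, `__dask_scheduler__`, `__dask_postpersist__`, `__dask_tokenize__`). `UnitArray` and `FakeCollection` are hypothetical names, and the last lines imitate a scheduler by hand instead of calling dask.

```python
class FakeCollection:
    """Hypothetical stand-in for a dask collection holding a single task."""

    def __init__(self, name, value):
        self.name = name
        self._graph = {(name, 0): value}

    def __dask_graph__(self):
        return self._graph

    def __dask_keys__(self):
        return [(self.name, 0)]

    def __dask_postcompute__(self):
        # (finalize, extra_args): finalize(results, *extra_args) builds the result
        return (lambda results: results[0]), ()


class UnitArray:
    """Hypothetical unit-carrying wrapper that forwards the collection hooks."""

    def __init__(self, magnitude, units):
        self.magnitude = magnitude
        self.units = units

    def __dask_graph__(self):
        return self.magnitude.__dask_graph__()

    def __dask_keys__(self):
        return self.magnitude.__dask_keys__()

    def __dask_postcompute__(self):
        finalize, args = self.magnitude.__dask_postcompute__()
        return self._finalize, (finalize, args, self.units)

    @staticmethod
    def _finalize(results, finalize, args, units):
        # Let the wrapped collection finalize first, then re-attach the units.
        return UnitArray(finalize(results, *args), units)


q = UnitArray(FakeCollection("x", 3.0), "m")

# Imitate what a scheduler does: resolve each key, then finalize.
graph = q.__dask_graph__()
results = [graph[key] for key in q.__dask_keys__()]
finalize, extra = q.__dask_postcompute__()
out = finalize(results, *extra)
```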

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
663117842 https://github.com/pydata/xarray/issues/4208#issuecomment-663117842 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY2MzExNzg0Mg== dcherian 2448579 2020-07-23T16:55:11Z 2020-07-23T16:55:11Z MEMBER

A couple of things came up in #4221:

1. how do we ask a duck dask array to rechunk itself? pint seems to forward the .rechunk call but that isn't formalized anywhere AFAICT.
2. less important: should duck dask arrays cache their token somewhere? dask.array uses .name to do this and xarray uses that to check equality cheaply. We can use tokenize of course. But I'm wondering if it's worth asking duck dask arrays to cache their token as an optimization.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
656358078 https://github.com/pydata/xarray/issues/4208#issuecomment-656358078 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY1NjM1ODA3OA== dcherian 2448579 2020-07-09T21:22:56Z 2020-07-09T21:22:56Z MEMBER

We have https://github.com/pydata/xarray/blob/master/xarray/core/pycompat.py which defines dask_array_type and sparse_array_type and then use isinstance(da, dask_array_type) in a bunch of places (e.g. duck_array_ops).
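
For readers unfamiliar with the pattern, an optional-dependency type tuple of this kind can be sketched as below. This is an illustration of the idea, not a copy of pycompat.py; `is_dask_array` is a hypothetical helper name.

```python
try:
    import dask.array

    dask_array_type = (dask.array.Array,)
except ImportError:
    # Empty tuple: isinstance(x, ()) is always False, so callers need no guard.
    dask_array_type = ()


def is_dask_array(x):
    """Hypothetical helper: True only for real dask arrays."""
    return isinstance(x, dask_array_type)
```

A strict isinstance check like this is exactly what rejects wrappers such as a unit-carrying array around a dask array, which is why a duck-typed check is being discussed.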

re duck array check: @keewis added this recently https://github.com/pydata/xarray/blob/f3ca63a4ac5c091a92085b477a0d34c08df88aa6/xarray/core/utils.py#L250-L253

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
656068407 https://github.com/pydata/xarray/issues/4208#issuecomment-656068407 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY1NjA2ODQwNw== crusaderky 6213168 2020-07-09T11:18:15Z 2020-07-09T11:19:28Z MEMBER

> Is it acceptable for a Pint Quantity to always have the Dask collection interface defined (i.e., be a duck Dask array), even when its magnitude (what it wraps) is not a Dask Array?

I think there are already enough headaches with __iter__ being always defined and confusing libraries such as pandas (https://github.com/hgrecco/pint/issues/1128). I don't see why pint should be explicitly aware of dask (except in unit tests)? It should only deal with generic NEP18-compatible libraries (numpy, dask, sparse, cupy, etc.).

> How should xarray check for a duck Dask Array?

We should ask the dask team to formalize what defines a "dask-array-like", like they already did with dask collections, and implement their definition in xarray. I'd personally make it "whatever defines a numpy-array-like AND has a chunks method AND the chunks method returns a tuple".
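
A minimal sketch of that proposed definition, with two assumptions spelled out: "numpy-array-like" is taken here to mean having `shape`, `dtype`, and `__array_function__` (NEP 18), and `chunks` is read as an attribute rather than called. The stand-in classes are hypothetical.

```python
def is_duck_dask_array(x):
    # Assumption: this attribute set is what "numpy-array-like" means.
    looks_like_array = all(
        hasattr(x, attr) for attr in ("shape", "dtype", "__array_function__")
    )
    # chunks is treated as an attribute and must be a tuple.
    return looks_like_array and isinstance(getattr(x, "chunks", None), tuple)


class ChunkedDuck:
    """Hypothetical array-like with chunks."""

    shape = (4,)
    dtype = "float64"
    chunks = ((2, 2),)

    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented


class PlainDuck(ChunkedDuck):
    """Hypothetical array-like whose data is not chunked."""

    chunks = None
```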

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
655820797 https://github.com/pydata/xarray/issues/4208#issuecomment-655820797 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY1NTgyMDc5Nw== shoyer 1217238 2020-07-09T00:09:58Z 2020-07-09T00:09:58Z MEMBER

It might also make sense to check for one or more of the special dask collection attributes (__dask_graph__, __dask_keys__, etc)
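
Such a check could be sketched as follows. `has_dask_collection_attrs` is a hypothetical helper name, and which attributes to require is an assumption.

```python
def has_dask_collection_attrs(x, names=("__dask_graph__", "__dask_keys__")):
    """Hypothetical helper: True if x defines all the named hooks as callables."""
    return all(callable(getattr(x, name, None)) for name in names)


class WithHooks:
    """Hypothetical object exposing the two hooks checked above."""

    def __dask_graph__(self):
        return {("x", 0): 1}

    def __dask_keys__(self):
        return [("x", 0)]


class WithoutHooks:
    chunks = ((2, 2),)
```

An object that only has `chunks` fails this test, so it could be combined with a `chunks` check if both conditions are wanted.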

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
655810311 https://github.com/pydata/xarray/issues/4208#issuecomment-655810311 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY1NTgxMDMxMQ== shoyer 1217238 2020-07-08T23:31:21Z 2020-07-08T23:31:21Z MEMBER

Maybe something like this would work?

    def is_duck_dask_array(x):
        return getattr(x, 'chunks', None) is not None

xarray.DataArray would pass this test (chunks is either None for non-dask arrays or a tuple for dask arrays), so this would be consistent with what we already do.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
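
The filter shown at the top of this page (author_association = "MEMBER", issue = 653430454, sorted by updated_at descending) can be reproduced against this schema. The sketch below is illustrative: it uses an in-memory SQLite database with a reduced copy of the table (a subset of columns, no foreign keys), two ids taken from the rows above, and one made-up non-member row that is labeled as hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced copy of the schema: only the columns the query touches.
conn.execute(
    """
    CREATE TABLE [issue_comments] (
       [id] INTEGER PRIMARY KEY,
       [user] INTEGER,
       [updated_at] TEXT,
       [author_association] TEXT,
       [issue] INTEGER
    )
    """
)
conn.execute("CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue])")

rows = [
    (655810311, 1217238, "2020-07-08T23:31:21Z", "MEMBER", 653430454),
    (663148752, 306380, "2020-07-23T17:57:55Z", "MEMBER", 653430454),
    # Hypothetical row, not from the page: shows the filter excluding non-members.
    (1, 42, "2020-07-10T00:00:00Z", "NONE", 653430454),
]
conn.executemany(
    "INSERT INTO issue_comments (id, user, updated_at, author_association, issue) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

result = conn.execute(
    "SELECT id FROM issue_comments "
    "WHERE author_association = ? AND issue = ? "
    "ORDER BY updated_at DESC",
    ("MEMBER", 653430454),
).fetchall()
```

Because the timestamps are ISO 8601 strings in UTC, sorting them as text gives chronological order.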