
issue_comments


8 rows where author_association = "MEMBER" and issue = 713834297 sorted by updated_at descending


user 3

  • max-sixty 4
  • shoyer 2
  • mathause 2

issue 1

  • Allow skipna in .dot() · 8 ✖

author_association 1

  • MEMBER · 8 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
708088129 https://github.com/pydata/xarray/issues/4482#issuecomment-708088129 https://api.github.com/repos/pydata/xarray/issues/4482 MDEyOklzc3VlQ29tbWVudDcwODA4ODEyOQ== max-sixty 5635139 2020-10-14T00:50:40Z 2020-10-14T00:50:40Z MEMBER

Right — that makes sense now. Given that .fillna(0) creates a copy, when we're doing stride tricks in the form of construct then that copy can be huge.

So I think there's a spectrum of implementations of skipna, of which two are:

  • as a convenient alias of .fillna(0), like @mathause's example above, IIUC
  • ensuring a copy isn't made, which may require diving into np.einsum (or I'm open to alternatives)

The second would be required for @heerad's case above.
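To make the copy concern concrete, here is a minimal NumPy-only sketch. It assumes `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20+) as a stand-in for the stride tricks behind construct, and `np.where` as a stand-in for .fillna(0); it is not xarray's actual code path.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(10, dtype=float)
x[3] = np.nan

# Strided view: 7 windows of length 4 that reuse the memory of x.
windows = sliding_window_view(x, window_shape=4)

# Filling NaN materializes every window as real memory.
filled = np.where(np.isnan(windows), 0.0, windows)

print(np.shares_memory(windows, x))  # the view shares the buffer of x
print(np.shares_memory(filled, x))   # the filled result owns its data
print(x.size, filled.size)           # element counts before and after
```

The filled array holds one real element per window entry, so its footprint grows with the window length, while the view costs nothing extra.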

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow skipna in .dot() 713834297
707270259 https://github.com/pydata/xarray/issues/4482#issuecomment-707270259 https://api.github.com/repos/pydata/xarray/issues/4482 MDEyOklzc3VlQ29tbWVudDcwNzI3MDI1OQ== shoyer 1217238 2020-10-12T18:08:55Z 2020-10-12T18:08:55Z MEMBER

> I'm happy to live with a memory copy for now with fillna and notnull, but allocating the full, un-chunked array into memory is a showstopper. Is there a different workaround that I can use in the meantime?

This is surprising behavior, and definitely sounds like a bug!

If you could put together a minimal test case for reproducing the issue, we could look into it. It's hard to say what a work-around would be without knowing the source of the issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow skipna in .dot() 713834297
706442785 https://github.com/pydata/xarray/issues/4482#issuecomment-706442785 https://api.github.com/repos/pydata/xarray/issues/4482 MDEyOklzc3VlQ29tbWVudDcwNjQ0Mjc4NQ== max-sixty 5635139 2020-10-09T23:29:23Z 2020-10-09T23:29:23Z MEMBER

Maybe on very small arrays it's quicker to do a product than a copy? As the array scales, it's surely not — dot product is O(n^3) or similar. I would be interested to see a repro...

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow skipna in .dot() 713834297
706140256 https://github.com/pydata/xarray/issues/4482#issuecomment-706140256 https://api.github.com/repos/pydata/xarray/issues/4482 MDEyOklzc3VlQ29tbWVudDcwNjE0MDI1Ng== mathause 10194086 2020-10-09T12:01:37Z 2020-10-09T12:01:37Z MEMBER

fillna is basically where(notnull(data), data, other). I think what takes the longest is the where part - possibly making a memory copy(?).
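A rough NumPy illustration of that description follows. The `fillna` helper is hypothetical and only mirrors the where(notnull(data), data, other) pattern; it is not xarray's implementation.

```python
import numpy as np

def fillna(data, other):
    # Hypothetical helper: keep values that are not NaN, else use `other`.
    notnull = ~np.isnan(data)
    return np.where(notnull, data, other)

x = np.array([1.0, np.nan, 3.0])
filled = fillna(x, 0.0)

print(filled)
print(np.shares_memory(filled, x))  # np.where allocates a new array
```

`np.where` always returns a freshly allocated array, which is consistent with the suspicion that the where step involves a memory copy.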

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow skipna in .dot() 713834297
704634447 https://github.com/pydata/xarray/issues/4482#issuecomment-704634447 https://api.github.com/repos/pydata/xarray/issues/4482 MDEyOklzc3VlQ29tbWVudDcwNDYzNDQ0Nw== max-sixty 5635139 2020-10-07T01:12:09Z 2020-10-07T01:12:09Z MEMBER

Any idea why x.fillna(0.) is so much more expensive than x.dot(y)?

For some functions where fillna(0.) has a different result to skipna=True, I could see it being worthwhile to write new routines. But for dot product, it's definitionally the same. And fillna seems a much simpler operation than dot...
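The equivalence can be checked on a small case. This is a sketch with plain NumPy arrays rather than xarray objects, and it assumes the second operand is finite (a zero multiplied by inf would give NaN again).

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])  # data with a missing value
y = np.array([2.0, 5.0, 4.0])     # finite second operand

plain = np.dot(x, y)                               # NaN propagates
filled = np.dot(np.where(np.isnan(x), 0.0, x), y)  # fill with 0, then dot
skipped = np.nansum(x * y)                         # skip NaN products

print(plain, filled, skipped)
```

Filling with zero and skipping the NaN products give the same sum, because a zero term contributes nothing to a dot product.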

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow skipna in .dot() 713834297
704064370 https://github.com/pydata/xarray/issues/4482#issuecomment-704064370 https://api.github.com/repos/pydata/xarray/issues/4482 MDEyOklzc3VlQ29tbWVudDcwNDA2NDM3MA== shoyer 1217238 2020-10-06T06:38:51Z 2020-10-06T06:38:51Z MEMBER

I agree this would be welcome! Even if it isn't much faster than the options already shown here, at least we could point users to the best option we know of.

I suspect achieving the full speed of dot() with skip-NA support is impossible, but we can probably do much better. I might start by prototyping something in Numba, just to get a sense of what is achievable with a low-level approach. But keep in mind that functions like np.dot and np.einsum ("GEMM") are a few of the most highly optimized routines in numerical computing.
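As a sketch of what such a low-level prototype might look like, here is a loop in plain Python. The function name `nan_skipping_dot` is hypothetical; with Numba one would typically decorate a loop like this with `numba.njit` and pass arrays, which is left out so the example runs without Numba.

```python
import math

def nan_skipping_dot(a, b):
    # Accumulate products in a single pass, skipping NaN products,
    # so no filled copy of either input is created.
    total = 0.0
    for x, y in zip(a, b):
        prod = x * y
        if not math.isnan(prod):
            total += prod
    return total

result = nan_skipping_dot([1.0, math.nan, 3.0], [2.0, 5.0, 4.0])
print(result)
```

The loop avoids the intermediate copy but gives up the optimized BLAS path, which is the trade-off the comment describes.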

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow skipna in .dot() 713834297
702965721 https://github.com/pydata/xarray/issues/4482#issuecomment-702965721 https://api.github.com/repos/pydata/xarray/issues/4482 MDEyOklzc3VlQ29tbWVudDcwMjk2NTcyMQ== mathause 10194086 2020-10-02T21:27:33Z 2020-10-02T21:27:33Z MEMBER

Yes that would be very helpful. This is used for the weighted operations:

https://github.com/pydata/xarray/blob/333e8dba55f0165ccadf18f2aaaee9257a4d716b/xarray/core/weighted.py#L129-L135

and it would be great if this could be done upstream. However, dot is implemented using np.einsum which is quite a gnarly beast.
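For illustration only, a weighted reduction of this kind could be sketched with `np.einsum` as below. The `weighted_sum` helper, its `skipna` flag, and the array shapes are assumptions for this example, not the code at the linked lines.

```python
import numpy as np

def weighted_sum(data, weights, skipna=True):
    # Hypothetical helper: reduce the last axis of a 2-D array with weights.
    if skipna:
        # Zero out missing values so they drop out of the sum (makes a copy).
        data = np.where(np.isnan(data), 0.0, data)
    return np.einsum("ij,j->i", data, weights)

data = np.array([[1.0, np.nan, 3.0],
                 [4.0, 5.0, 6.0]])
weights = np.array([0.5, 0.25, 0.25])

result = weighted_sum(data, weights)
print(result)
```

With `skipna=False` the first row would come out as NaN, since einsum has no notion of missing values.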

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow skipna in .dot() 713834297
702937094 https://github.com/pydata/xarray/issues/4482#issuecomment-702937094 https://api.github.com/repos/pydata/xarray/issues/4482 MDEyOklzc3VlQ29tbWVudDcwMjkzNzA5NA== max-sixty 5635139 2020-10-02T20:13:37Z 2020-10-02T20:13:37Z MEMBER

I agree this would be a welcome option.

As a workaround, you could fillna with zero?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow skipna in .dot() 713834297


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);