issue_comments

5 rows where issue = 202423683 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
519833424 https://github.com/pydata/xarray/issues/1224#issuecomment-519833424 https://api.github.com/repos/pydata/xarray/issues/1224 MDEyOklzc3VlQ29tbWVudDUxOTgzMzQyNA== crusaderky 6213168 2019-08-09T08:36:09Z 2019-08-09T08:36:09Z MEMBER

Retiring this as it is way too specialized for the main xarray library.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  fast weighted sum 202423683
388545372 https://github.com/pydata/xarray/issues/1224#issuecomment-388545372 https://api.github.com/repos/pydata/xarray/issues/1224 MDEyOklzc3VlQ29tbWVudDM4ODU0NTM3Mg== crusaderky 6213168 2018-05-12T10:22:02Z 2018-05-12T10:22:02Z MEMBER

Both. One of the biggest problems is that the data of my interest is a mix of:

- 1D arrays with dims=(scenario, ) and shape=(500000, ) (stressed financial instruments under a Monte Carlo stress set)
- 0D arrays with dims=() (financial instruments that are impervious to the Monte Carlo stresses and never change values)

So before you do concat(), you need to call broadcast(), which effectively means that doing the sums on your bunch of very fast 0D instruments suddenly requires repeating them on 500k points.

Even keeping the two lots separate (which is what fastwsum does) performed considerably slower.

However, this was over a year ago, well before xarray.dot() and dask.einsum(), so I'll need to tinker with it again.
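
A minimal sketch of the situation described above, with made-up sizes, instrument counts, and weights (not code from the issue); it shows how the concat() route forces the cheap 0D instruments onto the full scenario dimension, while the plain Python sum keeps them 0D until the end:

import numpy as np
import xarray as xr

N = 500_000  # hypothetical number of Monte Carlo scenarios

# Hypothetical portfolio: mostly cheap 0D scalars plus a few full 1D vectors.
scalars = [xr.DataArray(float(i)) for i in range(20)]
vectors = [xr.DataArray(np.random.rand(N), dims=["scenario"]) for _ in range(3)]
arrays = scalars + vectors
weights = [1.0] * len(arrays)

# concat-based weighted sum: broadcast() first expands every 0D array to
# 500k elements, so the formerly cheap scalars become expensive.
stacked = xr.concat(xr.broadcast(*arrays), dim="stacked")
w = xr.DataArray(weights, dims=["stacked"])
total_concat = (stacked * w).sum("stacked")

# loop-based weighted sum: the 0D terms stay 0D until the final additions.
total_loop = sum(a * wi for a, wi in zip(arrays, weights))

np.testing.assert_allclose(total_concat.values, total_loop.values)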

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  fast weighted sum 202423683
274380660 https://github.com/pydata/xarray/issues/1224#issuecomment-274380660 https://api.github.com/repos/pydata/xarray/issues/1224 MDEyOklzc3VlQ29tbWVudDI3NDM4MDY2MA== shoyer 1217238 2017-01-23T02:04:23Z 2017-01-23T02:04:23Z MEMBER

Was concat slow at graph construction or compute time?

On Sun, Jan 22, 2017 at 6:02 PM crusaderky notifications@github.com wrote:

> (arrays * weights).sum('stacked') was my first attempt. It performed considerably worse than sum(a * w for a, w in zip(arrays, weights)) - mostly because xarray.concat() is not terribly performant (I did not look deeper into it).
>
> I did not try dask.array.sum() - worth some playing with.

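A rough way to separate those two costs, purely for illustration (toy dask arrays with arbitrary sizes, not code from the thread):

import time
import dask.array as da

# Build a toy weighted sum out of many chunked dask arrays.
arrays = [da.random.random(500_000, chunks=100_000) for _ in range(50)]
weights = [float(i) for i in range(50)]

t0 = time.perf_counter()
total = sum(a * w for a, w in zip(arrays, weights))  # graph construction only
t1 = time.perf_counter()

result = total.compute()                             # actual execution
t2 = time.perf_counter()

print(f"graph construction: {t1 - t0:.3f}s, compute: {t2 - t1:.3f}s")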

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  fast weighted sum 202423683
274380448 https://github.com/pydata/xarray/issues/1224#issuecomment-274380448 https://api.github.com/repos/pydata/xarray/issues/1224 MDEyOklzc3VlQ29tbWVudDI3NDM4MDQ0OA== crusaderky 6213168 2017-01-23T02:02:08Z 2017-01-23T02:02:08Z MEMBER

(arrays * weights).sum('stacked') was my first attempt. It performed considerably worse than sum(a * w for a, w in zip(arrays, weights)) - mostly because xarray.concat() is not terribly performant (I did not look deeper into it).

I did not try dask.array.sum() - worth some playing with.
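
For reference, a toy sketch of what trying dask.array.sum() on the stacked formulation could look like (hypothetical sizes and weights, not code from the issue):

import numpy as np
import dask.array as da

arrays = [da.random.random(500_000, chunks=100_000) for _ in range(10)]
weights = np.arange(1.0, 11.0)

stacked = da.stack(arrays)              # shape (10, 500_000)
weighted = stacked * weights[:, None]   # broadcast one weight per stacked row
total = da.sum(weighted, axis=0)        # dask's parallel reduction, not builtin sum

print(total.compute()[:5])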

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  fast weighted sum 202423683
274375810 https://github.com/pydata/xarray/issues/1224#issuecomment-274375810 https://api.github.com/repos/pydata/xarray/issues/1224 MDEyOklzc3VlQ29tbWVudDI3NDM3NTgxMA== shoyer 1217238 2017-01-23T01:09:49Z 2017-01-23T01:09:49Z MEMBER

Interesting -- thanks for sharing! I am interested in performance improvements but also a little reluctant to add specialized optimizations directly into xarray.

You write that this is equivalent to sum(a * w for a, w in zip(arrays, weights)). How does this compare to stacking and doing the sum in xarray, e.g., (arrays * weights).sum('stacked'), where arrays and weights are now DataArray objects with a 'stacked' dimension? Or maybe arrays.dot(weights)?
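
A tiny sketch of the two vectorized formulations mentioned here, on made-up data, just to show that they compute the same thing:

import numpy as np
import xarray as xr

# Hypothetical stacked inputs: 4 arrays of 1000 scenarios each.
arrays = xr.DataArray(np.random.rand(4, 1000), dims=["stacked", "scenario"])
weights = xr.DataArray([0.1, 0.2, 0.3, 0.4], dims=["stacked"])

vectorized = (arrays * weights).sum("stacked")  # multiply, then reduce over 'stacked'
via_dot = arrays.dot(weights)                   # contracts over the shared 'stacked' dim

xr.testing.assert_allclose(vectorized, via_dot)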

Using vectorized operations feels a bit more idiomatic (though also maybe more verbose). It also may be more performant. Note that the builtin sum is not optimized well by dask because it's basically equivalent to a loop:

def sum(xs):
    result = 0
    for x in xs:
        result += x
    return result

In contrast, dask.array.sum builds up a tree so it can do the sum in parallel.

There has also been discussion in https://github.com/pydata/xarray/issues/422 about adding a dedicated method for a weighted mean.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  fast weighted sum 202423683

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
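
As a usage note, a minimal sketch of reproducing this 5-row view against a local copy of the underlying SQLite database (the github.db filename is an assumption):

import sqlite3

conn = sqlite3.connect("github.db")   # hypothetical local copy of this database
conn.row_factory = sqlite3.Row

rows = conn.execute(
    """
    SELECT id, user, created_at, updated_at, author_association, body
    FROM issue_comments
    WHERE issue = 202423683
    ORDER BY updated_at DESC
    """
).fetchall()

for row in rows:
    print(row["id"], row["updated_at"], row["body"][:60])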