
issue_comments


4 rows where author_association = "MEMBER" and issue = 675482176 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
id 1507213204 · html_url https://github.com/pydata/xarray/issues/4325#issuecomment-1507213204 · issue_url https://api.github.com/repos/pydata/xarray/issues/4325 · node_id IC_kwDOAMm_X85Z1j-U · user dcherian (2448579) · created_at 2023-04-13T15:56:51Z · updated_at 2023-04-13T15:56:51Z · author_association MEMBER

Over in https://github.com/pydata/xarray/issues/7344#issuecomment-1336299057, @shoyer wrote:

That said -- we could also switch to smarter NumPy based algorithms to implement most moving window calculations, e.g., using np.nancumsum for moving window means.

After some digging, this would involve using "summed area tables" which have been generalized to nD, and can be used to compute all our built-in reductions (except median). Basically we'd store the summed area table (repeated np.cumsum) and then calculate reductions using binary ops (mostly subtraction) on those tables.

This would be an intermediate-level project, but we could implement it incrementally (starting with sum, for example). One downside is the potential for floating-point inaccuracy, because we're taking differences of potentially large numbers.

cc @aulemahal
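A minimal 1-D sketch of the summed-area-table idea (a hypothetical illustration, not xarray code): the cumulative-sum table is computed once, and every window sum then comes from a single subtraction.

```python
import numpy as np

def rolling_sum_via_cumsum(a, window):
    """Rolling sum from a 1-D 'summed area table' (a cumulative sum).

    Each window's sum is the difference of two table entries, so the
    whole rolling reduction costs O(n) instead of O(n * window).
    Subtracting large cumulative sums is where the floating-point
    inaccuracy mentioned above can creep in.
    """
    table = np.concatenate(([0.0], np.cumsum(a)))
    return table[window:] - table[:-window]

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
rolling_sum_via_cumsum(a, 3)  # array([ 6.,  9., 12.])
```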

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Optimize ndrolling nanreduce 675482176
id 716399575 · html_url https://github.com/pydata/xarray/issues/4325#issuecomment-716399575 · issue_url https://api.github.com/repos/pydata/xarray/issues/4325 · node_id MDEyOklzc3VlQ29tbWVudDcxNjM5OTU3NQ== · user mathause (10194086) · created_at 2020-10-26T08:40:51Z · updated_at 2021-02-18T15:39:40Z · author_association MEMBER

This is already done for counts, correct? Here:

https://github.com/pydata/xarray/blob/1597e3a91eaf96626725987d23bbda2a80d2bae7/xarray/core/rolling.py#L370-L382

This should work for most of the reductions (and is a bit similar to what is done in weighted for mean and sum):

  • [x] count: notnull() -> rolling -> sum
  • [x] argmax: fillna(-inf) -> rolling -> argmax
  • [x] argmin: fillna(inf) -> rolling -> argmin
  • [x] max: fillna(-inf) -> rolling -> max (not sure about this one, need to be careful with the dtype)
  • [x] min: fillna(inf) -> rolling -> min (ditto)
  • [x] mean: fillna(0) -> rolling -> sum / count (ensure nan if count == 0)
  • [x] prod: fillna(1) -> rolling -> prod
  • [x] sum: fillna(0) -> rolling -> sum
  • [ ] var: fillna(0) -> rolling -> possible (?) but a bit more involved
  • [ ] std: sqrt(var)
  • [ ] median: probably not possible

I think this should not be too difficult; the thing is that rolling itself is already quite complicated.
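The mean entry in the list above can be sketched in plain NumPy (a hypothetical illustration of the fillna -> rolling -> combine pattern; in xarray, a rolling sum would take the place of np.convolve):

```python
import numpy as np

def rolling_nanmean(a, window):
    """Nan-aware rolling mean via the fillna -> rolling -> combine pattern."""
    kernel = np.ones(window)
    valid = ~np.isnan(a)
    # fillna(0) -> rolling -> sum
    sums = np.convolve(np.where(valid, a, 0.0), kernel, mode="valid")
    # notnull -> rolling -> sum, i.e. the count of valid points per window
    counts = np.convolve(valid.astype(float), kernel, mode="valid")
    # ensure nan if count == 0 (np.maximum avoids a divide-by-zero)
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)

a = np.array([1.0, np.nan, 3.0, 4.0])
rolling_nanmean(a, 2)  # array([1. , 3. , 3.5])
```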

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 2,
    "rocket": 0,
    "eyes": 0
}
id 741734592 · html_url https://github.com/pydata/xarray/issues/4325#issuecomment-741734592 · issue_url https://api.github.com/repos/pydata/xarray/issues/4325 · node_id MDEyOklzc3VlQ29tbWVudDc0MTczNDU5Mg== · user mathause (10194086) · created_at 2020-12-09T12:17:15Z · updated_at 2020-12-09T12:17:15Z · author_association MEMBER

I just saw that numpy 1.20 introduces stride_tricks.sliding_window_view. I have not looked at it yet; just leaving this here for reference.

https://numpy.org/devdocs/reference/generated/numpy.lib.stride_tricks.sliding_window_view.html#numpy.lib.stride_tricks.sliding_window_view

https://numpy.org/devdocs/release/1.20.0-notes.html#sliding-window-view-provides-a-sliding-window-view-for-numpy-arrays

https://github.com/numpy/numpy/pull/17394
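For reference, a small example of what that function provides (requires numpy >= 1.20). The view avoids copying, although reducing over the window axis still reads every element of every window:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # numpy >= 1.20

a = np.array([1.0, np.nan, 3.0, 4.0, 5.0])
# A (n - window + 1, window) strided view onto `a`; no data is copied.
windows = sliding_window_view(a, window_shape=3)
np.nansum(windows, axis=-1)  # array([ 4.,  7., 12.])
```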

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
id 717572036 · html_url https://github.com/pydata/xarray/issues/4325#issuecomment-717572036 · issue_url https://api.github.com/repos/pydata/xarray/issues/4325 · node_id MDEyOklzc3VlQ29tbWVudDcxNzU3MjAzNg== · user fujiisoup (6815844) · created_at 2020-10-27T22:14:41Z · updated_at 2020-10-27T22:14:41Z · author_association MEMBER

@mathause Oh, I missed this issue. Yes, this is implemented only for count.

the thing is that rolling itself is already quite complicated

Agreed. We need to clean this up.

One possible option would be to drop support for bottleneck. It does not work for nd-rolling, and if we implement the nd-nanreduce, the speed should be comparable with bottleneck's.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette