issue_comments

4 rows where author_association = "CONTRIBUTOR" and issue = 1497031605 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1406463669 https://github.com/pydata/xarray/issues/7377#issuecomment-1406463669 https://api.github.com/repos/pydata/xarray/issues/7377 IC_kwDOAMm_X85T1O61 maawoo 56583917 2023-01-27T12:45:10Z 2024-01-03T08:41:41Z CONTRIBUTOR

Hi all, I just created a simple workaround, which might be useful for others:
https://gist.github.com/maawoo/0b34d371c3cc1960a1589ccaded868c2

It uses the _nan_quantile method of xclim and works fine for my applications. Here is a quick comparison using the same example data as in my initial post:

EDIT: I've updated the code to use numbagg instead of xclim.

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Aggregating a dimension using the Quantiles method with `skipna=True` is very slow 1497031605
1362998362 https://github.com/pydata/xarray/issues/7377#issuecomment-1362998362 https://api.github.com/repos/pydata/xarray/issues/7377 IC_kwDOAMm_X85RPbRa maawoo 56583917 2022-12-22T15:52:31Z 2022-12-22T15:52:31Z CONTRIBUTOR

Thanks @arongergely! I have mentioned the numpy issue in my post above (FYI, for anyone looking for it). I was really surprised to see that it's over 2 years old and that this is now the first Xarray issue referencing it. If it's really a "well known" issue, I think it should have been mentioned somewhere in the documentation of the Xarray quantile method.

I have seen the blog post and tried to use the workaround with apply_ufunc and Dask but ran into some problems. I'll revisit that when I have some time and will also check xclim. Seems to be very promising, thanks!

Happy holidays! 🎄

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Aggregating a dimension using the Quantiles method with `skipna=True` is very slow 1497031605
1359023892 https://github.com/pydata/xarray/issues/7377#issuecomment-1359023892 https://api.github.com/repos/pydata/xarray/issues/7377 IC_kwDOAMm_X85RAQ8U arongergely 7316393 2022-12-20T08:53:34Z 2022-12-20T08:57:52Z CONTRIBUTOR

Hi, this is a known issue coming from numpy.nanquantile / numpy.nanpercentile. I had the same problem - AFAIK the workaround is to implement your own nanpercentiles calculation.

If you want to take that route:

There is a blog post about the issue + a numpy workaround for 3D arrays: https://krstn.eu/np.nanpercentile()-there-has-to-be-a-faster-way/

I also turned to the numpy mailing list. Abel Aoun had a suggestion to look into the algo used at the xclim project. See our thread here: https://mail.python.org/archives/list/numpy-discussion@python.org/message/EKQIS4KNOHS6ZAU5OSYTLNOOH7U2Y5TW/

I ended up taking that one and rewrote it to suit my needs. I achieved a >100x speedup in my case. Good luck!
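For readers who want to see what "implementing your own nanpercentiles calculation" can look like, here is a minimal NumPy sketch. It is not the code from the blog post, from xclim, or from the commenter; the helper name `nanquantile_sorted` is hypothetical, and the approach (sort once so that NaNs move to the end, then interpolate linearly between the two nearest valid ranks) is only one possible way to do it. Whether it is faster than `numpy.nanquantile` depends on the NumPy version and the array shape, so it should be benchmarked before use.

```python
import numpy as np


def nanquantile_sorted(a, q, axis=0):
    """NaN-skipping quantile with linear interpolation along one axis.

    Hypothetical helper for illustration only; `q` is a single float in [0, 1].
    """
    # Work on a float array with the reduced axis moved to the front.
    a = np.moveaxis(np.asarray(a, dtype=float), axis, 0)
    srt = np.sort(a, axis=0)                # np.sort places NaNs at the end
    n_valid = np.sum(~np.isnan(a), axis=0)  # number of non-NaN values per slice

    # Fractional rank of the quantile among the valid values of each slice.
    pos = np.where(n_valid > 0, q * (n_valid - 1), 0)
    lo = np.floor(pos).astype(int)
    # Clamp the upper rank so it never points into the NaN tail.
    hi = np.minimum(lo + 1, np.maximum(n_valid - 1, 0))
    frac = pos - lo

    v_lo = np.take_along_axis(srt, lo[None, ...], axis=0)[0]
    v_hi = np.take_along_axis(srt, hi[None, ...], axis=0)[0]
    # Slices that are entirely NaN yield NaN because srt[0] is NaN there.
    return v_lo + frac * (v_hi - v_lo)


# Usage on made-up data: 50 time steps on a 4 x 3 grid with about 20 % NaNs.
rng = np.random.default_rng(0)
cube = rng.normal(size=(50, 4, 3))
cube[rng.random(cube.shape) < 0.2] = np.nan
q95 = nanquantile_sorted(cube, 0.95, axis=0)  # result has shape (4, 3)
```

The whole reduction is a single sort plus two gathers over the full array, rather than a separate computation per 1-D slice.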

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Aggregating a dimension using the Quantiles method with `skipna=True` is very slow 1497031605
1351814726 https://github.com/pydata/xarray/issues/7377#issuecomment-1351814726 https://api.github.com/repos/pydata/xarray/issues/7377 IC_kwDOAMm_X85Qkw5G maawoo 56583917 2022-12-14T17:23:10Z 2022-12-14T17:23:10Z CONTRIBUTOR

This issue has an extra layer of evilness because users will also run into it when they don't specify the skipna parameter and their data has a float dtype, as in my example dummy data: da.quantile(0.95, dim='time')

The documentation could be a little bit clearer. The fact that skipna=True is the default for float dtypes could easily be overlooked in my opinion:

If True, skip missing values (as marked by NaN). By default, only skips missing values for float dtypes;
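The difference between the two code paths can be illustrated at the NumPy level, which is where this thread locates the slowdown. The snippet below is only a sketch with made-up values. It shows the semantics of the NaN-aware and the plain quantile functions, not how Xarray dispatches internally.

```python
import numpy as np

# Illustrative float data with one missing value.
data = np.array([1.0, 2.0, np.nan, 4.0])

# Plain quantile: the NaN propagates into the result.
plain = np.quantile(data, 0.5)

# NaN-aware quantile: the NaN is skipped, so this is the median of 1, 2 and 4.
skipping = np.nanquantile(data, 0.5)

print(plain, skipping)  # nan 2.0
```

Since the quoted documentation says missing values are skipped by default for float dtypes, a call that omits skipna on float data gets the NaN-skipping behaviour, and with it the slow path described in this issue, even when the data contains no NaNs at all.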

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Aggregating a dimension using the Quantiles method with `skipna=True` is very slow 1497031605

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 13.494ms · About: xarray-datasette