
issue_comments


1 row where issue = 750985364 and user = 2448579 sorted by updated_at descending


id: 1425198851
html_url: https://github.com/pydata/xarray/issues/4610#issuecomment-1425198851
issue_url: https://api.github.com/repos/pydata/xarray/issues/4610
node_id: IC_kwDOAMm_X85U8s8D
user: dcherian (2448579)
created_at: 2023-02-10T05:38:13Z
updated_at: 2023-02-10T05:38:31Z
author_association: MEMBER

body:

> Absolute speed of xhistogram appears to be 3-4x higher, and that's using numpy_groupies in flox. Possibly flox could be faster if using numba but not sure yet.

Nah, in my experience the overhead is "factorizing" (`pd.cut` / `np.digitize`), i.e. converting values to integer bin labels, and then converting the nD problem to a 1D problem for `bincount`. numba doesn't really help.
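The "factorize, then one flat bincount" idea can be sketched in plain numpy. This is an illustration of the general technique, not flox's actual implementation; all names here are mine.

```python
import numpy as np

# A sketch of "factorize then bincount": histogram several rows at once.
rng = np.random.default_rng(0)
data = rng.normal(size=(4, 100))      # four independent rows to histogram
bins = np.linspace(-3, 3, 11)         # 10 bins
nbins = len(bins) - 1

# Step 1 ("factorize"): map each value to an integer bin label.
# This, not the counting, is typically the expensive part.
labels = np.digitize(data, bins) - 1  # -1 = below range, nbins = above
valid = (labels >= 0) & (labels < nbins)

# Step 2: collapse the nD problem to 1D by offsetting each row's labels,
# so a single bincount produces every row's histogram at once.
offsets = np.arange(data.shape[0])[:, None] * nbins
flat = (labels + offsets)[valid]
counts = np.bincount(flat, minlength=data.shape[0] * nbins)
counts = counts.reshape(data.shape[0], nbins)
```

The offset trick is what turns a per-row loop into one vectorized `bincount` over the flattened labels.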


> 3-4x is a lot bigger than I expected. I was hoping for under 2x because flox is more general.

I think the problem is that `pandas.cut` is a lot slower than `np.digitize`.

We could swap that out easily here: https://github.com/xarray-contrib/flox/blob/daebc868c13dad74a55d74f3e5d24e0f6bbbc118/flox/core.py#L473

I think the one special case to consider is binning datetimes; note also that `np.digitize` and `pd.cut` have different defaults for which side of each interval is closed (`right` vs. `closed`).
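The differing defaults can be shown on toy data (this is just an illustration, not flox code):

```python
import numpy as np
import pandas as pd

x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
bins = np.array([0.0, 1.0, 2.0])

# np.digitize defaults to right=False: left-closed intervals [a, b),
# so 1.0 lands in the second bin.
dig = np.digitize(x, bins)    # array([1, 1, 2, 2, 3])

# pd.cut defaults to right=True: right-closed intervals (a, b],
# so 1.0 lands in the first bin and 0.0 falls outside all bins (code -1).
cut = pd.cut(x, bins).codes   # array([-1, 0, 0, 1, 1], dtype=int8)
```

Swapping `pd.cut` for `np.digitize` therefore changes which bin edge-exact values fall into unless the closed side is handled explicitly.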


> Dask graph simplicity. Xhistogram literally uses blockwise, whereas the flox graphs IIUC are blockwise-like but are actually a specially-constructed HLG right now.

It's `blockwise` and `sum`.

Ideally, flox would use a reduction that takes two array arguments (the array to reduce and the array to group by). Currently both cubed and dask only accept one argument.

As a workaround, we could replace `dask.array._tree_reduce` with `dask.array.reduction(chunk=lambda x: x, ...)`, and then it would more or less all be public API that is common to dask and cubed.
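Histograms fit that chunk/combine/aggregate contract naturally, because partial counts simply add. Here is a numpy-only sketch of the pattern that `dask.array.reduction` exposes; the function names and data are illustrative, not dask or flox internals.

```python
import numpy as np

def chunk_hist(block, bins):
    # per-chunk step: a partial histogram for one block
    counts, _ = np.histogram(block, bins=bins)
    return counts

def combine(partials):
    # tree-combine/aggregate step: partial counts just add
    return np.sum(partials, axis=0)

bins = np.linspace(0.0, 1.0, 6)
chunks = [np.random.default_rng(i).random(50) for i in range(4)]
total = combine([chunk_hist(c, bins) for c in chunks])

# same result as histogramming all the data in one shot
expected, _ = np.histogram(np.concatenate(chunks), bins=bins)
```

Because the combine step is a plain elementwise sum, the reduction parallelizes over chunks with no inter-chunk communication beyond the tree-reduce.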

> Flox has various clever schemes for making general chunked groupby operations run more efficiently, but I don't think histogramming would really benefit from those unless there is a strong pattern to which values likely fall in which bins, that is known a priori.

Yup, unlikely to help here.

reactions:

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Add histogram method (750985364)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette