github: issue_comments: 1 row where author_association = "MEMBER", issue = 750985364 and user = 2448579 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
1425198851	https://github.com/pydata/xarray/issues/4610#issuecomment-1425198851	https://api.github.com/repos/pydata/xarray/issues/4610	IC_kwDOAMm_X85U8s8D	dcherian 2448579	2023-02-10T05:38:13Z	2023-02-10T05:38:31Z	MEMBER	Absolute speed of xhistogram appears to be 3-4x higher, and that's using numpy_groupies in flox. Possibly flox could be faster if using numba but not sure yet. Nah, in my experience, the overhead is "factorizing" (pd.cut/np.digitize) or converting to integer bins, and then converting the nD problem to a 1D problem for bincount. numba doesn't really help. 3-4x is a lot bigger than I expected. I was hoping for under 2x because flox is more general. I think the problem is that `pandas.cut` is a lot slower than `np.digitize`. We could swap that out easily here: https://github.com/xarray-contrib/flox/blob/daebc868c13dad74a55d74f3e5d24e0f6bbbc118/flox/core.py#L473 I think the one special case to consider is binning datetimes, and that digitize and pd.cut have different defaults for `side` or `closed`. Dask graphs simplicity. Xhistogram literally uses blockwise, whereas the flox graphs IIUC are blockwise-like but actually a specially-constructed HLG right now. `blockwise` and `sum`. Ideally `flox` would use a `reduction` that takes 2 array arguments (array to reduce, array to group by). Currently both cubed and dask only accept one argument. As a workaround, we could replace `dask.array._tree_reduce` with `dask.array.reduction(chunk=lambda x: x, ...)` and then it would more or less all be public API that is common to dask and cubed. Flox has various clever schemes for making general chunked groupby operations run more efficiently, but I don't think histogramming would really benefit from those unless there is a strong pattern to which values likely fall in which bins, that is known a priori. Yup, unlikely to help here.	
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		Add histogram method 750985364
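
The comment above describes the histogram cost as "factorizing" values into integer bin codes and then turning the nD problem into a 1D `bincount`. The following is a minimal sketch of that idea, not the actual flox or xhistogram code; the function names (`factorize_digitize`, `factorize_cut`, `histogram2d_bincount`) are hypothetical and chosen only for illustration. It also shows the edge-convention difference the comment mentions: `np.digitize` defaults to left-closed bins (`right=False`), while `pd.cut` defaults to right-closed bins (`right=True`), so a swap would presumably need to pass the convention explicitly.

```python
import numpy as np
import pandas as pd


def factorize_digitize(values, edges):
    """Integer bin codes in [0, nbins) using half-open bins [lo, hi).

    Values outside the edges get code -1. Uses np.digitize's default
    convention (right=False).
    """
    values = np.asarray(values)
    nbins = len(edges) - 1
    codes = np.digitize(values, edges) - 1
    codes[(codes < 0) | (codes >= nbins)] = -1
    return codes


def factorize_cut(values, edges, right=False):
    """Same codes via pd.cut; Categorical.codes is -1 for out-of-range values."""
    return np.asarray(pd.cut(np.asarray(values), edges, right=right).codes)


def histogram2d_bincount(x, y, x_edges, y_edges):
    """2D histogram: factorize each axis, flatten the index pairs, bincount."""
    ix = factorize_digitize(x, x_edges)
    iy = factorize_digitize(y, y_edges)
    shape = (len(x_edges) - 1, len(y_edges) - 1)
    valid = (ix >= 0) & (iy >= 0)  # drop points outside either set of edges
    flat = np.ravel_multi_index((ix[valid], iy[valid]), shape)
    counts = np.bincount(flat, minlength=shape[0] * shape[1])
    return counts.reshape(shape)


edges = np.array([0.0, 1.0, 2.0, 3.0])
values = np.array([0.0, 0.5, 1.0, 2.5, 3.0, -1.0])

# Matching conventions give identical codes: 0, 0, 1, 2, -1, -1
codes_digitize = factorize_digitize(values, edges)
codes_cut_left = factorize_cut(values, edges, right=False)

# pd.cut's default (right=True) bins differently: -1, 0, 0, 2, 2, -1
codes_cut_default = factorize_cut(values, edges, right=True)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 3.0, size=1000)
y = rng.uniform(0.0, 3.0, size=1000)
y_edges = np.array([0.0, 1.5, 3.0])
counts = histogram2d_bincount(x, y, edges, y_edges)
```

On inputs that lie strictly below the last edge, `counts` should agree with `np.histogram2d(x, y, bins=[edges, y_edges])`; the two differ only in that NumPy's histogram functions treat the final bin as closed on the right.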

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);