issue_comments: 1425198851

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/issues/4610#issuecomment-1425198851	https://api.github.com/repos/pydata/xarray/issues/4610	1425198851	IC_kwDOAMm_X85U8s8D	2448579	2023-02-10T05:38:13Z	2023-02-10T05:38:31Z	MEMBER	Absolute speed of xhistogram appears to be 3-4x higher, and that's using numpy_groupies in flox. Possibly flox could be faster if using numba but not sure yet. Nah, in my experience, the overhead is "factorizing" (pd.cut/np.digitize) or converting to integer bins, and then converting the nD problem to a 1D problem for bincount. numba doesn't really help. 3-4x is a lot bigger than I expected. I was hoping for under 2x because flox is more general. I think the problem is that `pandas.cut` is a lot slower than `np.digitize`. We could swap that out easily here: https://github.com/xarray-contrib/flox/blob/daebc868c13dad74a55d74f3e5d24e0f6bbbc118/flox/core.py#L473 I think the one special case to consider is binning datetimes, and that digitize and pd.cut have different defaults for `side` or `closed`. Dask graphs simplicity. Xhistogram literally uses blockwise, whereas the flox graphs IIUC are blockwise-like but actually a specially-constructed HLG right now. ( `blockwise` and `sum`. Ideally `flox` would use a `reduction` that takes 2 array arguments (array to reduce, array to group by). Currently both cubed and dask only accept one argument. As a workaround, we could replace `dask.array._tree_reduce` with `dask.array.reduction(chunk=lambda x: x, ...)` and then it would more or less all be public API that is common to dask and cubed. Flox has various clever schemes for making general chunked groupby operations run more efficiently, but I don't think histogramming would really benefit from those unless there is a strong pattern to which values likely fall in which bins, that is known a priori. Yup, unlikely to help here.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		750985364
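
The comment's point about `np.digitize` and `pd.cut` having different defaults can be illustrated with a small sketch. This is not flox's code: the `edges` and `values` arrays are made up for illustration, and it only shows that the two functions assign values lying exactly on a bin edge to different bins unless the interval side is set explicitly.

```python
import numpy as np
import pandas as pd

# Illustrative data (not from flox): two values sit exactly on interior edges.
edges = np.array([0.0, 1.0, 2.0, 3.0])
values = np.array([0.5, 1.0, 2.0, 2.5])

# np.digitize defaults to right=False, i.e. half-open bins [lo, hi).
# It returns 1-based bin indices, so subtract 1 to get 0-based codes.
digitize_codes = np.digitize(values, edges) - 1

# pd.cut defaults to right=True, i.e. half-open bins (lo, hi].
# labels=False returns integer bin codes instead of a Categorical.
cut_codes = pd.cut(values, edges, labels=False)

# Values on an edge (1.0 and 2.0) land in different bins under the defaults.
print(digitize_codes.tolist())  # [0, 1, 2, 2]
print(cut_codes.tolist())       # [0, 0, 1, 2]

# Making the closed side explicit brings the two into agreement.
digitize_right = np.digitize(values, edges, right=True) - 1
cut_left = pd.cut(values, edges, right=False, labels=False)
```

Any swap of one function for the other would presumably need to pin the closed side like this. Out-of-range values also differ: `np.digitize` returns `0` or `len(edges)` for them, while `pd.cut` returns NaN.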