https://github.com/pydata/xarray/issues/4610#issuecomment-1425198851 (user 2448579, MEMBER, 2023-02-10)

> Absolute speed of xhistogram appears to be 3-4x higher, and that's using numpy_groupies in flox. Possibly flox could be faster if using numba but not sure yet.

Nah, in my experience the overhead is "factorizing" (`pd.cut`/`np.digitize`), i.e. converting values to integer bins, and then converting the nD problem to a 1D problem for bincount. numba doesn't really help.

-----

3-4x is a lot bigger than I expected. I was hoping for under 2x, because flox is more general. I think the problem is that `pandas.cut` is a lot slower than `np.digitize`:

<img width="632" alt="image" src="https://user-images.githubusercontent.com/2448579/218008671-86e36d51-cb41-4c23-984e-fae719642849.png">

We could swap that out easily here: https://github.com/xarray-contrib/flox/blob/daebc868c13dad74a55d74f3e5d24e0f6bbbc118/flox/core.py#L473 (there's a small sketch of the swap at the end of this comment).

I think the one special case to consider is binning datetimes, and that `np.digitize` and `pd.cut` have different defaults for `side`/`closed`.

-----

> Dask graphs simplicity. Xhistogram literally uses blockwise, whereas the flox graphs IIUC are blockwise-like but actually a specially-constructed HLG right now.

It's `blockwise` and then `sum`. Ideally `flox` would use a `reduction` that takes 2 array arguments (the array to reduce, and the array to group by). Currently both [cubed](https://tom-e-white.com/cubed/operations.html#reduction-and-arg-reduction) and [dask](https://docs.dask.org/en/stable/generated/dask.array.reduction.html) only accept one argument. As a workaround, we could replace `dask.array._tree_reduce` with `dask.array.reduction(chunk=lambda x: x, ...)`; then it would more or less all be public API that is common to dask and cubed (see the second sketch at the end of this comment).

> Flox has various clever schemes for making general chunked groupby operations run more efficiently, but I don't think histogramming would really benefit from those unless there is a strong pattern to which values likely fall in which bins, that is known a priori.

Yup, unlikely to help here.
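
-----

For concreteness, a minimal sketch (not flox code; the array sizes and bins here are made up) of the "factorize" step the screenshot benchmarks: turning float values into integer bin labels with `pd.cut` vs `np.digitize`. Note the `right=True`/`-1` adjustment needed so `digitize` matches `pd.cut`'s default right-closed intervals:

```python
# Sketch of the "factorize" step: values -> integer bin labels,
# via pandas.cut vs np.digitize.
import numpy as np
import pandas as pd

values = np.random.default_rng(0).random(1_000_000)
bins = np.linspace(0, 1, 101)

# pandas.cut returns a Categorical; .codes are 0-based integer bin labels,
# with -1 for out-of-bounds values. Default intervals are right-closed: (a, b].
codes_cut = pd.cut(values, bins).codes

# np.digitize with right=True uses the same (a, b] convention but returns
# 1-based indices, so subtract 1 to line up with pd.cut's codes.
codes_dig = np.digitize(values, bins, right=True) - 1

assert (codes_cut == codes_dig).all()
```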
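And a runnable sketch of the `dask.array.reduction(chunk=lambda x: x, ...)` workaround. This is not actual flox code: a plain per-block `sum` stands in for the blockwise groupby step, and the point is only the shape of the pattern, i.e. do the per-chunk work up front, then hand the intermediates to the public `da.reduction` with an identity `chunk=` so it performs just the tree combine/aggregate:

```python
# Do the per-chunk work ourselves, then tree-reduce via public API only.
import numpy as np
import dask.array as da

x = da.random.random((1000, 1000), chunks=(250, 250))

# Per-chunk intermediates; in flox this would be the blockwise groupby step.
partials = x.map_blocks(lambda b: b.sum(keepdims=True), chunks=(1, 1))

total = da.reduction(
    partials,
    # Identity chunk function: the per-chunk work is already done above.
    chunk=lambda b, axis, keepdims: b,
    # Combine/aggregate the concatenated intermediates in a tree.
    aggregate=lambda b, axis, keepdims: b.sum(axis=axis, keepdims=keepdims),
    dtype=x.dtype,
)

assert np.isclose(total.compute(), x.sum().compute())
```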