home / github / issue_comments

Menu
  • GraphQL API
  • Search all tables

issue_comments: 1425198851

This data as json

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/4610#issuecomment-1425198851 https://api.github.com/repos/pydata/xarray/issues/4610 1425198851 IC_kwDOAMm_X85U8s8D 2448579 2023-02-10T05:38:13Z 2023-02-10T05:38:31Z MEMBER

Absolute speed of xhistogram appears to be 3-4x higher, and that's using numpy_groupies in flox. Possibly flox could be faster if using numba but not sure yet.

Nah, in my experience, the overhead is "factorizing" (pd.cut/np.digitize) or converting to integer bins, and then converting the nD problem to a 1D problem for bincount. numba doesn't really help.


3-4x is a lot bigger than I expected. I was hoping for under 2x because flox is more general.

I think the problem is pandas.cut is a lot slower than np.digitize

We could swap that out easily here: https://github.com/xarray-contrib/flox/blob/daebc868c13dad74a55d74f3e5d24e0f6bbbc118/flox/core.py#L473

I think the one special case to consider is binning datetimes, and that digitize and pd.cut have different defaults for side or closed.


Dask graphs simplicity. Xhistogram literally uses blockwise, whereas the flox graphs IIUC are blockwise-like but actually a specially-constructed HLG right now. (

blockwise and sum.

Ideallyflox would use a reduction that takes 2 array arguments (array to reduce, array to group by). Currently both cubed and dask onlt accept one argument.

As a workaround, we could replace dask.array._tree_reduce with dask.array.reduction(chunk=lambda x: x, ...) and then it would more or less all be public API that is common to dask and cubed.

Flox has various clever schemes for making general chunked groupby operations run more efficiently, but I don't think histogramming would really benefit from those unless there is a strong pattern to which values likely fall in which bins, that is known a priori.

Yup. unlikely to help here.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  750985364
Powered by Datasette · Queries took 0.635ms · About: xarray-datasette