issues: 117039129
| id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | performed_via_github_app | state_reason | repo | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 117039129 | MDU6SXNzdWUxMTcwMzkxMjk= | 659 | groupby very slow compared to pandas | 1322974 | closed | 0 | | | 9 | 2015-11-16T02:43:57Z | 2022-05-15T02:38:30Z | 2022-05-15T02:38:30Z | CONTRIBUTOR | | | | | completed | 13221727 | issue |

body:

```python
import timeit

import numpy as np
from pandas import DataFrame
from xray import Dataset, DataArray

df = DataFrame({"a": np.r_[np.arange(500.), np.arange(500.)], "b": np.arange(1000.)})
print(timeit.repeat('df.groupby("a").agg("mean")', globals={"df": df}, number=10))
print(timeit.repeat('df.groupby("a").agg(np.mean)', globals={"df": df, "np": np}, number=10))

ds = Dataset({"a": DataArray(np.r_[np.arange(500.), np.arange(500.)]), "b": DataArray(np.arange(1000.))})
print(timeit.repeat('ds.groupby("a").mean()', globals={"ds": ds}, number=10))
```

This outputs

i.e. xray's groupby is ~100 times slower than pandas' one (and 200 times slower than passing (This is the actual order of magnitude of the data size and redundancy I want to handle, i.e. thousands of points with very limited duplication.)

reactions:

```json
{
  "url": "https://api.github.com/repos/pydata/xarray/issues/659/reactions",
  "total_count": 0,
  "+1": 0,
  "-1": 0,
  "laugh": 0,
  "hooray": 0,
  "confused": 0,
  "heart": 0,
  "rocket": 0,
  "eyes": 0
}
```
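The speed gap reported above comes down to how the grouped reduction is executed: pandas dispatches `agg("mean")` to compiled per-group loops, while a naive groupby iterates over groups in Python. As a rough illustration of the vectorized approach (a minimal sketch, not xray's actual implementation; `grouped_mean` is a hypothetical helper), a sort-based grouped mean can be written in pure NumPy:

```python
import numpy as np

def grouped_mean(labels, values):
    # Sort by label so each group's values are contiguous.
    order = np.argsort(labels, kind="stable")
    sorted_labels = labels[order]
    sorted_values = values[order]
    # Positions where a new group begins in the sorted arrays.
    group_starts = np.flatnonzero(np.r_[True, sorted_labels[1:] != sorted_labels[:-1]])
    # Per-group sums via a single reduceat call, then divide by group sizes.
    sums = np.add.reduceat(sorted_values, group_starts)
    counts = np.diff(np.r_[group_starts, len(sorted_values)])
    return sorted_labels[group_starts], sums / counts

# Same shape of data as the benchmark: 1000 points, 500 unique labels.
labels = np.r_[np.arange(500.0), np.arange(500.0)]
values = np.arange(1000.0)
keys, means = grouped_mean(labels, values)
```

This avoids any Python-level loop over groups, which is the property that makes the pandas path fast on data with thousands of points and limited duplication.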