issue_comments
9 rows where issue = 117039129 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1126083413 | https://github.com/pydata/xarray/issues/659#issuecomment-1126083413 | https://api.github.com/repos/pydata/xarray/issues/659 | IC_kwDOAMm_X85DHqtV | andersy005 13301940 | 2022-05-13T13:55:20Z | 2022-05-13T13:55:20Z | MEMBER | #5734 has greatly improved the performance. Fantastic work @dcherian 👏🏽

```python
In [13]: import xarray as xr, pandas as pd, numpy as np

In [14]: ds = xr.Dataset({"a": xr.DataArray(np.r_[np.arange(500.), np.arange(500.)]),
    ...:                  "b": xr.DataArray(np.arange(1000.))})

In [15]: ds
Out[15]:
<xarray.Dataset>
Dimensions:  (dim_0: 1000)
Dimensions without coordinates: dim_0
Data variables:
    a        (dim_0) float64 0.0 1.0 2.0 3.0 4.0 ... 496.0 497.0 498.0 499.0
    b        (dim_0) float64 0.0 1.0 2.0 3.0 4.0 ... 996.0 997.0 998.0 999.0
```

```python
In [16]: xr.set_options(use_flox=True)
Out[16]: <xarray.core.options.set_options at 0x104de21a0>

In [17]: %%timeit
    ...: ds.groupby("a").mean()
    ...:
1.5 ms ± 3.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [18]: xr.set_options(use_flox=False)
Out[18]: <xarray.core.options.set_options at 0x144382350>

In [19]: %%timeit
    ...: ds.groupby("a").mean()
    ...:
94 ms ± 715 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
{ "total_count": 4, "+1": 0, "-1": 0, "laugh": 0, "hooray": 4, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
groupby very slow compared to pandas 117039129 | |
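[Editor's note, not part of the thread] `xr.set_options` can also be used as a context manager, so the flox-backed code path shown in the comment above can be enabled for a single block rather than globally. A minimal sketch, assuming xarray with flox installed:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"a": xr.DataArray(np.r_[np.arange(500.0), np.arange(500.0)]),
     "b": xr.DataArray(np.arange(1000.0))}
)

# Enable the flox-backed groupby only inside this block; the previous
# option value is restored when the block exits.
with xr.set_options(use_flox=True):
    result = ds.groupby("a").mean()
```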
522592263 | https://github.com/pydata/xarray/issues/659#issuecomment-522592263 | https://api.github.com/repos/pydata/xarray/issues/659 | MDEyOklzc3VlQ29tbWVudDUyMjU5MjI2Mw== | lanougue 32069530 | 2019-08-19T14:09:36Z | 2019-08-19T14:09:36Z | NONE | { "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
groupby very slow compared to pandas 117039129 | ||
334212532 | https://github.com/pydata/xarray/issues/659#issuecomment-334212532 | https://api.github.com/repos/pydata/xarray/issues/659 | MDEyOklzc3VlQ29tbWVudDMzNDIxMjUzMg== | jjpr-mit 25231875 | 2017-10-04T16:27:21Z | 2017-10-04T16:27:21Z | NONE | In case anyone gets here by Googling something like "xarray groupby slow" and you loaded data from a netCDF file, be aware that slowness you see in groupby aggregation on a |
{ "total_count": 9, "+1": 6, "-1": 0, "laugh": 0, "hooray": 1, "confused": 0, "heart": 1, "rocket": 1, "eyes": 0 } |
groupby very slow compared to pandas 117039129 | |
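[Editor's note] The comment above is cut off in this export. A plausible reading, stated here as an assumption rather than as the original text, is that `xr.open_dataset` loads netCDF data lazily, so the first groupby aggregation can be dominated by disk reads rather than by the aggregation itself. A minimal sketch under that assumption, using a hypothetical file name and coordinate:

```python
import xarray as xr

# "data.nc" and the "time" coordinate are illustrative placeholders.
ds = xr.open_dataset("data.nc")  # variables are loaded lazily by default

# Read everything into memory first, so that timing the groupby below
# measures the aggregation rather than the file I/O.
ds = ds.load()

monthly_mean = ds.groupby("time.month").mean()
```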
200417621 | https://github.com/pydata/xarray/issues/659#issuecomment-200417621 | https://api.github.com/repos/pydata/xarray/issues/659 | MDEyOklzc3VlQ29tbWVudDIwMDQxNzYyMQ== | shoyer 1217238 | 2016-03-23T16:13:32Z | 2016-03-23T16:13:32Z | MEMBER | Another approach here (rather than writing something new with Numba) would be to write a pure NumPy engine for groupby that relies on reordering data and |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
groupby very slow compared to pandas 117039129 | |
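[Editor's note] This comment is also truncated. One common way to build the kind of pure-NumPy groupby engine it describes, sketched here as an editorial illustration rather than code from the thread, is to factorize the labels, reorder the values so each group is contiguous, and reduce each block with `np.add.reduceat`:

```python
import numpy as np
import pandas as pd


def grouped_mean_numpy(values, by):
    """Grouped mean using only NumPy reductions (no Python-level loop).

    NaN handling is omitted to keep the sketch short.
    """
    labels, uniques = pd.factorize(by, sort=True)
    order = np.argsort(labels, kind="stable")
    sorted_values = np.asarray(values)[order]

    counts = np.bincount(labels)                   # size of each group
    starts = np.r_[0, np.cumsum(counts)[:-1]]      # start index of each block
    sums = np.add.reduceat(sorted_values, starts)  # per-block sums
    return sums / counts, uniques


values = np.random.rand(1_000_000)
by = np.random.randint(50, size=1_000_000)
means, groups = grouped_mean_numpy(values, by)
```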
157130467 | https://github.com/pydata/xarray/issues/659#issuecomment-157130467 | https://api.github.com/repos/pydata/xarray/issues/659 | MDEyOklzc3VlQ29tbWVudDE1NzEzMDQ2Nw== | shoyer 1217238 | 2015-11-16T18:37:51Z | 2015-11-16T18:37:51Z | MEMBER | Agreed! If you'd like to make a pull request, that would be greatly appreciated. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
groupby very slow compared to pandas 117039129 | |
156925589 | https://github.com/pydata/xarray/issues/659#issuecomment-156925589 | https://api.github.com/repos/pydata/xarray/issues/659 | MDEyOklzc3VlQ29tbWVudDE1NjkyNTU4OQ== | anntzer 1322974 | 2015-11-16T06:10:25Z | 2015-11-16T06:10:25Z | CONTRIBUTOR | Perhaps worth mentioning in the docs? The difference turned out to be a major bottleneck in my code. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
groupby very slow compared to pandas 117039129 | |
156921310 | https://github.com/pydata/xarray/issues/659#issuecomment-156921310 | https://api.github.com/repos/pydata/xarray/issues/659 | MDEyOklzc3VlQ29tbWVudDE1NjkyMTMxMA== | shoyer 1217238 | 2015-11-16T05:40:09Z | 2015-11-16T05:40:09Z | MEMBER | Yes, switching to pandas for these operations is certainly a recommended approach :). |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
groupby very slow compared to pandas 117039129 | |
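[Editor's note] A concrete illustration of the pandas route recommended above, sketched under the assumption that the data fits comfortably in a DataFrame; the variable names are illustrative:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"a": xr.DataArray(np.r_[np.arange(500.0), np.arange(500.0)]),
     "b": xr.DataArray(np.arange(1000.0))}
)

# Do the grouped reduction in pandas, where it is fast ...
df_mean = ds.to_dataframe().groupby("a").mean()

# ... and convert the result back to xarray if needed.
result = xr.Dataset.from_dataframe(df_mean)
```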
156917053 | https://github.com/pydata/xarray/issues/659#issuecomment-156917053 | https://api.github.com/repos/pydata/xarray/issues/659 | MDEyOklzc3VlQ29tbWVudDE1NjkxNzA1Mw== | anntzer 1322974 | 2015-11-16T05:14:50Z | 2015-11-16T05:14:50Z | CONTRIBUTOR | In my case I could just switch to pandas, so I'll leave it as it is for now. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
groupby very slow compared to pandas 117039129 | |
156915727 | https://github.com/pydata/xarray/issues/659#issuecomment-156915727 | https://api.github.com/repos/pydata/xarray/issues/659 | MDEyOklzc3VlQ29tbWVudDE1NjkxNTcyNw== | shoyer 1217238 | 2015-11-16T04:57:24Z | 2015-11-16T04:57:24Z | MEMBER | Yes, I'm afraid this is a known issue. Grouped aggregations are currently implemented with a loop in pure Python, which, of course, is pretty slow. I've done some exploratory work to rewrite them in Numba, which shows some encouraging preliminary results:

```python
from numba import guvectorize, jit
import pandas as pd
import numpy as np


@guvectorize(['(float64[:], int64[:], float64[:])'], '(x),(x),(y)', nopython=True)
def _grouped_mean(values, int_labels, target):
    count = np.zeros(len(target), np.int64)
    for i in range(len(values)):
        val = values[i]
        if not np.isnan(val):
            lab = int_labels[i]
            target[lab] += val
            count[lab] += 1
    target /= count


def move_axis_to_end(array, axis):
    array = np.asarray(array)
    return np.rollaxis(array, axis, start=array.ndim)


def grouped_mean(values, by, axis=-1):
    int_labels, uniques = pd.factorize(by, sort=True)
    values = move_axis_to_end(values, axis)
    target = np.zeros(values.shape[:-1] + uniques.shape)
    _grouped_mean(values, int_labels, target)
    return target, uniques


values = np.random.RandomState(0).rand(int(1e6))
values[::50] = np.nan
by = np.random.randint(50, size=int(1e6))
df = pd.DataFrame({'x': values, 'y': by})

np.testing.assert_allclose(grouped_mean(values, by)[0],
                           df.groupby('y')['x'].mean())

%timeit grouped_mean(values, by)
# 100 loops, best of 3: 15.3 ms per loop

%timeit df.groupby('y').mean()
# 10 loops, best of 3: 21.4 ms per loop
```

Unfortunately, I'm unlikely to have time to work on this in the near future. If you or anyone else is interested in taking the lead on this, it would be greatly appreciated! Note that we can't reuse the routines from pandas because they are only designed for 1D or at most 2D data.
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
groupby very slow compared to pandas 117039129 |
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
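[Editor's note] For completeness, a small Python sketch of how the schema above could be queried, assuming the underlying SQLite database has been downloaded locally; the file name `github.db` is a placeholder:

```python
import sqlite3

# Placeholder path to the downloaded SQLite database.
conn = sqlite3.connect("github.db")

# Fetch the comments on this issue, newest-first, matching the ordering
# of the rows shown on this page.
rows = conn.execute(
    """
    SELECT id, user, created_at, author_association
    FROM issue_comments
    WHERE issue = ?
    ORDER BY updated_at DESC
    """,
    (117039129,),
).fetchall()

for row in rows:
    print(row)
```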