issue_comments
8 rows where issue = 711626733 sorted by updated_at descending
Issue: Wrap numpy-groupies to speed up Xarray's groupby aggregations (8 comments)
1046935567 | bmorris3 | NONE | 2022-02-21T14:25:47Z
https://github.com/pydata/xarray/issues/4473#issuecomment-1046935567

Hi @shoyer, thanks for this neat trick! What happens when
711461331 | shoyer | MEMBER | 2020-10-19T01:30:48Z
https://github.com/pydata/xarray/issues/4473#issuecomment-711461331

I think we can reuse the existing logic from the

This just gives us an alternative way to calculate

Agreed. Hopefully this can live alongside in the GroupBy objects.

Yes, I agree that we should do this incrementally.
711460703 | shoyer | MEMBER | 2020-10-19T01:27:50Z
https://github.com/pydata/xarray/issues/4473#issuecomment-711460703

Something like the resample test case from https://github.com/pydata/xarray/issues/4498 might be a good example for finding 100x speed-ups. The main feature of that case is that there are a very large number of groups (only slightly fewer groups than original data points).
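A many-groups case like the one described could be sketched roughly as follows. This is not the exact benchmark from #4498; the sizes and frequencies here are made up for illustration. Resampling second-resolution data into two-second bins leaves only slightly fewer groups than data points, the regime where per-group Python overhead dominates:

```python
import numpy as np
import pandas as pd
import xarray as xr

n = 1000  # hypothetical size, not from the linked issue
times = pd.date_range("2020-01-01", periods=n, freq="1s")
da = xr.DataArray(np.arange(n, dtype=float), dims="time", coords={"time": times})

# Two points per bin: n/2 groups, i.e. nearly as many groups as points.
coarse = da.resample(time="2s").mean()
```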
711458608 | max-sixty | MEMBER | 2020-10-19T01:19:08Z
https://github.com/pydata/xarray/issues/4473#issuecomment-711458608

Here's a very quick POC:

```python
import numpy as np
import pandas as pd
import xarray as xr
from numpy_groupies.aggregate_numba import aggregate


def npg_groupby(da: xr.DataArray, dim, func='sum'):
    # Convert the index along `dim` into integer group ids.
    group_idx, labels = pd.factorize(da.indexes[dim])
    axis = da.get_axis_num(dim)
    # Aggregate along that axis with numpy-groupies; returns a plain numpy array.
    return aggregate(group_idx=group_idx, a=da.values, func=func, axis=axis)
```

Run on this array:

```python
size_factor = 1000
da = xr.DataArray(
    np.arange(1440 * size_factor).reshape(45 * size_factor, 8, 4),
    dims=("x", "y", "z"),
    coords=dict(x=list(range(45)) * size_factor, y=[1, 2, 3, 4] * 2, z=[1, 2] * 2),
)
```

It's about 2x as fast, though it only generates the numpy array:

```python
%%timeit
npg_groupby(da, 'x')
# 15 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

```python
%%timeit
da.groupby('x').sum()
# 37.6 ms ± 244 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

Any thoughts on any of:

- What's the best way of reconstituting the coords etc., after npg produces the array?
- Presumably we're going to have a fairly different design for this than the existing groupby operations — that design is very nested — wrapping functions and eventually calling
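One way the coord-reconstitution question above could be sketched: keep the coords that don't involve the grouped dimension, and replace that dimension's coordinate with the unique labels from `pd.factorize`. The helper below is hypothetical (not from the thread), and uses numpy's `np.add.at` as a stand-in for `npg.aggregate` so the example is self-contained:

```python
import numpy as np
import pandas as pd
import xarray as xr


def groupby_sum_rebuild(da: xr.DataArray, dim: str) -> xr.DataArray:
    # Hypothetical sketch: aggregate over `dim`, then rebuild a DataArray.
    group_idx, labels = pd.factorize(da.indexes[dim])
    axis = da.get_axis_num(dim)

    # Stand-in aggregation (sum); numpy-groupies would replace these lines.
    values = np.moveaxis(da.values, axis, 0)
    out = np.zeros((len(labels),) + values.shape[1:], dtype=values.dtype)
    np.add.at(out, group_idx, values)
    out = np.moveaxis(out, 0, axis)

    # Keep coords not involving `dim`; swap in the group labels for `dim`.
    coords = {k: v for k, v in da.coords.items() if dim not in v.dims}
    coords[dim] = labels
    return xr.DataArray(out, dims=da.dims, coords=coords)
```

Note the sketch drops any non-index coordinates along `dim`; a fuller design would have to decide how those propagate through the aggregation.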
702339435 | max-sixty | MEMBER | 2020-10-01T19:07:39Z
https://github.com/pydata/xarray/issues/4473#issuecomment-702339435

Great. I need to think through how to do that — the approach of using MultiIndex
701961598 | shoyer | MEMBER | 2020-10-01T07:57:58Z
https://github.com/pydata/xarray/issues/4473#issuecomment-701961598

I'm not entirely sure, but I suspect something like the approach in https://github.com/pydata/xarray/pull/4184 might be more directly relevant for speeding up
701653835 | max-sixty | MEMBER | 2020-09-30T21:22:40Z (edited 2020-10-01T05:56:13Z)
https://github.com/pydata/xarray/issues/4473#issuecomment-701653835

This looks amazing! Thanks for finding it. Highly speculative, but would this also be a faster approach to stacking & unstacking? "Form ~5~ 4" in the readme.
701609035 | shoyer | MEMBER | 2020-09-30T19:52:05Z
https://github.com/pydata/xarray/issues/4473#issuecomment-701609035

A prototype implementation of the core functionality here can be found in: https://nbviewer.jupyter.org/gist/shoyer/6d6c82bbf383fb717cc8631869678737
CREATE TABLE [issue_comments] (
    [html_url] TEXT,
    [issue_url] TEXT,
    [id] INTEGER PRIMARY KEY,
    [node_id] TEXT,
    [user] INTEGER REFERENCES [users]([id]),
    [created_at] TEXT,
    [updated_at] TEXT,
    [author_association] TEXT,
    [body] TEXT,
    [reactions] TEXT,
    [performed_via_github_app] TEXT,
    [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);