issue_comments

4 rows where issue = 117039129 and user = 1217238 sorted by updated_at descending
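The underlying SQL is equivalent to the following (a reconstruction from the filters above, written in the same bracket-quoting style as the schema at the bottom of the page; Datasette generates its own query):

```
select *
from [issue_comments]
where [issue] = 117039129
  and [user] = 1217238
order by [updated_at] desc;
```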

id: 200417621
html_url: https://github.com/pydata/xarray/issues/659#issuecomment-200417621
issue_url: https://api.github.com/repos/pydata/xarray/issues/659
node_id: MDEyOklzc3VlQ29tbWVudDIwMDQxNzYyMQ==
user: shoyer (1217238)
created_at: 2016-03-23T16:13:32Z
updated_at: 2016-03-23T16:13:32Z
author_association: MEMBER
body:

Another approach here (rather than writing something new with Numba) would be to write a pure-NumPy engine for groupby that relies on reordering data and np.add.accumulate. This could yield performance within a factor of 2-3x of pandas. See this comment for an example: https://github.com/numpy/numpy/issues/7265#issuecomment-198796408 (a rough sketch of the idea follows this record).

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: groupby very slow compared to pandas (117039129)
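As a rough illustration of that idea (a minimal sketch, not xarray's implementation; the function name is invented here and NaN handling is omitted), a grouped sum can be built from a stable sort plus a single np.add.accumulate pass:

```
import numpy as np

def grouped_sum(values, labels):
    # Reorder so that equal labels are contiguous; a stable sort preserves
    # within-group order.
    order = np.argsort(labels, kind='stable')
    sorted_values = values[order]
    sorted_labels = labels[order]
    # Indices where a new group starts in the sorted labels.
    starts = np.flatnonzero(np.diff(sorted_labels)) + 1
    # One cumulative-sum pass; each group's sum is the difference of the
    # running total at consecutive group boundaries.
    totals = np.add.accumulate(sorted_values)
    ends = np.append(starts - 1, len(sorted_values) - 1)
    group_sums = np.diff(np.concatenate([[0.0], totals[ends]]))
    uniques = sorted_labels[np.append(0, starts)]
    return uniques, group_sums
```

A grouped mean would divide these sums by per-group counts (e.g. via np.bincount on the labels); since every step is a vectorized NumPy operation, this is the kind of engine the factor-of-2-3x estimate above refers to.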
id: 157130467
html_url: https://github.com/pydata/xarray/issues/659#issuecomment-157130467
issue_url: https://api.github.com/repos/pydata/xarray/issues/659
node_id: MDEyOklzc3VlQ29tbWVudDE1NzEzMDQ2Nw==
user: shoyer (1217238)
created_at: 2015-11-16T18:37:51Z
updated_at: 2015-11-16T18:37:51Z
author_association: MEMBER
body:

Agreed! If you'd like to make a pull request, that would be greatly appreciated.

On Sun, Nov 15, 2015 at 10:10 PM, Antony Lee notifications@github.com wrote:

Perhaps worth mentioning in the docs? The difference turned out to be a major bottleneck in my code.

— Reply to this email directly or view it on GitHub https://github.com/xray/xray/issues/659#issuecomment-156925589.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: groupby very slow compared to pandas (117039129)
id: 156921310
html_url: https://github.com/pydata/xarray/issues/659#issuecomment-156921310
issue_url: https://api.github.com/repos/pydata/xarray/issues/659
node_id: MDEyOklzc3VlQ29tbWVudDE1NjkyMTMxMA==
user: shoyer (1217238)
created_at: 2015-11-16T05:40:09Z
updated_at: 2015-11-16T05:40:09Z
author_association: MEMBER
body:

Yes, switching to pandas for these operations is certainly a recommended approach :) (a sketch follows this record).

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: groupby very slow compared to pandas (117039129)
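For concreteness, a minimal sketch of that workaround (the data, sizes, and variable names here are invented for illustration, not taken from the issue):

```
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical example: one value per point, grouped by a 'label' coordinate.
arr = xr.DataArray(np.random.rand(100000), dims='points',
                   coords={'label': ('points', np.random.randint(50, size=100000))})

# xarray's groupby (a pure-Python loop at the time of this issue):
xr_means = arr.groupby('label').mean()

# The workaround: round-trip through pandas, whose groupby runs in C.
df = arr.to_dataframe(name='x')            # 'label' comes along as a column
pd_means = df.groupby('label')['x'].mean()

np.testing.assert_allclose(xr_means.values, pd_means.values)
```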
id: 156915727
html_url: https://github.com/pydata/xarray/issues/659#issuecomment-156915727
issue_url: https://api.github.com/repos/pydata/xarray/issues/659
node_id: MDEyOklzc3VlQ29tbWVudDE1NjkxNTcyNw==
user: shoyer (1217238)
created_at: 2015-11-16T04:57:24Z
updated_at: 2015-11-16T04:57:24Z
author_association: MEMBER
body:

Yes, I'm afraid this is a known issue. Grouped aggregations are currently implemented with a loop in pure Python, which, of course, is pretty slow.

I've done some exploratory work to rewrite them in Numba, which shows some encouraging preliminary results:

```
from numba import guvectorize, jit
import pandas as pd
import numpy as np


# Accumulate sums and counts per group, then normalize in place. The output
# array `target` must be passed in explicitly, since its size cannot be
# inferred from the inputs.
@guvectorize(['(float64[:], int64[:], float64[:])'], '(x),(x)->(y)',
             nopython=True)
def _grouped_mean(values, int_labels, target):
    count = np.zeros(len(target), np.int64)
    for i in range(len(values)):
        val = values[i]
        if not np.isnan(val):  # skip NaNs, matching pandas' mean
            lab = int_labels[i]
            target[lab] += val
            count[lab] += 1
    target /= count


def move_axis_to_end(array, axis):
    array = np.asarray(array)
    return np.rollaxis(array, axis, start=array.ndim)


def grouped_mean(values, by, axis=-1):
    # Map group labels to integers 0..n-1, then reduce along `axis`.
    int_labels, uniques = pd.factorize(by, sort=True)
    values = move_axis_to_end(values, axis)
    target = np.zeros(values.shape[:-1] + uniques.shape)
    _grouped_mean(values, int_labels, target)
    return target, uniques


values = np.random.RandomState(0).rand(int(1e6))
values[::50] = np.nan
by = np.random.randint(50, size=int(1e6))
df = pd.DataFrame({'x': values, 'y': by})

np.testing.assert_allclose(grouped_mean(values, by)[0],
                           df.groupby('y')['x'].mean())

%timeit grouped_mean(values, by)  # 100 loops, best of 3: 15.3 ms per loop
%timeit df.groupby('y').mean()    # 10 loops, best of 3: 21.4 ms per loop
```

Unfortunately, I'm unlikely to have time to work on this in the near future. If you or anyone else is interested in taking the lead on this, it would be greatly appreciated!

Note that we can't reuse the routines from pandas because they are only designed for 1D or at most 2D data.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: groupby very slow compared to pandas (117039129)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);