issue_comments


8 rows where author_association = "MEMBER", issue = 416962458 and user = 5635139 sorted by updated_at descending


All 8 comments are by max-sixty (author_association MEMBER) on the issue "Performance: numpy indexes small amounts of data 1000 faster than xarray" (#2799, id 416962458).
max-sixty (MEMBER) · 2019-11-13T21:03:23Z · comment 553601146
https://github.com/pydata/xarray/issues/2799#issuecomment-553601146

That's great, and that's helpful, @nbren12. Maybe we should add this to the docs (we don't really have a performance section at the moment; maybe we should start one with performance tips?)

There's some info on the differences in the Terminology doc that @gwgundersen wrote: https://github.com/pydata/xarray/blob/master/doc/terminology.rst#L18

Essentially: by indexing on the variable, you ignore the coordinates, and so skip a bunch of code that takes the object apart and puts it back together. A variable is much more similar to a numpy array, so you can't do `sel`, for example.
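To make that concrete, here's a minimal sketch (using a small hand-built DataArray in place of any real data) showing that indexing `.variable` returns a plain Variable, which skips the coordinate machinery but also drops label-based selection:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in DataArray, just for illustration
da = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=("time", "lon"),
    coords={"time": [0, 1, 2], "lon": [10.0, 20.0, 30.0, 40.0]},
)

sub = da[0]           # DataArray: coords are sliced and re-attached
var = da.variable[0]  # Variable: just dims + data, coords are ignored

print(type(sub).__name__)   # DataArray
print(type(var).__name__)   # Variable
print(hasattr(var, "sel"))  # False: no coords, so no label-based selection
```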

Reactions: none
max-sixty (MEMBER) · 2019-11-12T03:10:39Z · comment 552714604
https://github.com/pydata/xarray/issues/2799#issuecomment-552714604

One note: if you're indexing into a DataArray and don't care about the coords, index into the variable instead. That's 2x the numpy time, rather than 30x:

```python
In [26]: da = xr.tutorial.open_dataset('air_temperature')['air']

In [27]: da
Out[27]:
<xarray.DataArray 'air' (time: 2920, lat: 25, lon: 53)>
[3869000 values with dtype=float32]
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Attributes:
    long_name:     4xDaily Air temperature at sigma level 995
    units:         degK
    precision:     2
    GRIB_id:       11
    GRIB_name:     TMP
    var_desc:      Air temperature
    dataset:       NMC Reanalysis
    level_desc:    Surface
    statistic:     Individual Obs
    parent_stat:   Other
    actual_range:  [185.16 322.1 ]

In [20]: %timeit da.variable[0]
28.2 µs ± 2.29 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [21]: %timeit da[0]
459 µs ± 37.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [22]: %timeit da.variable.values[0]
14.1 µs ± 183 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```

Reactions: +1 × 2
max-sixty (MEMBER) · 2019-11-11T22:29:58Z · comment 552646381
https://github.com/pydata/xarray/issues/2799#issuecomment-552646381

To be clear, I think there's plenty we could do with relatively little complexity to speed up indexing operations on DataArrays. As an example, we could avoid the roundtrip through a temporary Dataset.

That's a different problem from making xarray as fast as indexing a numpy array, or allowing libraries to iterate through a DataArray in a hot loop.

Reactions: none
max-sixty (MEMBER) · 2019-10-07T16:39:54Z · comment 539100243
https://github.com/pydata/xarray/issues/2799#issuecomment-539100243

Great analysis, thanks

Do we have any idea which of those lines are the offenders? I used a tool called line_profiler a while ago, but maybe we know already (I'm guessing it's the two `_replace_with_new_dims` lines?)
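line_profiler gives per-line timings (via its `%lprun` magic in IPython); as a rough stdlib-only sketch of the same workflow, cProfile can at least confirm which function dominates. The function names below are made up purely for illustration:

```python
import cProfile
import io
import pstats

def _replace_with_new_dims_stub(n):
    # hypothetical stand-in for the suspected hot internal call
    return sum(range(n))

def index_once(n):
    for _ in range(200):
        _replace_with_new_dims_stub(n)

profiler = cProfile.Profile()
profiler.enable()
index_once(1_000)
profiler.disable()

# Print the five most expensive functions by cumulative time
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print("_replace_with_new_dims_stub" in report)  # True: the hotspot shows up
```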

Reactions: none
max-sixty (MEMBER) · 2019-03-05T23:16:43Z · comment 469898607
https://github.com/pydata/xarray/issues/2799#issuecomment-469898607

> Cython + memoryviews isn't quite the right comparison here.

Right, to be clear, I'm only referring to the top two lines of the pasted benchmark; i.e. once we enter Python (even if only to access a numpy array) we're already losing a lot of speed relative to a loop that stays in C / Cython. So even if xarray were a Python front end to a C++ library, it still wouldn't be competitive where performance is paramount. ...unless PyPy sped that up; I'd be very interested to see.

Reactions: none
max-sixty (MEMBER) · 2019-03-05T21:19:31Z · comment 469861382
https://github.com/pydata/xarray/issues/2799#issuecomment-469861382

To put the relative speed of numpy access into perspective, I found this insightful: https://jakevdp.github.io/blog/2012/08/08/memoryview-benchmarks/ (it's now a few years out of date, but I think the fundamentals still stand)

Pasted from there:

Summary of the timing results from that post:

  • Python + numpy: 6510 ms
  • Cython + numpy: 668 ms
  • Cython + memviews (slicing): 22 ms
  • Cython + raw pointers: 2.47 ms
  • Cython + memviews (no slicing): 2.45 ms

So if we're running an inner loop over an array, accessing it using numpy from Python is an order of magnitude slower than accessing it using numpy from Cython (and that's an order of magnitude slower than using a memoryview slice, which is in turn an order of magnitude slower than raw pointers).

So: let's definitely speed xarray up (your benchmarks are excellent, thank you again, and I think you're right that there are opportunities for significant gains). But where speed is paramount above all else, we shouldn't use any access from Python, let alone the niceties of xarray access.
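The same boundary cost is easy to see without Cython: even over a plain numpy array, an element-by-element Python loop is dramatically slower than a single vectorized call that keeps the loop in C. A rough sketch (absolute times will vary by machine):

```python
import time
import numpy as np

x = np.random.rand(1_000_000)

t0 = time.perf_counter()
total = 0.0
for v in x:              # every access crosses the Python/C boundary
    total += v
loop_s = time.perf_counter() - t0

t0 = time.perf_counter()
fast = float(x.sum())    # one call; the loop stays in C
vec_s = time.perf_counter() - t0

print(f"python loop: {loop_s:.4f}s  numpy sum: {vec_s:.6f}s")
assert np.isclose(total, fast)
```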

Reactions: +1 × 1
max-sixty (MEMBER) · 2019-03-04T22:33:03Z · comment 469449165
https://github.com/pydata/xarray/issues/2799#issuecomment-469449165

You can always use xarray to process the data, and then extract the underlying array (`da.values`) for passing into something that expects a numpy array, or for running fast(ish) loops (we do this frequently).
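A minimal sketch of that pattern, with a tiny hypothetical DataArray in place of real data:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(6.0), dims="time")  # hypothetical data

da = da * 2        # do the labelled processing in xarray...
arr = da.values    # ...then pull out the underlying numpy array

# fast(ish) hot loop: plain numpy access, no xarray overhead per element
total = 0.0
for i in range(arr.shape[0]):
    total += arr[i]
print(total)  # 30.0
```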

Reactions: +1 × 1
max-sixty (MEMBER) · 2019-03-04T22:20:58Z · comment 469445483
https://github.com/pydata/xarray/issues/2799#issuecomment-469445483

Thanks for the benchmarks @nbren12, and for the clear explanation @shoyer

While we could do some performance work on that loop, I think we're most likely to see a material change by letting the external library access the array directly, without a looped Python call. That's consistent with the ideas @jhamman had a few days ago.

Reactions: none

Powered by Datasette · About: xarray-datasette