issue_comments
8 rows where author_association = "MEMBER", issue = 416962458 and user = 5635139 sorted by updated_at descending
Issue 416962458: Performance: numpy indexes small amounts of data 1000x faster than xarray (8 comments)

id: 553601146
html_url: https://github.com/pydata/xarray/issues/2799#issuecomment-553601146
issue_url: https://api.github.com/repos/pydata/xarray/issues/2799
node_id: MDEyOklzc3VlQ29tbWVudDU1MzYwMTE0Ng==
user: max-sixty (5635139)
created_at: 2019-11-13T21:03:23Z
updated_at: 2019-11-13T21:03:23Z
author_association: MEMBER
reactions: none
issue: 416962458
body:

That's great, that's helpful @nbren12. Maybe we should add this to the docs (we don't really have a performance section at the moment; maybe we start something on performance tips?)

There's some info on the differences in the Terminology that @gwgundersen wrote: https://github.com/pydata/xarray/blob/master/doc/terminology.rst#L18

Essentially: by indexing on the variable, you ignore the coordinates, and so skip a bunch of code that takes the object apart and puts it back together. A variable is much more similar to a numpy array, so you can't do …

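A minimal sketch (not from the thread) of the distinction described in the comment above, using a toy DataArray; the point is that indexing `.variable` bypasses the coordinate handling, while indexing the DataArray rebuilds the whole labelled object:

```python
import numpy as np
import xarray as xr

# A DataArray carries coordinates; its .variable is the bare labelled array.
da = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=("x", "y"),
    coords={"x": [10, 20, 30], "y": list("abcd")},
)

# Indexing the DataArray returns a DataArray, slicing every coordinate too.
print(type(da[0]))            # xarray.core.dataarray.DataArray

# Indexing the Variable skips the coordinate machinery entirely.
print(type(da.variable[0]))   # xarray.core.variable.Variable
print(da.variable[0].values)  # plain numpy data: [0. 1. 2. 3.]
```
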
id: 552714604
html_url: https://github.com/pydata/xarray/issues/2799#issuecomment-552714604
issue_url: https://api.github.com/repos/pydata/xarray/issues/2799
node_id: MDEyOklzc3VlQ29tbWVudDU1MjcxNDYwNA==
user: max-sixty (5635139)
created_at: 2019-11-12T03:10:39Z
updated_at: 2019-11-12T03:10:39Z
author_association: MEMBER
reactions: +1 ×2
issue: 416962458
body:

One note: if you're indexing into a dataarray and don't care about the coords, index into the variable. 2x numpy time, rather than 30x:

```python
In [26]: da = xr.tutorial.open_dataset('air_temperature')['air']

In [27]: da
Out[27]:
<xarray.DataArray 'air' (time: 2920, lat: 25, lon: 53)>
[3869000 values with dtype=float32]
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Attributes:
    long_name:     4xDaily Air temperature at sigma level 995
    units:         degK
    precision:     2
    GRIB_id:       11
    GRIB_name:     TMP
    var_desc:      Air temperature
    dataset:       NMC Reanalysis
    level_desc:    Surface
    statistic:     Individual Obs
    parent_stat:   Other
    actual_range:  [185.16 322.1 ]

In [20]: %timeit da.variable[0]
28.2 µs ± 2.29 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [21]: %timeit da[0]
459 µs ± 37.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [22]: %timeit da.variable.values[0]
14.1 µs ± 183 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```

id: 552646381
html_url: https://github.com/pydata/xarray/issues/2799#issuecomment-552646381
issue_url: https://api.github.com/repos/pydata/xarray/issues/2799
node_id: MDEyOklzc3VlQ29tbWVudDU1MjY0NjM4MQ==
user: max-sixty (5635139)
created_at: 2019-11-11T22:29:58Z
updated_at: 2019-11-11T22:29:58Z
author_association: MEMBER
reactions: none
issue: 416962458
body:

TBC I think there's plenty we could do with relatively little complexity to speed up indexing operations on …

That's a different problem from making xarray as fast as indexing a numpy array, or allowing libraries to iterate through a …

id: 539100243
html_url: https://github.com/pydata/xarray/issues/2799#issuecomment-539100243
issue_url: https://api.github.com/repos/pydata/xarray/issues/2799
node_id: MDEyOklzc3VlQ29tbWVudDUzOTEwMDI0Mw==
user: max-sixty (5635139)
created_at: 2019-10-07T16:39:54Z
updated_at: 2019-10-07T16:39:54Z
author_association: MEMBER
reactions: none
issue: 416962458
body:

Great analysis, thanks.

Do we have any idea of which of those lines are offending? I used a tool …

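The specific tool mentioned in the comment isn't captured above. As a hedged illustration only, one way to see which calls dominate an indexing loop is Python's standard-library cProfile; the toy DataArray and loop below are made up for the example:

```python
import cProfile
import pstats

import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(1000), dims="x")

def index_repeatedly(n=10_000):
    # The kind of tight scalar-indexing loop the benchmarks in this issue measure.
    for i in range(n):
        da[i % 1000]

profiler = cProfile.Profile()
profiler.enable()
index_repeatedly()
profiler.disable()

# Show the 10 calls that account for the most cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```
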
id: 469898607
html_url: https://github.com/pydata/xarray/issues/2799#issuecomment-469898607
issue_url: https://api.github.com/repos/pydata/xarray/issues/2799
node_id: MDEyOklzc3VlQ29tbWVudDQ2OTg5ODYwNw==
user: max-sixty (5635139)
created_at: 2019-03-05T23:16:43Z
updated_at: 2019-03-05T23:16:43Z
author_association: MEMBER
reactions: none
issue: 416962458
body:

Right, tbc, I'm only referring to the top two lines of the pasted benchmark; i.e. once we enter python (even if only to access a numpy array) we're already losing a lot of the speed relative to the loop staying in C / Cython. So even if xarray were a python front-end to a C++ library, it still wouldn't be competitive if performance were paramount.

...unless pypy sped that up; I'd be v interested to see.

id: 469861382
html_url: https://github.com/pydata/xarray/issues/2799#issuecomment-469861382
issue_url: https://api.github.com/repos/pydata/xarray/issues/2799
node_id: MDEyOklzc3VlQ29tbWVudDQ2OTg2MTM4Mg==
user: max-sixty (5635139)
created_at: 2019-03-05T21:19:31Z
updated_at: 2019-03-05T21:19:31Z
author_association: MEMBER
reactions: +1 ×1
issue: 416962458
body:

To put the relative speed of numpy access into perspective, I found this insightful: https://jakevdp.github.io/blog/2012/08/08/memoryview-benchmarks/ (it's now a few years out of date, but I think the fundamentals still stand). Pasted from there:

…

So if we're running an inner loop on an array, accessing it using numpy in python is an order of magnitude slower than accessing it using numpy in C (and that's an order of magnitude slower than using a slice, and that's an order of magnitude slower than using raw pointers).

So - let's definitely speed xarray up (your benchmarks are excellent, thank you again, and I think you're right there are opportunities for significant increases). But where speed is paramount above all else, we shouldn't use any access in python, let alone the niceties of xarray access.

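A rough, machine-dependent illustration of the "stay out of python for inner loops" point (this is not the benchmark from the linked post; the array size and repeat counts are arbitrary):

```python
import timeit

import numpy as np

arr = np.random.rand(10_000)

def python_loop_sum(a):
    # Element-by-element access from Python: every a[i] crosses the
    # Python/C boundary and boxes a scalar.
    total = 0.0
    for i in range(len(a)):
        total += a[i]
    return total

loop_time = timeit.timeit(lambda: python_loop_sum(arr), number=100)
vec_time = timeit.timeit(lambda: arr.sum(), number=100)  # the loop stays in C
print(f"python loop: {loop_time:.4f}s   np.sum: {vec_time:.4f}s")
```
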
id: 469449165
html_url: https://github.com/pydata/xarray/issues/2799#issuecomment-469449165
issue_url: https://api.github.com/repos/pydata/xarray/issues/2799
node_id: MDEyOklzc3VlQ29tbWVudDQ2OTQ0OTE2NQ==
user: max-sixty (5635139)
created_at: 2019-03-04T22:33:03Z
updated_at: 2019-03-04T22:33:03Z
author_association: MEMBER
reactions: +1 ×1
issue: 416962458
body:

You can always use xarray to process the data, and then extract the underlying array (…

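The (truncated) comment points at a common pattern: do the labelled work in xarray, then hand the bare numpy array to the performance-critical code. A sketch under that assumption, with an illustrative `hot_loop` standing in for the external routine:

```python
import numpy as np
import xarray as xr

def hot_loop(values: np.ndarray) -> float:
    # Stand-in for an external routine (C, numba, ...) that only needs raw numbers.
    acc = 0.0
    for v in values:
        acc += v * v
    return acc

da = xr.tutorial.open_dataset("air_temperature")["air"]

# Use xarray while labels and alignment matter...
subset = da.sel(time="2013-01-01").mean("lon")

# ...then drop to the underlying numpy array for the tight loop.
result = hot_loop(subset.values.ravel())
print(result)
```
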
id: 469445483
html_url: https://github.com/pydata/xarray/issues/2799#issuecomment-469445483
issue_url: https://api.github.com/repos/pydata/xarray/issues/2799
node_id: MDEyOklzc3VlQ29tbWVudDQ2OTQ0NTQ4Mw==
user: max-sixty (5635139)
created_at: 2019-03-04T22:20:58Z
updated_at: 2019-03-04T22:20:58Z
author_association: MEMBER
reactions: none
issue: 416962458
body:

Thanks for the benchmarks @nbren12, and for the clear explanation @shoyer.

While we could do some performance work on that loop, I think we're likely to see a material change by enabling the external library to access directly from the array, without a looped python call. That's consistent with the ideas @jhamman had a few days ago.

CREATE TABLE [issue_comments] (
    [html_url] TEXT,
    [issue_url] TEXT,
    [id] INTEGER PRIMARY KEY,
    [node_id] TEXT,
    [user] INTEGER REFERENCES [users]([id]),
    [created_at] TEXT,
    [updated_at] TEXT,
    [author_association] TEXT,
    [body] TEXT,
    [reactions] TEXT,
    [performed_via_github_app] TEXT,
    [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);