issue_comments

6 rows where author_association = "MEMBER", issue = 416962458 and user = 6213168 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
553948714 https://github.com/pydata/xarray/issues/2799#issuecomment-553948714 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDU1Mzk0ODcxNA== crusaderky 6213168 2019-11-14T15:50:35Z 2019-11-14T15:50:35Z MEMBER

#3533 closes the gap between DataArray and numpy from 500x slower to "just" 100x slower :)

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
539218376 https://github.com/pydata/xarray/issues/2799#issuecomment-539218376 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDUzOTIxODM3Ng== crusaderky 6213168 2019-10-07T21:46:32Z 2019-10-07T21:53:33Z MEMBER

I tried playing around with pypy 3.6. Big fat disclaimer: I did not run any of the xarray unit tests. Expect trouble if you do.

1. ```bash
   #!/bin/bash
   set -o errexit
   set -o pipefail
   set -o nounset
   set -o xtrace

   tar -xvjf Downloads/pypy3.6-v7.1.1-linux64.tar.bz2
   cd pypy3.6-v7.1.1-linux64/bin
   ./pypy3 -m ensurepip
   ./pip3.6 install -U pip wheel
   ./pip list | awk 'NR > 2 {print $1}' | grep -v greenlet | xargs ./pip install -U
   # sudo apt-get install libopenblas-dev gfortran
   ./pip install numpy pandas xarray
   ```

2. To work around https://bitbucket.org/pypy/pypy/issues/3087/collectionsabc-__init_subclass__-failure, edit `xarray/core/common.py` and delete `AttrAccessMixin.__init_subclass__`.

3. timeit is unreliable in pypy. I modified the benchmark as follows:

   ```python
   import time

   import numpy as np
   import xarray as xr

   shape = (10, 10, 10, 10)
   index = (0, 0, 0, 0)
   np_arr = np.ones(shape)
   arr = xr.DataArray(np_arr)

   N = 10000

   def bench_slice(obj):
       for _ in range(4):
           t0 = time.time()
           for _ in range(N):
               obj[index]
           t1 = time.time()
           t_ns = (t1 - t0) / N * 1e9
           print(f"{t_ns:6.0f} ns {obj.__class__.__name__}")

   bench_slice(arr)
   bench_slice(np_arr)
   ```

Benchmark outputs:

CPython 3.7:

```
 93496 ns DataArray
 92732 ns DataArray
 92560 ns DataArray
 93427 ns DataArray
   119 ns ndarray
   121 ns ndarray
   122 ns ndarray
   119 ns ndarray
```

PyPy 7.1 3.6:

```
113273 ns DataArray
 38543 ns DataArray
 34797 ns DataArray
 39453 ns DataArray
   386 ns ndarray
   289 ns ndarray
   329 ns ndarray
   413 ns ndarray
```

Big important reminder: all results are for a very small array. I would expect the gap between CPython and pypy to get narrower in % (both for numpy and xarray) as the array size gets larger and more time is spent in the pure C numpy code.
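A minimal sketch of one way to check that expectation (not part of the original comment; `best_time_us` is an ad-hoc helper): make each indexing call copy a growing amount of data via fancy indexing, so progressively more time is spent in numpy's C routines, then run the same script under both CPython and PyPy and compare.

```python
# Sketch: fancy indexing copies the selected data, so larger k means more
# time inside numpy's C code; run under CPython and PyPy and compare ratios.
import time

import numpy as np
import xarray as xr

N = 1000

def best_time_us(obj, key):
    best = float("inf")
    for _ in range(4):
        t0 = time.time()
        for _ in range(N):
            obj[key]
        best = min(best, (time.time() - t0) / N * 1e6)
    return best

np_arr = np.random.rand(100, 100, 100)
arr = xr.DataArray(np_arr)
for k in (1, 10, 100):
    key = np.arange(k)  # selects and copies k*100*100 elements per call
    print(f"k={k:3d}: DataArray {best_time_us(arr, key):9.1f} µs, "
          f"ndarray {best_time_us(np_arr, key):9.1f} µs")
```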

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
538570946 https://github.com/pydata/xarray/issues/2799#issuecomment-538570946 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDUzODU3MDk0Ng== crusaderky 6213168 2019-10-04T21:48:18Z 2019-10-06T21:56:58Z MEMBER

I simplified the benchmark:

```python
from itertools import product

import numpy as np
import xarray as xr

shape = (10, 10, 10, 10)
index = (0, 0, 0, 0)
np_arr = np.ones(shape)
arr = xr.DataArray(np_arr)
named_index = dict(zip(arr.dims, index))

print(index)
print(named_index)

%timeit -n 1000 arr[index]
%timeit -n 1000 arr.isel(**named_index)
%timeit -n 1000 np_arr[index]
```
```
(0, 0, 0, 0)
{'dim_0': 0, 'dim_1': 0, 'dim_2': 0, 'dim_3': 0}
90.8 µs ± 5.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
88.5 µs ± 2.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
115 ns ± 6.71 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

```python
%%prun -s cumulative
for _ in range(10000):
    arr[index]
```
```
5680003 function calls (5630003 primitive calls) in 1.890 seconds

Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.890    1.890 {built-in method builtins.exec}
        1    0.009    0.009    1.890    1.890 <string>:1(<module>)
    10000    0.011    0.000    1.881    0.000 dataarray.py:629(__getitem__)
    10000    0.030    0.000    1.801    0.000 dataarray.py:988(isel)
    10000    0.084    0.000    1.567    0.000 dataset.py:1842(isel)
    10000    0.094    0.000    0.570    0.000 dataset.py:1746(_validate_indexers)
    10000    0.029    0.000    0.375    0.000 variable.py:960(isel)
    10000    0.013    0.000    0.319    0.000 variable.py:666(__getitem__)
    20000    0.014    0.000    0.251    0.000 dataset.py:918(_replace_with_new_dims)
    50000    0.028    0.000    0.245    0.000 variable.py:272(__init__)
    10000    0.035    0.000    0.211    0.000 variable.py:487(_broadcast_indexes)
1140000/1100000    0.100    0.000    0.168    0.000 {built-in method builtins.isinstance}
    10000    0.050    0.000    0.157    0.000 dataset.py:1802(_get_indexers_coords_and_indexes)
    20000    0.025    0.000    0.153    0.000 dataset.py:868(_replace)
    50000    0.085    0.000    0.152    0.000 variable.py:154(as_compatible_data)
```

Time breakdown:

Total | 1.881
-- | --
DataArray.__getitem__ | 0.080
DataArray.isel (_to_temp_dataset roundtrip) | 0.234
Dataset.isel | 0.622
Dataset._validate_indexers | 0.570
Variable.isel | 0.056
Variable.__getitem__ | 0.319

I can spot a few low-hanging fruits there:

- huge amount of time spent on `_validate_indexers`
- Why is `Variable.__init__` being called 5 times?!? I expected 0.
- The bench strongly hints at the fact that we're creating dummy IndexVariables on the fly
- We're casting the DataArray to a Dataset, converting the positional index to a dict, then converting it back to positional for each variable. Maybe it's a good idea to rewrite DataArray.sel/isel so that they don't use _to_temp_dataset? (a rough sketch of this idea follows below)

So, in short: while I don't think we can feasibly close the order-of-magnitude gap (800x) with numpy, I suspect we could get at least a 5x speedup here.
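As an illustration of that last bullet (a minimal sketch, not xarray's actual implementation; `isel_noroundtrip` is a hypothetical helper and ignores most coordinate/index handling), positional indexing could in principle go straight to the wrapped `Variable` and skip the Dataset round-trip:

```python
import numpy as np
import xarray as xr

def isel_noroundtrip(da, **indexers):
    # Build a positional key from the dim -> position mapping and index the
    # wrapped Variable directly, instead of going through _to_temp_dataset().
    key = tuple(indexers.get(dim, slice(None)) for dim in da.dims)
    var = da.variable[key]  # Variable.__getitem__ keeps the surviving dims
    # For simplicity, coordinates whose dims do not survive are discarded.
    coords = {k: v for k, v in da.coords.items() if set(v.dims) <= set(var.dims)}
    return xr.DataArray(var, coords=coords, name=da.name)

arr = xr.DataArray(np.ones((10, 10, 10, 10)))
print(isel_noroundtrip(arr, dim_0=0, dim_1=0, dim_2=0, dim_3=0))
```

Whether something along these lines is actually faster, and how to keep indexes and attrs correct, is exactly what the `_to_temp_dataset` question above is about.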

{
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
538791352 https://github.com/pydata/xarray/issues/2799#issuecomment-538791352 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDUzODc5MTM1Mg== crusaderky 6213168 2019-10-06T21:47:20Z 2019-10-06T21:48:48Z MEMBER

After #3375:

1.371 | TOTAL
-- | --
0.082 | DataArray.__getitem__
0.217 | DataArray.isel (_to_temp_dataset roundtrip)
0.740 | Dataset.isel
0.056 | Variable.isel
0.276 | Variable.__getitem__

The offending lines in Dataset.isel are these, and I strongly suspect they are improvable:

https://github.com/pydata/xarray/blob/4254b4af33843f711459e5242018cd1d678ad3a0/xarray/core/dataset.py#L1922-L1930
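For context (an addition, not from the original comment), an attribution like the one above can be reproduced by profiling `Dataset.isel` on its own; line numbers and exact timings will of course differ between xarray versions:

```python
import cProfile
import pstats

import numpy as np
import xarray as xr

# A Dataset equivalent to the DataArray used in the benchmarks above.
ds = xr.Dataset({"a": (("dim_0", "dim_1", "dim_2", "dim_3"), np.ones((10, 10, 10, 10)))})
indexers = {"dim_0": 0, "dim_1": 0, "dim_2": 0, "dim_3": 0}

prof = cProfile.Profile()
prof.enable()
for _ in range(10000):
    ds.isel(**indexers)
prof.disable()

# Show only the most expensive calls that live in dataset.py.
pstats.Stats(prof).sort_stats("cumulative").print_stats("dataset.py", 15)
```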

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
538790722 https://github.com/pydata/xarray/issues/2799#issuecomment-538790722 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDUzODc5MDcyMg== crusaderky 6213168 2019-10-06T21:38:44Z 2019-10-06T21:38:44Z MEMBER

All those integer indexes were cast into Variables. #3375 stops that.
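A rough way to see why that casting hurts (a sketch added here, not part of the original comment): constructing even a 0-d `xarray.Variable` is far more expensive than leaving the integer indexer alone, and per the profile above it happened several times per indexing call.

```python
import timeit

import xarray as xr

n = 100_000
t_var = timeit.timeit(lambda: xr.Variable((), 0), number=n)  # wrap int in a 0-d Variable
t_int = timeit.timeit(lambda: 0, number=n)                   # baseline: the bare integer
print(f"xr.Variable((), 0): {t_var / n * 1e9:8.0f} ns per call")
print(f"bare integer:       {t_int / n * 1e9:8.0f} ns per call")
```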

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
529578839 https://github.com/pydata/xarray/issues/2799#issuecomment-529578839 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDUyOTU3ODgzOQ== crusaderky 6213168 2019-09-09T17:15:08Z 2019-09-09T17:15:08Z MEMBER

> Pythran supports Python 2.7 and also has a decent Python 3 support. [...] Pythran now supports Python3 and can be installed as a regular Python3 program. Note however that Python3 support is still in early stage and compilation failure may happen. Report them!

This is not a great start :(

It's the first time I've heard of Pythran. At first sight it looks somewhat like a hybrid between Cython (for the ahead-of-time transpiling to C++) and numba (for the Python-compatible syntax).

That said, I didn't see anything that hints at potential speedups on the python boilerplate code.

I already have experience with compiling pure-Python code (tight `__iter__` methods) with Cython, and got around a 30% performance boost, which - while nothing to scoff at - is not life-changing either.

This said, I'd have to spend more time on it to get a more informed opinion.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);