html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2799#issuecomment-553948714,https://api.github.com/repos/pydata/xarray/issues/2799,553948714,MDEyOklzc3VlQ29tbWVudDU1Mzk0ODcxNA==,6213168,2019-11-14T15:50:35Z,2019-11-14T15:50:35Z,MEMBER,"#3533 closes the gap between DataArray and numpy from 500x slower to ""just"" 100x slower :)","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,416962458
https://github.com/pydata/xarray/issues/2799#issuecomment-539218376,https://api.github.com/repos/pydata/xarray/issues/2799,539218376,MDEyOklzc3VlQ29tbWVudDUzOTIxODM3Ng==,6213168,2019-10-07T21:46:32Z,2019-10-07T21:53:33Z,MEMBER,"I tried playing around with PyPy 3.6.
Big fat disclaimer: I **did not** run any of the xarray unit tests. Expect trouble if you do.

1.
```bash
#!/bin/bash

set -o errexit
set -o pipefail
set -o nounset
set -o xtrace

tar -xvjf Downloads/pypy3.6-v7.1.1-linux64.tar.bz2
cd pypy3.6-v7.1.1-linux64/bin
./pypy3 -m ensurepip
./pip3.6 install -U pip wheel
# upgrade everything preinstalled, skipping greenlet (bundled with PyPy itself)
./pip list | awk 'NR > 2 {print $1}' | grep -v greenlet | xargs ./pip install -U

# sudo apt-get install libopenblas-dev gfortran
./pip install numpy pandas xarray
```
2. To work around https://bitbucket.org/pypy/pypy/issues/3087/collectionsabc-__init_subclass__-failure, edit ``xarray/core/common.py`` and delete ``AttrAccessMixin.__init_subclass__`` (a quick sanity check follows after this list).

3. ``timeit`` is unreliable on PyPy, so I modified the benchmark as follows:
```python
import time

import numpy as np
import xarray as xr


shape = (10, 10, 10, 10)
index = (0, 0, 0, 0)
np_arr = np.ones(shape)
arr = xr.DataArray(np_arr)


N = 10000

def bench_slice(obj):
    for _ in range(4):  # repeat so PyPy's JIT warm-up shows up in the output
        t0 = time.time()
        for _ in range(N):
            obj[index]
        t1 = time.time()
        t_ns = (t1 - t0) / N * 1e9
        print(f""{t_ns:6.0f} ns {obj.__class__.__name__}"")

bench_slice(arr)
bench_slice(np_arr)
```
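
A quick sanity check for step 2 (a sketch; after the edit, the hook should no longer appear in the class's own namespace):
```python
# object provides a default __init_subclass__, so hasattr() would always be
# True; check the class __dict__ instead.
from xarray.core.common import AttrAccessMixin

assert '__init_subclass__' not in vars(AttrAccessMixin)
```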

Benchmark outputs:
CPython 3.7:
```
 93496 ns DataArray
 92732 ns DataArray
 92560 ns DataArray
 93427 ns DataArray
   119 ns ndarray
   121 ns ndarray
   122 ns ndarray
   119 ns ndarray
```
PyPy3.6 v7.1:
```
113273 ns DataArray
 38543 ns DataArray
 34797 ns DataArray
 39453 ns DataArray
   386 ns ndarray
   289 ns ndarray
   329 ns ndarray
   413 ns ndarray
```
Big important reminder: all results are for a very small array. I would expect the percentage gap between CPython and PyPy (for both numpy and xarray) to narrow as the array size grows and more time is spent in numpy's pure C code.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,416962458
https://github.com/pydata/xarray/issues/2799#issuecomment-538570946,https://api.github.com/repos/pydata/xarray/issues/2799,538570946,MDEyOklzc3VlQ29tbWVudDUzODU3MDk0Ng==,6213168,2019-10-04T21:48:18Z,2019-10-06T21:56:58Z,MEMBER,"I simplified the benchmark:
```python
import numpy as np
import xarray as xr


shape = (10, 10, 10, 10)
index = (0, 0, 0, 0)
np_arr = np.ones(shape)
arr = xr.DataArray(np_arr)
named_index = dict(zip(arr.dims, index))

print(index)
print(named_index)

%timeit -n 1000 arr[index]
%timeit -n 1000 arr.isel(**named_index)
%timeit -n 1000 np_arr[index]
```
```
(0, 0, 0, 0)
{'dim_0': 0, 'dim_1': 0, 'dim_2': 0, 'dim_3': 0}
90.8 µs ± 5.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
88.5 µs ± 2.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
115 ns ± 6.71 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
```python
%%prun -s cumulative
for _ in range(10000):
    arr[index]
```
```
   5680003 function calls (5630003 primitive calls) in 1.890 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.890    1.890 {built-in method builtins.exec}
        1    0.009    0.009    1.890    1.890 <string>:1(<module>)
    10000    0.011    0.000    1.881    0.000 dataarray.py:629(__getitem__)
    10000    0.030    0.000    1.801    0.000 dataarray.py:988(isel)
    10000    0.084    0.000    1.567    0.000 dataset.py:1842(isel)
    10000    0.094    0.000    0.570    0.000 dataset.py:1746(_validate_indexers)
    10000    0.029    0.000    0.375    0.000 variable.py:960(isel)
    10000    0.013    0.000    0.319    0.000 variable.py:666(__getitem__)
    20000    0.014    0.000    0.251    0.000 dataset.py:918(_replace_with_new_dims)
    50000    0.028    0.000    0.245    0.000 variable.py:272(__init__)
    10000    0.035    0.000    0.211    0.000 variable.py:487(_broadcast_indexes)
1140000/1100000    0.100    0.000    0.168    0.000 {built-in method builtins.isinstance}
    10000    0.050    0.000    0.157    0.000 dataset.py:1802(_get_indexers_coords_and_indexes)
    20000    0.025    0.000    0.153    0.000 dataset.py:868(_replace)
    50000    0.085    0.000    0.152    0.000 variable.py:154(as_compatible_data)
```
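
For reference, here is how the breakdown below falls out of the cumtimes above - a small worked sketch (my assumption: each layer's own cost is its cumtime minus the cumtime of the calls it delegates to):
```python
# cumtimes in seconds, copied from the profile above
da_getitem  = 1.881  # DataArray.__getitem__
da_isel     = 1.801  # DataArray.isel
ds_isel     = 1.567  # Dataset.isel
validate    = 0.570  # Dataset._validate_indexers
var_isel    = 0.375  # Variable.isel
var_getitem = 0.319  # Variable.__getitem__

print(f'{da_getitem - da_isel:.3f}')           # 0.080 DataArray.__getitem__ own cost
print(f'{da_isel - ds_isel:.3f}')              # 0.234 _to_temp_dataset roundtrip
print(f'{ds_isel - validate - var_isel:.3f}')  # 0.622 Dataset.isel own cost
print(f'{var_isel - var_getitem:.3f}')         # 0.056 Variable.isel own cost
```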

Time breakdown:

Step | Time (s)
-- | --
DataArray.\_\_getitem\_\_ | 0.080
DataArray.isel (_to_temp_dataset roundtrip) | 0.234
Dataset.isel | 0.622
Dataset._validate_indexers | 0.570
Variable.isel | 0.056
Variable.\_\_getitem\_\_ | 0.319
**Total** | **1.881**


I can spot a few low-hanging fruit there:
- a huge amount of time is spent in ``_validate_indexers``
- Why is ``Variable.__init__`` being called 5 times per operation?!? I expected 0.
- The profile strongly hints that we're creating dummy IndexVariables on the fly
- We're casting the DataArray to a Dataset, converting the positional index to a dict, then converting it back to positional for each variable. Maybe it's a good idea to rewrite DataArray.sel/isel so that they don't use _to_temp_dataset? (see the sketch below)
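
For illustration, a rough sketch of such a direct path (hypothetical code, not xarray's actual implementation; it ignores coords/indexes entirely, so it's only valid for bare arrays like the one in this benchmark):
```python
import numpy as np
import xarray as xr

def isel_via_variable(da, **indexers):
    # Index the wrapped Variable directly and rebuild a bare DataArray,
    # skipping the _to_temp_dataset roundtrip. Coordinates are dropped.
    return xr.DataArray(da.variable.isel(**indexers))

arr = xr.DataArray(np.ones((10, 10, 10, 10)))
print(isel_via_variable(arr, dim_0=0, dim_1=0, dim_2=0, dim_3=0))
```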

So, in short: while I don't think we can feasibly close the order-of-magnitude gap (800x) with numpy, I suspect we could get at least a 5x speedup here.","{""total_count"": 5, ""+1"": 5, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,416962458
https://github.com/pydata/xarray/issues/2799#issuecomment-538791352,https://api.github.com/repos/pydata/xarray/issues/2799,538791352,MDEyOklzc3VlQ29tbWVudDUzODc5MTM1Mg==,6213168,2019-10-06T21:47:20Z,2019-10-06T21:48:48Z,MEMBER,"After #3375:

Time (s) | Step
-- | --
0.082 | DataArray.\_\_getitem\_\_
0.217 | DataArray.isel (_to_temp_dataset roundtrip)
0.740 | Dataset.isel
0.056 | Variable.isel
0.276 | Variable.\_\_getitem\_\_
**1.371** | **TOTAL**

The offending lines in Dataset.isel are these, and I strongly suspect they are improvable:

https://github.com/pydata/xarray/blob/4254b4af33843f711459e5242018cd1d678ad3a0/xarray/core/dataset.py#L1922-L1930","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,416962458
https://github.com/pydata/xarray/issues/2799#issuecomment-538790722,https://api.github.com/repos/pydata/xarray/issues/2799,538790722,MDEyOklzc3VlQ29tbWVudDUzODc5MDcyMg==,6213168,2019-10-06T21:38:44Z,2019-10-06T21:38:44Z,MEMBER,All those integer indexes were cast into Variables. #3375 stops that.,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,416962458
https://github.com/pydata/xarray/issues/2799#issuecomment-529578839,https://api.github.com/repos/pydata/xarray/issues/2799,529578839,MDEyOklzc3VlQ29tbWVudDUyOTU3ODgzOQ==,6213168,2019-09-09T17:15:08Z,2019-09-09T17:15:08Z,MEMBER,"> Pythran supports Python 2.7 and also has a decent Python 3 support.
> [...]
> Pythran now supports Python3 and can be installed as a regular Python3 program. Note however that Python3 support is still in early stage and compilation failure may happen. Report them!

This is _not_ a great start :(


This is the first time I've heard of Pythran. At first sight it looks somewhat like a hybrid between Cython (ahead-of-time transpilation to C++) and numba (Python-compatible syntax).

That said, I didn't see anything that hints at potential speedups for the pure-Python boilerplate code.

I've already compiled pure-Python code (tight ``__iter__`` methods) with Cython, and got around a 30% performance boost which - while nothing to scoff at - is not life-changing either.
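
For reference, compiling an unmodified pure-Python module with Cython looks roughly like this - a minimal sketch, with ``hot_loops.py`` as a made-up placeholder module name:
```python
# setup.py: compile an unmodified pure-Python module with Cython.
# 'hot_loops.py' is a placeholder; no source changes are needed.
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize('hot_loops.py', language_level=3))
```
Built with ``python setup.py build_ext --inplace``, the resulting extension module simply shadows the original .py file on import.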

In any case, I'd have to spend more time on it to form a more informed opinion.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,416962458