github: issue_comments: 20 rows where author_association = "MEMBER" and issue = 416962458 sorted by updated

20 rows where author_association = "MEMBER" and issue = 416962458 sorted by updated_at descending

Search:

descending

id	html_url	issue_url	node_id	user	created_at	updated_at ▲	author_association	body	reactions	issue
786816356	https://github.com/pydata/xarray/issues/2799#issuecomment-786816356	https://api.github.com/repos/pydata/xarray/issues/2799	MDEyOklzc3VlQ29tbWVudDc4NjgxNjM1Ng==	jhamman 2443309	2021-02-26T18:25:13Z	2021-02-26T18:25:13Z	MEMBER	I agree, I think a "xarray lite" package with only named dimensions could indeed be a valuable contribution. FWIW, I think the xarray-lite concept would be a great chunk of work to write a small-ish proposal around. I think we could target the next round of CZI EOSS with such a concept.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
786800631	https://github.com/pydata/xarray/issues/2799#issuecomment-786800631	https://api.github.com/repos/pydata/xarray/issues/2799	MDEyOklzc3VlQ29tbWVudDc4NjgwMDYzMQ==	shoyer 1217238	2021-02-26T17:56:07Z	2021-02-26T17:56:07Z	MEMBER	I agree, I think a "xarray lite" package with only named dimensions could indeed be a valuable contribution. I'd love to optimize xarray further, but I suspect you would probably have to write the core in a language like C++ to achieve similar performance to NumPy.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
553948714	https://github.com/pydata/xarray/issues/2799#issuecomment-553948714	https://api.github.com/repos/pydata/xarray/issues/2799	MDEyOklzc3VlQ29tbWVudDU1Mzk0ODcxNA==	crusaderky 6213168	2019-11-14T15:50:35Z	2019-11-14T15:50:35Z	MEMBER	3533 closes the gap between DataArray and numpy from 500x slower to "just" 100x slower :)	{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
553601146	https://github.com/pydata/xarray/issues/2799#issuecomment-553601146	https://api.github.com/repos/pydata/xarray/issues/2799	MDEyOklzc3VlQ29tbWVudDU1MzYwMTE0Ng==	max-sixty 5635139	2019-11-13T21:03:23Z	2019-11-13T21:03:23Z	MEMBER	That's great that's helpful @nbren12 . Maybe we should add to docs (we don't really have a performance section at the moment, maybe we start something on performance tips?) There's some info on the differences in the Terminology that @gwgundersen wrote: https://github.com/pydata/xarray/blob/master/doc/terminology.rst#L18 Essentially: by indexing on the variable, you ignore the coordinates, and so skip a bunch of code that takes the object apart and puts it back together. A variable is much more similar to a numpy array, so you can't do `sel`, for example.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
552714604	https://github.com/pydata/xarray/issues/2799#issuecomment-552714604	https://api.github.com/repos/pydata/xarray/issues/2799	MDEyOklzc3VlQ29tbWVudDU1MjcxNDYwNA==	max-sixty 5635139	2019-11-12T03:10:39Z	2019-11-12T03:10:39Z	MEMBER	One note: if you're indexing into a dataarray and don't care about the coords, index into the variable. 2x numpy time, rather than 30x: ```python In [26]: da = xr.tutorial.open_dataset('air_temperature')['air'] In [27]: da Out[27]: <xarray.DataArray 'air' (time: 2920, lat: 25, lon: 53)> [3869000 values with dtype=float32] Coordinates: * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0 * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0 * time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00 Attributes: long_name: 4xDaily Air temperature at sigma level 995 units: degK precision: 2 GRIB_id: 11 GRIB_name: TMP var_desc: Air temperature dataset: NMC Reanalysis level_desc: Surface statistic: Individual Obs parent_stat: Other actual_range: [185.16 322.1 ] In [20]: %timeit da.variable[0] 28.2 µs ± 2.29 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) In [21]: %timeit da[0] 459 µs ± 37.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) In [22]: %timeit da.variable.values[0] 14.1 µs ± 183 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) ```	{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
552655149	https://github.com/pydata/xarray/issues/2799#issuecomment-552655149	https://api.github.com/repos/pydata/xarray/issues/2799	MDEyOklzc3VlQ29tbWVudDU1MjY1NTE0OQ==	shoyer 1217238	2019-11-11T22:57:55Z	2019-11-11T22:57:55Z	MEMBER	Sure, I just wanted to make the note that this operation should be more or less constant time, as opposed to dependent on the size of the array. Yes, I think this is still the case for slicing in xarray. There's just much larger constant overhead than in NumPy. (And this is difficult to fix short of rewriting xarray's core in C.)	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
552646381	https://github.com/pydata/xarray/issues/2799#issuecomment-552646381	https://api.github.com/repos/pydata/xarray/issues/2799	MDEyOklzc3VlQ29tbWVudDU1MjY0NjM4MQ==	max-sixty 5635139	2019-11-11T22:29:58Z	2019-11-11T22:29:58Z	MEMBER	TBC I think there's plenty we could do with relatively little complexity to speed up indexing operations on `DataArray`s. As an example, we could avoid the roundtrip to a temporary `Dataset`. That's a different problem from making xarray as fast as indexing a numpy array, or allowing libraries to iterate through a `DataArray` in a hot loop.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
539218376	https://github.com/pydata/xarray/issues/2799#issuecomment-539218376	https://api.github.com/repos/pydata/xarray/issues/2799	MDEyOklzc3VlQ29tbWVudDUzOTIxODM3Ng==	crusaderky 6213168	2019-10-07T21:46:32Z	2019-10-07T21:53:33Z	MEMBER	I tried playing around with pypy 3.6. Big fat disclaimer: I did not run any of the xarray unit tests. Expect trouble if you do. 1. ```bash !/bin/bash set -o errexit set -o pipefail set -o nounset set -o xtrace tar -xvjf Downloads/pypy3.6-v7.1.1-linux64.tar.bz2 cd pypy3.6-v7.1.1-linux64/bin ./pypy3 -m ensurepip ./pip3.6 install -U pip wheel ./pip list \| awk 'NR > 2 {print $1}' \| grep -v greenlet \| xargs ./pip install -U sudo apt-get install libopenblas-dev gfortran ./pip install numpy pandas xarray ` 2. to work around https://bitbucket.org/pypy/pypy/issues/3087/collectionsabc-__init_subclass__-failure, editxarray/core/common.py`and delete`AttrAccessMixin.init_subclass`` timeit is unreliable in pypy. I modified the benchmark as follows: ```python import time import numpy as np import xarray as xr shape = (10, 10, 10, 10) index = (0, 0, 0, 0) np_arr = np.ones(shape) arr = xr.DataArray(np_arr) N = 10000 def bench_slice(obj): for _ in range(4): t0 = time.time() for _ in range(N): obj[index] t1 = time.time() t_ns = (t1 - t0) / N * 1e9 print(f"{t_ns:6.0f} ns {obj.class.name}") bench_slice(arr) bench_slice(np_arr) ``` Benchmark outputs: CPython 3.7: `93496 ns DataArray 92732 ns DataArray 92560 ns DataArray 93427 ns DataArray 119 ns ndarray 121 ns ndarray 122 ns ndarray 119 ns ndarray` PyPy 7.1 3.6: `113273 ns DataArray 38543 ns DataArray 34797 ns DataArray 39453 ns DataArray 386 ns ndarray 289 ns ndarray 329 ns ndarray 413 ns ndarray` Big important reminder: all results are for a very small array. I would expect the gap between CPython and pypy to get narrower in % (both for numpy and xarray) as the array size gets larger and more time is spent in the pure C numpy code.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
539100243	https://github.com/pydata/xarray/issues/2799#issuecomment-539100243	https://api.github.com/repos/pydata/xarray/issues/2799	MDEyOklzc3VlQ29tbWVudDUzOTEwMDI0Mw==	max-sixty 5635139	2019-10-07T16:39:54Z	2019-10-07T16:39:54Z	MEMBER	Great analysis, thanks Do we have any idea of which of those lines are offending? I used a tool `line_profiler` a while ago, but maybe we know already (I'm guessing it's the two `_replace_with_new_dims` lines?)	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
538570946	https://github.com/pydata/xarray/issues/2799#issuecomment-538570946	https://api.github.com/repos/pydata/xarray/issues/2799	MDEyOklzc3VlQ29tbWVudDUzODU3MDk0Ng==	crusaderky 6213168	2019-10-04T21:48:18Z	2019-10-06T21:56:58Z	MEMBER	I simplified the benchmark: ```python from itertools import product import numpy as np import xarray as xr shape = (10, 10, 10, 10) index = (0, 0, 0, 0) np_arr = np.ones(shape) arr = xr.DataArray(np_arr) named_index = dict(zip(arr.dims, index)) print(index) print(named_index) %timeit -n 1000 arr[index] %timeit -n 1000 arr.isel(named_index) %timeit -n 1000 np_arr[index] (0, 0, 0, 0) {'dim_0': 0, 'dim_1': 0, 'dim_2': 0, 'dim_3': 0} 90.8 µs ± 5.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 88.5 µs ± 2.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 115 ns ± 6.71 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) python %%prun -s cumulative for _ in range(10000): arr[index] 5680003 function calls (5630003 primitive calls) in 1.890 seconds Ordered by: cumulative time ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 1.890 1.890 {built-in method builtins.exec} 1 0.009 0.009 1.890 1.890 <string>:1(<module>) 10000 0.011 0.000 1.881 0.000 dataarray.py:629(getitem) 10000 0.030 0.000 1.801 0.000 dataarray.py:988(isel) 10000 0.084 0.000 1.567 0.000 dataset.py:1842(isel) 10000 0.094 0.000 0.570 0.000 dataset.py:1746(_validate_indexers) 10000 0.029 0.000 0.375 0.000 variable.py:960(isel) 10000 0.013 0.000 0.319 0.000 variable.py:666(getitem) 20000 0.014 0.000 0.251 0.000 dataset.py:918(_replace_with_new_dims) 50000 0.028 0.000 0.245 0.000 variable.py:272(init**) 10000 0.035 0.000 0.211 0.000 variable.py:487(_broadcast_indexes) 1140000/1100000 0.100 0.000 0.168 0.000 {built-in method builtins.isinstance} 10000 0.050 0.000 0.157 0.000 dataset.py:1802(_get_indexers_coords_and_indexes) 20000 0.025 0.000 0.153 0.000 dataset.py:868(_replace) 50000 0.085 0.000 0.152 0.000 variable.py:154(as_compatible_data) ``` Time breakdown: Total \| 1.881 -- \| -- DataArray.__getitem__ \| 0.080 DataArray.isel (_to_temp_dataset roundtrip) \| 0.234 Dataset.isel \| 0.622 Dataset._validate_indexers \| 0.570 Variable.isel \| 0.056 Variable.__getitem__ \| 0.319 I can spot a few low-hanging fruits there: - huge amount of time spent on _validate_indexers - Why is `variable__init__` being called 5 times?!? I expected 0. - The bench strongly hints at the fact that we're creating on the fly dummy IndexVariables - We're casting the DataArray to a Dataset, converting the positional index to a dict, then converting it back to positional for each variable. Maybe it's a good idea to rewrite DataArray.sel/isel so that they don't use _to_temp_dataset? So in short while I don't think we can feasibly close the order-of-magnitude gap (800x) with numpy, I suspect we could get at least a 5x speedup here.	{ "total_count": 5, "+1": 5, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
538791352	https://github.com/pydata/xarray/issues/2799#issuecomment-538791352	https://api.github.com/repos/pydata/xarray/issues/2799	MDEyOklzc3VlQ29tbWVudDUzODc5MTM1Mg==	crusaderky 6213168	2019-10-06T21:47:20Z	2019-10-06T21:48:48Z	MEMBER	After #3375: 1.371 \| TOTAL -- \| -- 0.082 \| DataArray.__getitem__ 0.217 \| DataArray.isel (_to_temp_dataset roundtrip) 0.740 \| Dataset.isel 0.056 \| Variable.isel 0.276 \| Variable.__getitem__ The offending lines in Dataset.isel are these, and I strongly suspect they are improvable: https://github.com/pydata/xarray/blob/4254b4af33843f711459e5242018cd1d678ad3a0/xarray/core/dataset.py#L1922-L1930	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
538790722	https://github.com/pydata/xarray/issues/2799#issuecomment-538790722	https://api.github.com/repos/pydata/xarray/issues/2799	MDEyOklzc3VlQ29tbWVudDUzODc5MDcyMg==	crusaderky 6213168	2019-10-06T21:38:44Z	2019-10-06T21:38:44Z	MEMBER	All those integer indexes were cast into Variables. #3375 stops that.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
529578839	https://github.com/pydata/xarray/issues/2799#issuecomment-529578839	https://api.github.com/repos/pydata/xarray/issues/2799	MDEyOklzc3VlQ29tbWVudDUyOTU3ODgzOQ==	crusaderky 6213168	2019-09-09T17:15:08Z	2019-09-09T17:15:08Z	MEMBER	Pythran supports Python 2.7 and also has a decent Python 3 support. [...] Pythran now supports Python3 and can be installed as a regular Python3 program. Note however that Python3 support is still in early stage and compilation failure may happen. Report them! This is not a great start :( It's the first time I hear about Pythran. At first sight it looks somewhat like a hybrid between Cython (for the ahead-of-time transpiling to C++) and numba (for having python-compatible syntax). That said, I didn't see anything that hints at potential speedups on the python boilerplate code. I already had experience with compiling pure-python code (tight `__iter__` methods) with Cython, and got around 30% performance boost which - while nothing to scoff at - is not life-changing either. This said, I'd have to spend more time on it to get a more informed opinion.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
469898607	https://github.com/pydata/xarray/issues/2799#issuecomment-469898607	https://api.github.com/repos/pydata/xarray/issues/2799	MDEyOklzc3VlQ29tbWVudDQ2OTg5ODYwNw==	max-sixty 5635139	2019-03-05T23:16:43Z	2019-03-05T23:16:43Z	MEMBER	Cython + memoryviews isn't quite the right comparison here. Right, tbc, I'm only referring to the top two lines of the pasted benchmark; i.e. once we enter python (even if only to access a numpy array) we're already losing a lot of the speed relative to the loop staying in C / Cython. So even if xarray were a python front-end to a C++ library, it still wouldn't be competitive if performance were paramount. ...unless pypy sped that up; I'd be v interested to see.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
469869298	https://github.com/pydata/xarray/issues/2799#issuecomment-469869298	https://api.github.com/repos/pydata/xarray/issues/2799	MDEyOklzc3VlQ29tbWVudDQ2OTg2OTI5OA==	shoyer 1217238	2019-03-05T21:43:18Z	2019-03-05T21:43:32Z	MEMBER	Cython + memoryviews isn't quite the right comparison here. I'm sure ordering here is correct, but relative magnitude of the performance difference should be smaller. Xarray's core is bottlenecked on: 1. Overhead of abstraction with normal Python operations (e.g., function calls) in non-numeric code (all the heavy numerics is offloaded to NumPy or pandas). 2. The dynamic nature of our APIs, which means we need to do lots of type checking. Notice how high up `builtins.isinstance` appears in that performance profile! C++ offers very low-cost abstraction but dynamism is still slow. Even then, compilers are much better at speeding up tight numeric loops than complex domain logic. As a point of reference, it would be interesting to see these performance numbers running pypy, which I think should be able to handle everything in xarray. You'll note that pypy is something like 7x faster than CPython in their benchmark suite, which I suspect is closer to what we'd see if we wrote xarray's core in a language like C++, e.g., as Python interface to xframe.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
469861382	https://github.com/pydata/xarray/issues/2799#issuecomment-469861382	https://api.github.com/repos/pydata/xarray/issues/2799	MDEyOklzc3VlQ29tbWVudDQ2OTg2MTM4Mg==	max-sixty 5635139	2019-03-05T21:19:31Z	2019-03-05T21:19:31Z	MEMBER	To put the relative speed of numpy access into perspective, I found this insightful: https://jakevdp.github.io/blog/2012/08/08/memoryview-benchmarks/ (it's now a few years out of date, but I think the fundamentals still stand) Pasted from there: Summary Here are the timing results we've seen above: Python + numpy: 6510 ms Cython + numpy: 668 ms Cython + memviews (slicing): 22 ms Cython + raw pointers: 2.47 ms Cython + memviews (no slicing): 2.45 ms So if we're running an inner loop on an array, accessing it using numpy in python is an order of magnitude slower than accessing it using numpy in C (and that's an order of magnitude slower than using a slice, and that's an order of magnitude slower than using raw pointers) So - let's definitely speed xarray up (your benchmarks are excellent, thank you again, and I think you're right there are opportunities for significant increases). But where speed is paramount above all else, we shouldn't use any access in python, let alone the niceties of xarray access.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
469449165	https://github.com/pydata/xarray/issues/2799#issuecomment-469449165	https://api.github.com/repos/pydata/xarray/issues/2799	MDEyOklzc3VlQ29tbWVudDQ2OTQ0OTE2NQ==	max-sixty 5635139	2019-03-04T22:33:03Z	2019-03-04T22:33:03Z	MEMBER	You can always use xarray to process the data, and then extract the underlying array (`da.values`) for passing into something expecting an numpy array / for running fast(ish) loops (we do this frequently).	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
469445483	https://github.com/pydata/xarray/issues/2799#issuecomment-469445483	https://api.github.com/repos/pydata/xarray/issues/2799	MDEyOklzc3VlQ29tbWVudDQ2OTQ0NTQ4Mw==	max-sixty 5635139	2019-03-04T22:20:58Z	2019-03-04T22:20:58Z	MEMBER	Thanks for the benchmarks @nbren12, and for the clear explanation @shoyer While we could do some performance work on that loop, I think we're likely to see a material change by enabling the external library to access directly from the array, without a looped python call. That's consistent with the ideas @jhamman had a few days ago.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
469444519	https://github.com/pydata/xarray/issues/2799#issuecomment-469444519	https://api.github.com/repos/pydata/xarray/issues/2799	MDEyOklzc3VlQ29tbWVudDQ2OTQ0NDUxOQ==	shoyer 1217238	2019-03-04T22:17:58Z	2019-03-04T22:17:58Z	MEMBER	To be clear, pull requests improving performance (without significantly loss of readability) would be very welcome. Be sure to include a new benchmark in our benchmark suite.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
469439957	https://github.com/pydata/xarray/issues/2799#issuecomment-469439957	https://api.github.com/repos/pydata/xarray/issues/2799	MDEyOklzc3VlQ29tbWVudDQ2OTQzOTk1Nw==	shoyer 1217238	2019-03-04T22:03:37Z	2019-03-04T22:16:49Z	MEMBER	While python will always be slower than C when iterating over an array in this fashion, I would hope that xarray could be nearly as fast as numpy. I am not sure what the best way to improve this is though. I'm sure it's possible to optimize this significantly, but short of rewriting this logic in a lower level language it's pretty much impossible to match the speed of NumPy. This benchmark might give some useful context: ``` def dummy_isel(args, kwargs): pass def index_dummy(named_indices, arr): for named_index in named_indices: dummy_isel(arr, named_index) %%timeit -n 10 index_dummy(named_indices, arr) ``` On my machine, this is already twice as slow as your NumPy benchmark (497 µs vs 251 µs) , and all it's doing is parsing `args` and `**kwargs`! Every Python function/method call involving keyword arguments adds about 0.5 ns of overhead, because the highly optimized `dict` is (relatively) slow compared to positional arguments. In my experience it is almost impossible to get the overhead of a Python function call below a few microseconds. Right now we're at about 130 µs per indexing operation. In the best case, we might make this 10x faster but even that would be quite challenging, e.g., consider that even creating a DataArray takes about 20 µs.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458

Advanced export

JSON shape: default, array, newline-delimited, object

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);

issue_comments

20 rows where author_association = "MEMBER" and issue = 416962458 sorted by updated_at descending

3533 closes the gap between DataArray and numpy from 500x slower to "just" 100x slower :)

!/bin/bash

sudo apt-get install libopenblas-dev gfortran

Advanced export