issue_comments


40 rows where issue = 416962458 sorted by updated_at descending


user 10

  • max-sixty 8
  • nbren12 6
  • crusaderky 6
  • shoyer 5
  • eserie 5
  • hmaarrfk 4
  • ashwinvis 2
  • openSourcerer9000 2
  • jhamman 1
  • DerWeh 1

author_association 3

  • MEMBER 20
  • CONTRIBUTOR 12
  • NONE 8

issue 1

  • Performance: numpy indexes small amounts of data 1000 faster than xarray · 40 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1306386310 https://github.com/pydata/xarray/issues/2799#issuecomment-1306386310 https://api.github.com/repos/pydata/xarray/issues/2799 IC_kwDOAMm_X85N3d-G openSourcerer9000 61931826 2022-11-07T23:53:17Z 2022-11-07T23:53:17Z NONE

A workaround that worked for me: load the whole thing into a NumPy array with da.values (18 GB, about 1 minute), index 15 nodes in 0.4 seconds (this was taking ~5 min in xarray), then load the result back into a DataArray. It won't suit machines with less memory, but it worked in my case.
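The round trip described above can be sketched as follows. This is a minimal illustration on a small in-memory array, not the 18 GB dataset; the dimension names `node` and `time` come from the earlier comment in this thread, while the array sizes and the helper name `select_nodes` are assumptions made for the example.

```python
import numpy as np
import xarray as xr


def select_nodes(da, node_positions):
    """Pick positions along 'node' via NumPy, then re-wrap as a DataArray.

    Hypothetical helper: only dimension coordinates are carried over.
    """
    axis = da.get_axis_num("node")
    raw = da.values  # loads the full array into memory as a NumPy array
    picked = np.take(raw, node_positions, axis=axis)
    # Rebuild coordinates only for dimensions that have them.
    coords = {}
    for dim in da.dims:
        if dim in da.coords:
            labels = da.coords[dim].values
            coords[dim] = labels[node_positions] if dim == "node" else labels
    return xr.DataArray(picked, dims=da.dims, coords=coords, name=da.name)


da = xr.DataArray(
    np.arange(20.0).reshape(5, 4),
    dims=("node", "time"),
    coords={"node": [10, 11, 12, 13, 14], "time": [0, 1, 2, 3]},
)
subset = select_nodes(da, [0, 3])
```

The cost is memory: `da.values` materializes the whole array, which is why this only helps when the data fits in RAM.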

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
1306327743 https://github.com/pydata/xarray/issues/2799#issuecomment-1306327743 https://api.github.com/repos/pydata/xarray/issues/2799 IC_kwDOAMm_X85N3Pq_ hmaarrfk 90008 2022-11-07T22:45:07Z 2022-11-07T22:45:07Z CONTRIBUTOR

As I've been recently going down this performance rabbit hole, I think the discussion around https://github.com/pydata/xarray/issues/7045 is relevant and provides some additional historical context as to "why" this performance penalty might be happening.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
1306300937 https://github.com/pydata/xarray/issues/2799#issuecomment-1306300937 https://api.github.com/repos/pydata/xarray/issues/2799 IC_kwDOAMm_X85N3JIJ openSourcerer9000 61931826 2022-11-07T22:16:55Z 2022-11-07T22:16:55Z NONE

I'm really not understanding why indexing is so slow. My DataArray has 2 dims, one axis 1.5 million long ('node') and the other 1500 ('time'). Trying to pull a single time series by indexing 1 node takes 16 seconds. The Variable workaround or playing around with chunking doesn't change anything. The only thing loaded into memory should be an array of 1500 values.

Not sure what's going on under the hood but there may be a way to specify that you're only looking to optimize indexing along 1 dim. Once it gets indexed it becomes a very tiny data set. I would think chunks={'node':1} would do exactly this but I guess not.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
863062212 https://github.com/pydata/xarray/issues/2799#issuecomment-863062212 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDg2MzA2MjIxMg== eserie 17484729 2021-06-17T08:57:28Z 2021-06-17T08:57:28Z NONE

Hello,

I don't want to disrupt the issue too much (so let me know if you'd rather we continue the discussion outside).

Somewhat related to the discussions in this issue, I recently released an open-source library, WAX-ML (https://github.com/eserie/wax-ml), where I implement an accessor to unroll JAX transformations on xarray Dataset and DataArray containers along a time dimension.

I hope this can help!

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
797116001 https://github.com/pydata/xarray/issues/2799#issuecomment-797116001 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDc5NzExNjAwMQ== eserie 17484729 2021-03-11T23:20:50Z 2021-03-11T23:20:50Z NONE

> FWIW, I think the xarray-lite concept would be a great chunk of work to write a small-ish proposal around. I think we could target the next round of CZI EOSS with such a concept.

@jhamman I'll be happy to participate in the discussion.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
790546398 https://github.com/pydata/xarray/issues/2799#issuecomment-790546398 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDc5MDU0NjM5OA== eserie 17484729 2021-03-04T11:29:37Z 2021-03-04T11:29:37Z NONE

In case it could be useful and be reused for benchmarks, I released on my GitHub two notebooks with an implementation of a faster (but very simplified, and not very optimized in terms of code architecture) version of the DataArray and Dataset containers. The second notebook contains line profiles of buffer experiments with various containers, which helps pinpoint the operations that are slow in the DataArray implementation for this use case.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
786837666 https://github.com/pydata/xarray/issues/2799#issuecomment-786837666 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDc4NjgzNzY2Ng== eserie 17484729 2021-02-26T19:06:08Z 2021-02-26T19:07:20Z NONE

Thanks all for your prompt responses!

@hmaarrfk, I share your recommendation, and it's a great thing to be able to fall back to numpy arrays when the algorithmic part is well decoupled from the data preparation process. It's what I also do when I can. However, in workflows working on streaming data, the two things (data preparation and computation) may be intertwined or frequently alternated. My example of a "buffer data array" structure is something quite natural to consider in such a context, and having an efficient implementation of a labelled ndarray could really serve the task.

@shoyer I think a first "lite" implementation written fully in Python could already be a great thing. It would not achieve numpy performance, but the additional cost due to managing coordinate alignment should not be too expensive.

An additional suggestion: if the target is computational workflows, having some compatibility with packages such as eagerpy would enable working with other tensor frameworks commonly used in machine learning. This kind of feature could be addressed in yet another package, but having it in mind may influence the early implementation choices (e.g. pure Python vs C++).

@jhamman, @shoyer I would be pleased to share my work on the buffer data array if you think it could serve as a kind of use case. In this context, I experimented a bit with a "crafted" lite version of xarray and could achieve a 10x performance improvement.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
786816356 https://github.com/pydata/xarray/issues/2799#issuecomment-786816356 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDc4NjgxNjM1Ng== jhamman 2443309 2021-02-26T18:25:13Z 2021-02-26T18:25:13Z MEMBER

> I agree, I think a "xarray lite" package with only named dimensions could indeed be a valuable contribution.

FWIW, I think the xarray-lite concept would be a great chunk of work to write a small-ish proposal around. I think we could target the next round of CZI EOSS with such a concept.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
786813358 https://github.com/pydata/xarray/issues/2799#issuecomment-786813358 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDc4NjgxMzM1OA== hmaarrfk 90008 2021-02-26T18:19:28Z 2021-02-26T18:19:28Z CONTRIBUTOR

I hope the following can help users that struggle with the speed of xarray:

I've found that when doing numerical computation, I often use the xarray to grab all the metadata relevant to my computation. Scale, chromaticity, experimental information.

Eventually, I create a function that acts as a barrier:

- Xarray input (high-level experimental data)
- Computation parameters output (low-level, implementation-relevant information)

The low level implementation can operate on the fast numpy arrays. I've found this to be the struggle with creating high level APIs that do things like sanitize inputs (xarray routines like _validate_indexers and _broadcast_indexes) and low level APIs that are simply interested in moving and computing data.

For the example that @nbren12 brought up originally, it might be better to create xarray routines (if they don't exist already) that can create fast iterators for the underlying numpy arrays given a set of dimensions that the user cares about.
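One way such an iterator could look is sketched below. It is not an existing xarray routine: the helper name `iter_numpy_along` is hypothetical, and the sketch assumes the data already fits in memory as a NumPy array. The xarray lookup (`get_axis_num`) happens once, outside the loop, and the loop itself only touches NumPy views.

```python
import numpy as np
import xarray as xr


def iter_numpy_along(da, dim):
    """Yield plain NumPy slices of `da` along `dim`, skipping xarray indexing."""
    axis = da.get_axis_num(dim)
    # Move the chosen axis to the front once; np.moveaxis returns a view.
    raw = np.moveaxis(da.values, axis, 0)
    for block in raw:
        yield block


da = xr.DataArray(np.arange(24.0).reshape(2, 3, 4), dims=("t", "y", "x"))
row_sums = [float(block.sum()) for block in iter_numpy_along(da, "y")]
```

Each yielded block has lost its labels, so this fits the "barrier" pattern above: metadata is read from xarray before the loop, and the hot loop works on arrays only.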

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
786800631 https://github.com/pydata/xarray/issues/2799#issuecomment-786800631 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDc4NjgwMDYzMQ== shoyer 1217238 2021-02-26T17:56:07Z 2021-02-26T17:56:07Z MEMBER

I agree, I think a "xarray lite" package with only named dimensions could indeed be a valuable contribution.

I'd love to optimize xarray further, but I suspect you would probably have to write the core in a language like C++ to achieve similar performance to NumPy.
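To make the idea concrete, here is a rough sketch of what a container with only named dimensions might look like. It is purely illustrative: the class name `NamedDimsArray` and its `isel` method are hypothetical and are not part of xarray or any proposed package, and only integer and slice indexers are handled.

```python
import numpy as np


class NamedDimsArray:
    """Hypothetical 'lite' container: a NumPy array plus dimension names only."""

    __slots__ = ("data", "dims")

    def __init__(self, data, dims):
        data = np.asarray(data)
        if data.ndim != len(dims):
            raise ValueError("need exactly one name per axis")
        self.data = data
        self.dims = tuple(dims)

    def isel(self, **indexers):
        """Positional indexing by dimension name (integers and slices only).

        Integer indexers drop their dimension; names that are not in
        `dims` are silently ignored in this sketch.
        """
        key = tuple(indexers.get(dim, slice(None)) for dim in self.dims)
        kept = tuple(
            dim
            for dim, k in zip(self.dims, key)
            if not isinstance(k, (int, np.integer))
        )
        return NamedDimsArray(self.data[key], kept)


arr = NamedDimsArray(np.arange(12).reshape(3, 4), dims=("y", "x"))
row = arr.isel(y=1)
window = arr.isel(x=slice(0, 2))
```

There are no coordinates, no alignment and no validation beyond the axis count, which is exactly what keeps the per-call overhead small.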

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
786764651 https://github.com/pydata/xarray/issues/2799#issuecomment-786764651 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDc4Njc2NDY1MQ== nbren12 1386642 2021-02-26T16:51:50Z 2021-02-26T16:51:50Z CONTRIBUTOR

@jhamman Weren't you talking about an xarray lite (TM) package?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
786759897 https://github.com/pydata/xarray/issues/2799#issuecomment-786759897 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDc4Njc1OTg5Nw== eserie 17484729 2021-02-26T16:43:23Z 2021-02-26T16:43:23Z NONE

Hi,

I'm working on a machine learning application where I want to stream data and use xarray containers to store them in a buffer (with an additional "lag" dimension) and guarantee good alignment of the coordinates on the various dimensions of the streamed data. Doing so, I noticed that the version of my code working with xarray is very slow compared to a pure numpy implementation (with no coordinate alignment) or even an implementation with deque+pandas. I think the performance issue I noticed is basically the same as the one reported in this issue.

I have the impression that for this kind of application, or more generally for intensive algorithmic usage, and as stated at the beginning of this issue, a light (fewer features and checks) and fast version of the xarray DataArray and Dataset containers could be developed.

Do you think this could be something doable in the scope of xarray? Would it be preferable to create a dedicated library?
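For context, the pure numpy baseline mentioned above (no coordinate alignment) could look roughly like the sketch below. This is a guess at the general shape of such a buffer, not the code from the comment: the class name `LagBuffer`, its parameters and the NaN fill value are all assumptions.

```python
import numpy as np


class LagBuffer:
    """Hypothetical fixed-size buffer holding the last `maxlag` observations."""

    def __init__(self, maxlag, n_features):
        # Axis 0 is the "lag" dimension: row 0 is the newest observation.
        self.data = np.full((maxlag, n_features), np.nan)

    def push(self, row):
        # Shift every row one step toward higher lag; the oldest row wraps
        # around to position 0 and is overwritten by the new observation.
        self.data = np.roll(self.data, 1, axis=0)
        self.data[0] = row
        return self.data


buf = LagBuffer(maxlag=3, n_features=2)
buf.push([1.0, 1.0])
latest = buf.push([2.0, 2.0])
```

An xarray version of the same buffer would additionally track labels for the lag and feature dimensions on every push, which is where the overhead discussed in this issue comes in.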

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
553948714 https://github.com/pydata/xarray/issues/2799#issuecomment-553948714 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDU1Mzk0ODcxNA== crusaderky 6213168 2019-11-14T15:50:35Z 2019-11-14T15:50:35Z MEMBER

#3533 closes the gap between DataArray and numpy from 500x slower to "just" 100x slower :)

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
553601146 https://github.com/pydata/xarray/issues/2799#issuecomment-553601146 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDU1MzYwMTE0Ng== max-sixty 5635139 2019-11-13T21:03:23Z 2019-11-13T21:03:23Z MEMBER

Great that it's helpful, @nbren12. Maybe we should add this to the docs (we don't really have a performance section at the moment; maybe we start something on performance tips?)

There's some info on the differences in the Terminology that @gwgundersen wrote: https://github.com/pydata/xarray/blob/master/doc/terminology.rst#L18

Essentially: by indexing on the variable, you ignore the coordinates, and so skip a bunch of code that takes the object apart and puts it back together. A variable is much more similar to a numpy array, so you can't do sel, for example.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
553294966 https://github.com/pydata/xarray/issues/2799#issuecomment-553294966 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDU1MzI5NDk2Ng== nbren12 1386642 2019-11-13T08:32:05Z 2019-11-13T08:32:16Z CONTRIBUTOR

This variable workaround is awesome @max-sixty. Are there any guidelines on when to use Variable vs DataArray? Some calculations (e.g. fast difference and derivatives/stencil operations) seem cleaner without explicit coordinate labels.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
552714604 https://github.com/pydata/xarray/issues/2799#issuecomment-552714604 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDU1MjcxNDYwNA== max-sixty 5635139 2019-11-12T03:10:39Z 2019-11-12T03:10:39Z MEMBER

One note: if you're indexing into a dataarray and don't care about the coords, index into the variable. 2x numpy time, rather than 30x:

```python
In [26]: da = xr.tutorial.open_dataset('air_temperature')['air']

In [27]: da
Out[27]:
<xarray.DataArray 'air' (time: 2920, lat: 25, lon: 53)>
[3869000 values with dtype=float32]
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Attributes:
    long_name:     4xDaily Air temperature at sigma level 995
    units:         degK
    precision:     2
    GRIB_id:       11
    GRIB_name:     TMP
    var_desc:      Air temperature
    dataset:       NMC Reanalysis
    level_desc:    Surface
    statistic:     Individual Obs
    parent_stat:   Other
    actual_range:  [185.16 322.1 ]

In [20]: %timeit da.variable[0]
28.2 µs ± 2.29 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [21]: %timeit da[0]
459 µs ± 37.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [22]: %timeit da.variable.values[0]
14.1 µs ± 183 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
552655149 https://github.com/pydata/xarray/issues/2799#issuecomment-552655149 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDU1MjY1NTE0OQ== shoyer 1217238 2019-11-11T22:57:55Z 2019-11-11T22:57:55Z MEMBER

> Sure, I just wanted to make the note that this operation should be more or less constant time, as opposed to dependent on the size of the array.

Yes, I think this is still the case for slicing in xarray. There's just much larger constant overhead than in NumPy. (And this is difficult to fix short of rewriting xarray's core in C.)

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
552652019 https://github.com/pydata/xarray/issues/2799#issuecomment-552652019 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDU1MjY1MjAxOQ== hmaarrfk 90008 2019-11-11T22:47:47Z 2019-11-11T22:47:47Z CONTRIBUTOR

Sure, I just wanted to make the note that this operation should be more or less constant time, as opposed to dependent on the size of the array. Somebody had mentioned it should increase with the size of the array.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
552646381 https://github.com/pydata/xarray/issues/2799#issuecomment-552646381 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDU1MjY0NjM4MQ== max-sixty 5635139 2019-11-11T22:29:58Z 2019-11-11T22:29:58Z MEMBER

TBC I think there's plenty we could do with relatively little complexity to speed up indexing operations on DataArrays. As an example, we could avoid the roundtrip to a temporary Dataset.

That's a different problem from making xarray as fast as indexing a numpy array, or allowing libraries to iterate through a DataArray in a hot loop.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
552619589 https://github.com/pydata/xarray/issues/2799#issuecomment-552619589 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDU1MjYxOTU4OQ== hmaarrfk 90008 2019-11-11T21:16:36Z 2019-11-11T21:16:36Z CONTRIBUTOR

Hmm, slicing should basically be a no-op.

The fact that xarray makes it about 100x slower is a real killer. It seems from this conversation that it might be hard to work around.

```python
import xarray as xr
import numpy as np

n = np.zeros(shape=(1024, 1024))
x = xr.DataArray(n, dims=('y', 'x'))
the_slice = np.s_[256:512, 256:512]

%timeit n[the_slice]
%timeit x[the_slice]

186 ns ± 0.778 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
70.3 µs ± 593 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
539352070 https://github.com/pydata/xarray/issues/2799#issuecomment-539352070 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDUzOTM1MjA3MA== ashwinvis 9155111 2019-10-08T06:08:27Z 2019-10-08T06:08:48Z CONTRIBUTOR

I suspect system jitter in the profiling as the time for Dataset.isel went up. It would be useful to run sudo python -m pyperf system tune before running profiler/benchmarks.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
539218376 https://github.com/pydata/xarray/issues/2799#issuecomment-539218376 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDUzOTIxODM3Ng== crusaderky 6213168 2019-10-07T21:46:32Z 2019-10-07T21:53:33Z MEMBER

I tried playing around with pypy 3.6. Big fat disclaimer: I did not run any of the xarray unit tests. Expect trouble if you do.

1. Install pypy and the dependencies:

```bash
#!/bin/bash

set -o errexit
set -o pipefail
set -o nounset
set -o xtrace

tar -xvjf Downloads/pypy3.6-v7.1.1-linux64.tar.bz2
cd pypy3.6-v7.1.1-linux64/bin
./pypy3 -m ensurepip
./pip3.6 install -U pip wheel
./pip list | awk 'NR > 2 {print $1}' | grep -v greenlet | xargs ./pip install -U

sudo apt-get install libopenblas-dev gfortran

./pip install numpy pandas xarray
```

2. To work around https://bitbucket.org/pypy/pypy/issues/3087/collectionsabc-__init_subclass__-failure, edit xarray/core/common.py and delete AttrAccessMixin.__init_subclass__.

3. timeit is unreliable in pypy. I modified the benchmark as follows:

```python
import time

import numpy as np
import xarray as xr

shape = (10, 10, 10, 10)
index = (0, 0, 0, 0)
np_arr = np.ones(shape)
arr = xr.DataArray(np_arr)

N = 10000


def bench_slice(obj):
    for _ in range(4):
        t0 = time.time()
        for _ in range(N):
            obj[index]
        t1 = time.time()
        t_ns = (t1 - t0) / N * 1e9
        print(f"{t_ns:6.0f} ns {obj.__class__.__name__}")


bench_slice(arr)
bench_slice(np_arr)
```

Benchmark outputs:

CPython 3.7:

```
 93496 ns DataArray
 92732 ns DataArray
 92560 ns DataArray
 93427 ns DataArray
   119 ns ndarray
   121 ns ndarray
   122 ns ndarray
   119 ns ndarray
```

PyPy 7.1 3.6:

```
113273 ns DataArray
 38543 ns DataArray
 34797 ns DataArray
 39453 ns DataArray
   386 ns ndarray
   289 ns ndarray
   329 ns ndarray
   413 ns ndarray
```

Big important reminder: all results are for a very small array. I would expect the gap between CPython and pypy to get narrower in % (both for numpy and xarray) as the array size gets larger and more time is spent in the pure C numpy code.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
539100243 https://github.com/pydata/xarray/issues/2799#issuecomment-539100243 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDUzOTEwMDI0Mw== max-sixty 5635139 2019-10-07T16:39:54Z 2019-10-07T16:39:54Z MEMBER

Great analysis, thanks

Do we have any idea of which of those lines are offending? I used a tool line_profiler a while ago, but maybe we know already (I'm guessing it's the two _replace_with_new_dims lines?)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
538570946 https://github.com/pydata/xarray/issues/2799#issuecomment-538570946 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDUzODU3MDk0Ng== crusaderky 6213168 2019-10-04T21:48:18Z 2019-10-06T21:56:58Z MEMBER

I simplified the benchmark:

```python
from itertools import product

import numpy as np
import xarray as xr

shape = (10, 10, 10, 10)
index = (0, 0, 0, 0)
np_arr = np.ones(shape)
arr = xr.DataArray(np_arr)
named_index = dict(zip(arr.dims, index))

print(index)
print(named_index)

%timeit -n 1000 arr[index]
%timeit -n 1000 arr.isel(**named_index)
%timeit -n 1000 np_arr[index]
```

```
(0, 0, 0, 0)
{'dim_0': 0, 'dim_1': 0, 'dim_2': 0, 'dim_3': 0}
90.8 µs ± 5.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
88.5 µs ± 2.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
115 ns ± 6.71 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

```python
%%prun -s cumulative
for _ in range(10000):
    arr[index]
```

```
5680003 function calls (5630003 primitive calls) in 1.890 seconds

Ordered by: cumulative time

ncalls           tottime percall cumtime percall filename:lineno(function)
1                0.000   0.000   1.890   1.890   {built-in method builtins.exec}
1                0.009   0.009   1.890   1.890   <string>:1(<module>)
10000            0.011   0.000   1.881   0.000   dataarray.py:629(__getitem__)
10000            0.030   0.000   1.801   0.000   dataarray.py:988(isel)
10000            0.084   0.000   1.567   0.000   dataset.py:1842(isel)
10000            0.094   0.000   0.570   0.000   dataset.py:1746(_validate_indexers)
10000            0.029   0.000   0.375   0.000   variable.py:960(isel)
10000            0.013   0.000   0.319   0.000   variable.py:666(__getitem__)
20000            0.014   0.000   0.251   0.000   dataset.py:918(_replace_with_new_dims)
50000            0.028   0.000   0.245   0.000   variable.py:272(__init__)
10000            0.035   0.000   0.211   0.000   variable.py:487(_broadcast_indexes)
1140000/1100000  0.100   0.000   0.168   0.000   {built-in method builtins.isinstance}
10000            0.050   0.000   0.157   0.000   dataset.py:1802(_get_indexers_coords_and_indexes)
20000            0.025   0.000   0.153   0.000   dataset.py:868(_replace)
50000            0.085   0.000   0.152   0.000   variable.py:154(as_compatible_data)
```

Time breakdown:

Total | 1.881
-- | --
DataArray.__getitem__ | 0.080
DataArray.isel (_to_temp_dataset roundtrip) | 0.234
Dataset.isel | 0.622
Dataset._validate_indexers | 0.570
Variable.isel | 0.056
Variable.__getitem__ | 0.319

I can spot a few low-hanging fruits there:

- A huge amount of time is spent in _validate_indexers.
- Why is Variable.__init__ being called 5 times?!? I expected 0.
- The bench strongly hints at the fact that we're creating dummy IndexVariables on the fly.
- We're casting the DataArray to a Dataset, converting the positional index to a dict, then converting it back to positional for each variable. Maybe it's a good idea to rewrite DataArray.sel/isel so that they don't use _to_temp_dataset?

So in short, while I don't think we can feasibly close the orders-of-magnitude gap (800x) with numpy, I suspect we could get at least a 5x speedup here.
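For readers who want to reproduce a profile like the one above outside IPython, the standard library's `cProfile` and `pstats` modules can play the role of `%%prun -s cumulative`. The sketch below profiles a stand-in workload (nested list indexing) so that it runs without xarray installed; the helper name `profile_cumulative` is an assumption, and for the xarray case one would pass a function that loops over `arr[index]` instead.

```python
import cProfile
import io
import pstats


def profile_cumulative(func, limit=10):
    """Run `func` under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


def workload():
    # Stand-in for `for _ in range(10000): arr[index]`.
    grid = [[0] * 10 for _ in range(10)]
    for _ in range(10000):
        grid[0][0]


report = profile_cumulative(workload)
print(report)
```

Sorting by cumulative time is what makes wrappers such as `DataArray.isel` show up near the top, since they include the time of everything they call.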

{
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
538791352 https://github.com/pydata/xarray/issues/2799#issuecomment-538791352 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDUzODc5MTM1Mg== crusaderky 6213168 2019-10-06T21:47:20Z 2019-10-06T21:48:48Z MEMBER

After #3375:

1.371 | TOTAL
-- | --
0.082 | DataArray.__getitem__
0.217 | DataArray.isel (_to_temp_dataset roundtrip)
0.740 | Dataset.isel
0.056 | Variable.isel
0.276 | Variable.__getitem__

The offending lines in Dataset.isel are these, and I strongly suspect they are improvable:

https://github.com/pydata/xarray/blob/4254b4af33843f711459e5242018cd1d678ad3a0/xarray/core/dataset.py#L1922-L1930

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
538790722 https://github.com/pydata/xarray/issues/2799#issuecomment-538790722 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDUzODc5MDcyMg== crusaderky 6213168 2019-10-06T21:38:44Z 2019-10-06T21:38:44Z MEMBER

All those integer indexes were cast into Variables. #3375 stops that.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
538366978 https://github.com/pydata/xarray/issues/2799#issuecomment-538366978 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDUzODM2Njk3OA== ashwinvis 9155111 2019-10-04T11:57:10Z 2019-10-04T11:57:10Z CONTRIBUTOR

> At first sight it looks somewhat like a hybrid between Cython (for the ahead-of-time transpiling to C++) and numba (for having python-compatible syntax).

Not really. Pythran always releases the GIL and does a bunch of optimizations between transpilation and compilations.

A good approach would be to try out different compilers and see what performance is obtained, without losing readability (https://github.com/pydata/xarray/issues/2799#issuecomment-469444519). See scikit-image/scikit-image/issues/4199, where the package transonic was being experimentally tested to replace Cython-only code with Python code + type hints. As a bonus, you get to switch between Cython, Pythran and Numba.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
529578839 https://github.com/pydata/xarray/issues/2799#issuecomment-529578839 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDUyOTU3ODgzOQ== crusaderky 6213168 2019-09-09T17:15:08Z 2019-09-09T17:15:08Z MEMBER

> Pythran supports Python 2.7 and also has a decent Python 3 support. [...] Pythran now supports Python3 and can be installed as a regular Python3 program. Note however that Python3 support is still in early stage and compilation failure may happen. Report them!

This is not a great start :(

It's the first time I hear about Pythran. At first sight it looks somewhat like a hybrid between Cython (for the ahead-of-time transpiling to C++) and numba (for having python-compatible syntax).

That said, I didn't see anything that hints at potential speedups on the python boilerplate code.

I already had experience with compiling pure-python code (tight __iter__ methods) with Cython, and got around 30% performance boost which - while nothing to scoff at - is not life-changing either.

This said, I'd have to spend more time on it to get a more informed opinion.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
529569885 https://github.com/pydata/xarray/issues/2799#issuecomment-529569885 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDUyOTU2OTg4NQ== DerWeh 22542812 2019-09-09T16:53:20Z 2019-09-09T16:53:20Z NONE

It might be interesting to see if Pythran is an alternative to Cython. It seems to handle high-level numpy quite well, and would retain the readability of Python. Of course, it has its own issues...

But it seems that other libraries, e.g. scikit-image, have had good experiences with it.

Sadly I can't be of much help, as I lack experience (and most importantly time).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
469898607 https://github.com/pydata/xarray/issues/2799#issuecomment-469898607 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDQ2OTg5ODYwNw== max-sixty 5635139 2019-03-05T23:16:43Z 2019-03-05T23:16:43Z MEMBER

> Cython + memoryviews isn't quite the right comparison here.

Right, tbc, I'm only referring to the top two lines of the pasted benchmark; i.e. once we enter python (even if only to access a numpy array) we're already losing a lot of the speed relative to the loop staying in C / Cython. So even if xarray were a python front-end to a C++ library, it still wouldn't be competitive if performance were paramount. ...unless pypy sped that up; I'd be v interested to see.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
469869298 https://github.com/pydata/xarray/issues/2799#issuecomment-469869298 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDQ2OTg2OTI5OA== shoyer 1217238 2019-03-05T21:43:18Z 2019-03-05T21:43:32Z MEMBER

Cython + memoryviews isn't quite the right comparison here. I'm sure the ordering here is correct, but the relative magnitude of the performance difference should be smaller.

Xarray's core is bottlenecked on:

1. Overhead of abstraction with normal Python operations (e.g., function calls) in non-numeric code (all the heavy numerics is offloaded to NumPy or pandas).
2. The dynamic nature of our APIs, which means we need to do lots of type checking. Notice how high up builtins.isinstance appears in that performance profile!

C++ offers very low-cost abstraction but dynamism is still slow. Even then, compilers are much better at speeding up tight numeric loops than complex domain logic.

As a point of reference, it would be interesting to see these performance numbers running on pypy, which I think should be able to handle everything in xarray. You'll note that pypy is something like 7x faster than CPython in their benchmark suite, which I suspect is closer to what we'd see if we wrote xarray's core in a language like C++, e.g., as a Python interface to xframe.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
469861382 https://github.com/pydata/xarray/issues/2799#issuecomment-469861382 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDQ2OTg2MTM4Mg== max-sixty 5635139 2019-03-05T21:19:31Z 2019-03-05T21:19:31Z MEMBER

To put the relative speed of numpy access into perspective, I found this insightful: https://jakevdp.github.io/blog/2012/08/08/memoryview-benchmarks/ (it's now a few years out of date, but I think the fundamentals still stand)

Pasted from there:

Summary

Here are the timing results we've seen above:

  • Python + numpy: 6510 ms
  • Cython + numpy: 668 ms
  • Cython + memviews (slicing): 22 ms
  • Cython + raw pointers: 2.47 ms
  • Cython + memviews (no slicing): 2.45 ms

So if we're running an inner loop on an array, accessing it using numpy in python is an order of magnitude slower than accessing it using numpy in C (and that's an order of magnitude slower than using a slice, and that's an order of magnitude slower than using raw pointers)
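
To make the first step of that ladder concrete, here is a minimal sketch (not taken from the linked post) that times element-by-element access from a Python loop against a single vectorized NumPy call. The array size and the helper name `python_loop` are arbitrary choices for illustration, and absolute timings will differ between machines.

```python
import timeit

import numpy as np

# Arbitrary illustrative size; not the array used in the linked benchmark.
arr = np.random.rand(10_000)


def python_loop(a):
    """Sum the array by indexing one element at a time from Python."""
    total = 0.0
    for i in range(a.shape[0]):
        total += a[i]  # each access crosses the Python/C boundary
    return total


# Best-of-three average time per call, in seconds.
loop_s = min(timeit.repeat(lambda: python_loop(arr), number=10, repeat=3)) / 10
vec_s = min(timeit.repeat(lambda: arr.sum(), number=10, repeat=3)) / 10

print(f"python loop: {loop_s * 1e6:.1f} µs, arr.sum(): {vec_s * 1e6:.1f} µs")
```

Both approaches compute the same sum; only the number of Python-level operations differs.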

So - let's definitely speed xarray up (your benchmarks are excellent, thank you again, and I think you're right there are opportunities for significant increases). But where speed is paramount above all else, we shouldn't use any access in python, let alone the niceties of xarray access.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
469451210 https://github.com/pydata/xarray/issues/2799#issuecomment-469451210 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDQ2OTQ1MTIxMA== nbren12 1386642 2019-03-04T22:40:07Z 2019-03-04T22:40:07Z CONTRIBUTOR

Sure, I've been using that as a workaround as well. Unfortunately, that approach throws away all the nice info (e.g. metadata, coordinate) that xarray objects have and requires duplicating much of xarray's indexing logic.

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
469449165 https://github.com/pydata/xarray/issues/2799#issuecomment-469449165 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDQ2OTQ0OTE2NQ== max-sixty 5635139 2019-03-04T22:33:03Z 2019-03-04T22:33:03Z MEMBER

You can always use xarray to process the data, and then extract the underlying array (da.values) for passing into something expecting a numpy array / for running fast(ish) loops (we do this frequently).
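
A minimal sketch of that pattern, assuming a small made-up DataArray (the dimension names, coordinates, and `units` attribute below are hypothetical): pull out the NumPy array once, loop over it directly, and reattach the metadata by hand if an xarray object is needed again afterwards.

```python
import numpy as np
import xarray as xr

# Hypothetical example data; names and sizes are for illustration only.
da = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=("x", "y"),
    coords={"x": [10, 20, 30], "y": [0, 1, 2, 3]},
    attrs={"units": "K"},
)

# Plain numpy.ndarray: coordinates and attrs are not carried along.
values = da.values

# The inner loop now indexes NumPy directly instead of going through xarray.
total = 0.0
for i in range(values.shape[0]):
    for j in range(values.shape[1]):
        total += values[i, j]

# Wrapping a NumPy result back up means restoring dims/coords/attrs manually.
doubled = xr.DataArray(values * 2, dims=da.dims, coords=da.coords, attrs=da.attrs)
```

The manual rewrapping in the last line is the bookkeeping cost of this workaround: positional indices have to be tracked by hand while working on `values`.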

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
469447632 https://github.com/pydata/xarray/issues/2799#issuecomment-469447632 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDQ2OTQ0NzYzMg== nbren12 1386642 2019-03-04T22:27:57Z 2019-03-04T22:27:57Z CONTRIBUTOR

@max-sixty I tend to agree this use case could be outside of the scope of xarray. It sounds like significant progress might require re-implementing core xarray objects in C/Cython. Without more than 10x improvement, I would probably just continue using numpy arrays.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
469445483 https://github.com/pydata/xarray/issues/2799#issuecomment-469445483 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDQ2OTQ0NTQ4Mw== max-sixty 5635139 2019-03-04T22:20:58Z 2019-03-04T22:20:58Z MEMBER

Thanks for the benchmarks @nbren12, and for the clear explanation @shoyer

While we could do some performance work on that loop, I think we're likely to see a material change by enabling the external library to access directly from the array, without a looped python call. That's consistent with the ideas @jhamman had a few days ago.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
469444519 https://github.com/pydata/xarray/issues/2799#issuecomment-469444519 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDQ2OTQ0NDUxOQ== shoyer 1217238 2019-03-04T22:17:58Z 2019-03-04T22:17:58Z MEMBER

To be clear, pull requests improving performance (without significant loss of readability) would be very welcome. Be sure to include a new benchmark in our benchmark suite.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
469439957 https://github.com/pydata/xarray/issues/2799#issuecomment-469439957 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDQ2OTQzOTk1Nw== shoyer 1217238 2019-03-04T22:03:37Z 2019-03-04T22:16:49Z MEMBER

> While python will always be slower than C when iterating over an array in this fashion, I would hope that xarray could be nearly as fast as numpy. I am not sure what the best way to improve this is though.

I'm sure it's possible to optimize this significantly, but short of rewriting this logic in a lower level language it's pretty much impossible to match the speed of NumPy.

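This benchmark might give some useful context:

```
def dummy_isel(*args, **kwargs):
    # Does nothing: only the cost of parsing *args and **kwargs is measured
    pass

def index_dummy(named_indices, arr):
    for named_index in named_indices:
        dummy_isel(arr, **named_index)

# Run in its own IPython/Jupyter cell (%%timeit must be the first line of a cell)
%%timeit -n 10
index_dummy(named_indices, arr)
```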

On my machine, this is already twice as slow as your NumPy benchmark (497 µs vs 251 µs), and all it's doing is parsing *args and **kwargs! Every Python function/method call involving keyword arguments adds about 0.5 µs of overhead, because the highly optimized dict is (relatively) slow compared to positional arguments. In my experience it is almost impossible to get the overhead of a Python function call below a few microseconds.
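
For anyone who wants to check the per-call cost on their own machine, here is a rough standard-library sketch. The function names `positional`, `keyword`, and `per_call_ns` are hypothetical, and the numbers it prints depend on the Python version and hardware.

```python
import timeit


def positional(a, b):
    """No-op called with positional arguments."""
    pass


def keyword(**kwargs):
    """No-op called with keyword arguments unpacked from a dict."""
    pass


def per_call_ns(stmt, namespace, number=200_000):
    """Best-of-five average time per call, in nanoseconds."""
    best = min(timeit.repeat(stmt, globals=namespace, number=number, repeat=5))
    return best / number * 1e9


namespace = {
    "positional": positional,
    "keyword": keyword,
    "named_index": {"x": 1, "y": 2},
}

pos_ns = per_call_ns("positional(1, 2)", namespace)
kw_ns = per_call_ns("keyword(**named_index)", namespace)

print(f"positional: {pos_ns:.0f} ns/call, keyword: {kw_ns:.0f} ns/call")
```

Neither function does any work, so whatever is measured is call overhead alone; any real indexing logic comes on top of it.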

Right now we're at about 130 µs per indexing operation. In the best case, we might make this 10x faster but even that would be quite challenging, e.g., consider that even creating a DataArray takes about 20 µs.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
469443856 https://github.com/pydata/xarray/issues/2799#issuecomment-469443856 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDQ2OTQ0Mzg1Ng== nbren12 1386642 2019-03-04T22:15:49Z 2019-03-04T22:15:49Z CONTRIBUTOR

Thanks so much @shoyer. I didn't realize there was that much overhead for a single function call. OTOH, 2x slower than numpy would be way better than 1000x.

After looking at the profiling info more, I tend to agree with your 10x maximum speed-up. A couple of particularly slow functions (e.g. Dataset._validate_indexers) account for about 75% of run time. However, the remaining 25% is split across several other pure python routines.
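
For reference, a per-function breakdown like this can be obtained with the standard-library profiler. The sketch below profiles stand-in functions (`validate`, `fake_isel`, and `workload` are hypothetical and are not xarray code) and prints the entries with the highest cumulative time.

```python
import cProfile
import io
import pstats


def validate(indexers):
    # Stand-in for per-call validation work (hypothetical, not xarray's code).
    return {k: v for k, v in indexers.items() if isinstance(v, int)}


def fake_isel(**indexers):
    # Stand-in for an indexing method that validates its keyword arguments.
    return validate(indexers)


def workload(n=10_000):
    for i in range(n):
        fake_isel(x=i, y=i)


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Collect the report in a string so it can be printed or inspected.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)  # show only the five most expensive entries
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time puts the functions that dominate the run at the top, which is how a single expensive routine stands out from many small pure-Python ones.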

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
469394020 https://github.com/pydata/xarray/issues/2799#issuecomment-469394020 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDQ2OTM5NDAyMA== nbren12 1386642 2019-03-04T19:45:11Z 2019-03-04T19:45:11Z CONTRIBUTOR

cc @rabernat

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);