
issues: 330918967


id: 330918967
node_id: MDU6SXNzdWUzMzA5MTg5Njc=
number: 2223
title: DataArray.interp() : poor performance
user: 15956441
state: closed
locked: 0
comments: 6
created_at: 2018-06-09T21:01:30Z
updated_at: 2020-05-25T20:02:37Z
closed_at: 2020-05-25T20:02:37Z
author_association: CONTRIBUTOR

Code Sample

Hello,

I performed a quick comparison of the newly introduced method `interp()` with a draft adapter to the sdf (scientific data format) library: https://gist.github.com/gwin-zegal/b955c3ef63f5ad51eec6329dd2e620be#file-array_sdf_interp-py

Code for a micro comparison (2D array) in Python (include the above gist first):

```python
# Run in IPython/Jupyter (uses the %timeit magic), after including the gist above.
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.sort(np.sort(np.random.RandomState(123).rand(30, 4), axis=0), axis=1),
    coords=[('tension', np.arange(10, 40)), ('resistance', np.linspace(100, 500, 4))],
)

res = {'xarray': [], 'c_sdf': []}
x = np.logspace(1, 4, num=10, dtype=np.int16)
for size in x:
    # Random query points drawn within each coordinate's range
    new_tension = arr.tension[0].data + np.random.random_sample(size=size) * (arr.tension[-1].data - arr.tension[0].data)
    new_resistance = arr.resistance[0].data + np.random.random_sample(size=size) * (arr.resistance[-1].data - arr.resistance[0].data)

    interp_xr = %timeit -qo arr.interp({'tension': new_tension, 'resistance': new_resistance})
    res['xarray'].append(interp_xr)

    # Calling arr(...) relies on the adapter from the gist
    interp_c_sdf = %timeit -qo arr(new_tension, new_resistance)
    res['c_sdf'].append(interp_c_sdf)
```

Problem description

The time spent in `DataArray.interp()` grows very steeply with the number of points: over 2 min (xarray's internal interp) on my old machine, compared to 9 ms (C-SDF wrapper), for 10_000 interpolations.

The C-SDF code is itself slow (it copies the array, and its algorithms are not heavily optimized), but the xarray implementation is not usable in daily work on my machine!

{'xarray': [
    <TimeitResult : 6.46 ms ± 223 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>,
    <TimeitResult : 6.46 ms ± 88.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>,
    <TimeitResult : 6.99 ms ± 141 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>,
    <TimeitResult : 8.91 ms ± 52 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>,
    <TimeitResult : 19.8 ms ± 183 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>,
    <TimeitResult : 112 ms ± 638 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)>,
    <TimeitResult : 584 ms ± 19.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>,
    <TimeitResult : 2.63 s ± 43.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>,
    <TimeitResult : 15.5 s ± 147 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>,
    <TimeitResult : 2min 23s ± 18.7 s per loop (mean ± std. dev. of 7 runs, 1 loop each)>],
 'c_sdf': [
    <TimeitResult : 1.08 ms ± 7.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>,
    <TimeitResult : 1.09 ms ± 7.91 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>,
    <TimeitResult : 1.1 ms ± 12.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>,
    <TimeitResult : 1.13 ms ± 9.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>,
    <TimeitResult : 1.19 ms ± 7.76 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>,
    <TimeitResult : 1.32 ms ± 9.47 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>,
    <TimeitResult : 1.59 ms ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>,
    <TimeitResult : 2.19 ms ± 31.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>,
    <TimeitResult : 3.51 ms ± 34.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>,
    <TimeitResult : 9.27 ms ± 307 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>]}
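To get a rough idea of how fast the xarray timings grow, one option is to estimate the local slope of log(time) against log(size) between consecutive measurements. The sketch below uses only the standard library and the mean timings reported above. The sizes are an assumption: they are the nominal values of `np.logspace(1, 4, num=10)`, ignoring the `int16` truncation, and `local_slopes` is a hypothetical helper name.

```python
import math

# Mean xarray timings in seconds, copied from the results above (2min 23s = 143 s).
xarray_s = [6.46e-3, 6.46e-3, 6.99e-3, 8.91e-3, 19.8e-3,
            112e-3, 584e-3, 2.63, 15.5, 143.0]

# Assumption: nominal sizes of np.logspace(1, 4, num=10), before int16 truncation.
sizes = [10 ** (1 + 3 * i / 9) for i in range(len(xarray_s))]


def local_slopes(n, t):
    """Slope of log(t) versus log(n) between consecutive measurements."""
    return [
        math.log(t[i + 1] / t[i]) / math.log(n[i + 1] / n[i])
        for i in range(len(t) - 1)
    ]


slopes = local_slopes(sizes, xarray_s)
for n, s in zip(sizes[1:], slopes):
    print(f"up to n ~ {n:7.0f}: local exponent ~ {s:.2f}")
```

On these figures the local exponent for the larger sizes comes out at roughly 2 to 3, which looks closer to a power law in the number of points than to a fixed per-point cost. This is only an estimate from ten mean values, so it should be read as indicative.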

Is this a performance issue specific to my machine, or can others confirm it?
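One possible explanation, offered as an assumption rather than a confirmed diagnosis: as far as I understand the xarray documentation, passing plain 1-D arrays for two different dimensions makes `interp()` evaluate on the full grid (every new tension combined with every new resistance), so the result has `size * size` values. Pointwise interpolation instead requires the new coordinates to be `DataArray` objects sharing a dimension. The sketch below compares the two output shapes on a small case; the dimension name `points` and the variable names are illustrative choices, not taken from the benchmark.

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.sort(np.sort(np.random.RandomState(123).rand(30, 4), axis=0), axis=1),
    coords=[("tension", np.arange(10, 40)), ("resistance", np.linspace(100, 500, 4))],
)

rng = np.random.RandomState(0)
size = 50  # kept small so that the grid version stays cheap
new_tension = rng.uniform(10, 39, size)
new_resistance = rng.uniform(100, 500, size)

# Plain arrays: interpolation on the outer product of the two coordinate sets.
grid = arr.interp(tension=new_tension, resistance=new_resistance)
print(grid.shape)  # (50, 50)

# DataArrays sharing the (illustrative) dimension "points": one value per pair.
t = xr.DataArray(new_tension, dims="points")
r = xr.DataArray(new_resistance, dims="points")
pts = arr.interp(tension=t, resistance=r)
print(pts.shape)  # (50,)
```

If this is what happens in the benchmark, the 10_000-point case asks xarray for 10^8 interpolated values while the C-SDF call returns 10_000, so the two timings would not be measuring the same amount of work.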

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8

xarray: 0.10.7
pandas: 0.23.0
numpy: 1.14.3
scipy: 1.1.0
netCDF4: None
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.5
distributed: 1.21.8
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 39.2.0
pip: 10.0.1
conda: 4.5.4
pytest: 3.4.1
IPython: 6.4.0
sphinx: 1.7.5

reactions: none (total_count: 0), https://api.github.com/repos/pydata/xarray/issues/2223/reactions
state_reason: completed
repo: 13221727
type: issue
