
issues


1 row where type = "issue" and user = 15956441 sorted by updated_at descending

id: 330918967 · node_id: MDU6SXNzdWUzMzA5MTg5Njc= · number: 2223
title: DataArray.interp() : poor performance
user: e-roux (15956441) · state: closed · locked: 0 · comments: 6
created_at: 2018-06-09T21:01:30Z · updated_at: 2020-05-25T20:02:37Z · closed_at: 2020-05-25T20:02:37Z
author_association: CONTRIBUTOR

Code Sample

Hello,

I ran a quick comparison of the newly introduced `interp()` method against a (draft) adapter to the SDF (scientific data format) library: https://gist.github.com/gwin-zegal/b955c3ef63f5ad51eec6329dd2e620be#file-array_sdf_interp-py

Code for a micro-benchmark on a 2D array in Python (run the gist above first):

```python
# Run the gist above first: `arr(new_tension, new_resistance)` relies on the
# C-SDF adapter it defines (a plain DataArray is not callable).
# `%timeit` is an IPython magic, so this must run in IPython/Jupyter.
import numpy as np
import xarray as xr

arr = xr.DataArray(np.sort(np.sort(np.random.RandomState(123).rand(30, 4), axis=0), axis=1),
                   coords=[('tension', np.arange(10, 40)), ('resistance', np.linspace(100, 500, 4))])

res = {'xarray': [], 'c_sdf': []}
x = np.logspace(1, 4, num=10, dtype=np.int16)  # 10 sizes, log-spaced from 10 to 10_000
for size in x:
    # random query points inside the tension and resistance coordinate ranges
    new_tension = arr.tension[0].data + np.random.random_sample(size=size) * (arr.tension[-1].data - arr.tension[0].data)
    new_resistance = arr.resistance[0].data + np.random.random_sample(size=size) * (arr.resistance[-1].data - arr.resistance[0].data)

    interp_xr = %timeit -qo arr.interp({'tension': new_tension, 'resistance': new_resistance})
    res['xarray'].append(interp_xr)

    interp_c_sdf = %timeit -qo arr(new_tension, new_resistance)
    res['c_sdf'].append(interp_c_sdf)
```
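A possible explanation worth checking: as I understand xarray's documented interpolation semantics, passing two plain 1-D arrays to `interp()` interpolates onto the outer (orthogonal) grid they span, so each xarray call above would return `size × size` values, whereas a pointwise interpolator returns one value per (tension, resistance) pair. (The xarray docs describe pointwise behaviour via `DataArray` indexers that share a new dimension.) The stdlib-only sketch below contrasts the two modes with plain bilinear interpolation; `bilinear`, `interp_pointwise` and `interp_outer` are hypothetical helpers, not the xarray or SDF implementation, and the sketch does not claim to reproduce what the C-SDF wrapper does internally.

```python
from bisect import bisect_right


def bilinear(xs, ys, grid, xq, yq):
    """Bilinear interpolation of grid[i][j] (defined at xs[i], ys[j]) at (xq, yq).

    xs and ys must be strictly increasing and (xq, yq) inside their ranges.
    """
    # index of the cell containing xq / yq, clamped so the upper edge is included
    i = min(max(bisect_right(xs, xq) - 1, 0), len(xs) - 2)
    j = min(max(bisect_right(ys, yq) - 1, 0), len(ys) - 2)
    tx = (xq - xs[i]) / (xs[i + 1] - xs[i])
    ty = (yq - ys[j]) / (ys[j + 1] - ys[j])
    return ((1 - tx) * (1 - ty) * grid[i][j] + tx * (1 - ty) * grid[i + 1][j]
            + (1 - tx) * ty * grid[i][j + 1] + tx * ty * grid[i + 1][j + 1])


def interp_pointwise(xs, ys, grid, xq, yq):
    """One value per (xq[k], yq[k]) pair: len(xq) results."""
    return [bilinear(xs, ys, grid, a, b) for a, b in zip(xq, yq)]


def interp_outer(xs, ys, grid, xq, yq):
    """One value per combination of xq and yq: len(xq) * len(yq) results."""
    return [[bilinear(xs, ys, grid, a, b) for b in yq] for a in xq]


# Tiny example on a 3 x 2 grid where value = x + 10 * y (bilinear is exact here).
xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0]
grid = [[x + 10 * y for y in ys] for x in xs]
xq, yq = [0.5, 1.5, 2.0], [0.25, 0.75, 1.0]

print(interp_pointwise(xs, ys, grid, xq, yq))  # 3 values: [3.0, 9.0, 12.0]
outer = interp_outer(xs, ys, grid, xq, yq)
print(len(outer), len(outer[0]))  # 3 x 3 = 9 values
```

With `size = 10_000` points per axis, the outer mode evaluates 10^8 values (about 800 MB as float64) against 10^4 for the pointwise mode. That would be consistent with the steep growth reported below, but it remains a hypothesis to confirm against the xarray documentation and source.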

Problem description

The time spent in `arr.interp()` grows exponentially: for 10_000 interpolations it takes over 2 min with xarray's internal interp on my old machine, compared with 9 ms for the C-SDF wrapper.

The C-SDF code is itself slow (it copies the array, and its algorithms are not particularly optimized), yet the xarray implementation is not usable for day-to-day work on my machine!

Results (`res`), one timing per size in `x`, in increasing order; each is the mean ± std. dev. of 7 runs:

xarray:
  6.46 ms ± 223 µs per loop (100 loops each)
  6.46 ms ± 88.5 µs per loop (100 loops each)
  6.99 ms ± 141 µs per loop (100 loops each)
  8.91 ms ± 52 µs per loop (100 loops each)
  19.8 ms ± 183 µs per loop (100 loops each)
  112 ms ± 638 µs per loop (10 loops each)
  584 ms ± 19.2 ms per loop (1 loop each)
  2.63 s ± 43.4 ms per loop (1 loop each)
  15.5 s ± 147 ms per loop (1 loop each)
  2min 23s ± 18.7 s per loop (1 loop each)

c_sdf:
  1.08 ms ± 7.96 µs per loop (1000 loops each)
  1.09 ms ± 7.91 µs per loop (1000 loops each)
  1.1 ms ± 12.9 µs per loop (1000 loops each)
  1.13 ms ± 9.95 µs per loop (1000 loops each)
  1.19 ms ± 7.76 µs per loop (1000 loops each)
  1.32 ms ± 9.47 µs per loop (1000 loops each)
  1.59 ms ± 10.5 µs per loop (1000 loops each)
  2.19 ms ± 31.8 µs per loop (100 loops each)
  3.51 ms ± 34.9 µs per loop (100 loops each)
  9.27 ms ± 307 µs per loop (100 loops each)
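A rough way to quantify how fast these timings grow, using only the numbers reported above: because the sizes are log-spaced, the least-squares slope of log(time) against log(size) estimates the exponent p in time ∝ size^p. The sketch assumes the nominal sizes 10^(1 + k/3) for k = 0..9 (what `np.logspace(1, 4, num=10)` gives before the int16 truncation) and fits only the five largest sizes, where fixed per-call overhead matters least; `loglog_slope` is a hypothetical helper.

```python
import math

# Mean timings in milliseconds, copied from the results above (2min 23s = 143_000 ms).
xarray_ms = [6.46, 6.46, 6.99, 8.91, 19.8, 112, 584, 2630, 15500, 143000]
c_sdf_ms = [1.08, 1.09, 1.1, 1.13, 1.19, 1.32, 1.59, 2.19, 3.51, 9.27]

# Nominal log-spaced sizes 10 ... 10_000 (integer truncation ignored).
sizes = [10 ** (1 + k / 3) for k in range(10)]


def loglog_slope(xs, ys):
    """Least-squares slope of log10(ys) versus log10(xs)."""
    lx = [math.log10(v) for v in xs]
    ly = [math.log10(v) for v in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Fit only the five largest sizes, where fixed per-call overhead matters least.
p_xarray = loglog_slope(sizes[5:], xarray_ms[5:])
p_c_sdf = loglog_slope(sizes[5:], c_sdf_ms[5:])
print(f"xarray: time ~ size^{p_xarray:.1f}")  # roughly 2.3
print(f"c_sdf:  time ~ size^{p_c_sdf:.1f}")   # roughly 0.6
```

An exponent a little above 2 for xarray would fit work that scales with size × size (a steep polynomial rather than a strictly exponential growth), while the C-SDF timings grow sublinearly over this range, which suggests they are still dominated by fixed overhead.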

Is this a performance issue specific to my machine, or can others confirm it?

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
xarray: 0.10.7
pandas: 0.23.0
numpy: 1.14.3
scipy: 1.1.0
netCDF4: None
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.5
distributed: 1.21.8
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 39.2.0
pip: 10.0.1
conda: 4.5.4
pytest: 3.4.1
IPython: 6.4.0
sphinx: 1.7.5
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2223/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
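The schema above can be exercised with Python's built-in `sqlite3` module. The sketch below assumes a SQLite copy of this table; for self-containment it uses an in-memory database with a subset of the columns (REFERENCES clauses dropped), holding only the record shown on this page, and runs the same filter as this page: `type = 'issue'` and `user = 15956441`, sorted by `updated_at` descending. The `reactions` column is stored as JSON text, so it is decoded with `json.loads`.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Subset of the [issues] schema above; REFERENCES clauses omitted for brevity.
conn.execute("""
    CREATE TABLE [issues] (
        [id] INTEGER PRIMARY KEY, [number] INTEGER, [title] TEXT,
        [user] INTEGER, [state] TEXT, [comments] INTEGER,
        [created_at] TEXT, [updated_at] TEXT, [closed_at] TEXT,
        [reactions] TEXT, [state_reason] TEXT, [repo] INTEGER, [type] TEXT
    )
""")
conn.execute("CREATE INDEX [idx_issues_user] ON [issues] ([user])")

# The single record shown on this page (body omitted).
reactions = {"url": "https://api.github.com/repos/pydata/xarray/issues/2223/reactions",
             "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0,
             "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
conn.execute(
    "INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (330918967, 2223, "DataArray.interp() : poor performance", 15956441,
     "closed", 6, "2018-06-09T21:01:30Z", "2020-05-25T20:02:37Z",
     "2020-05-25T20:02:37Z", json.dumps(reactions), "completed", 13221727, "issue"),
)

# Same filter and ordering as this page.
rows = conn.execute(
    "SELECT [number], [title], [reactions] FROM [issues] "
    "WHERE [type] = ? AND [user] = ? ORDER BY [updated_at] DESC",
    ("issue", 15956441),
).fetchall()

for number, title, reactions_json in rows:
    # prints the issue number, its title and the decoded reaction total
    print(number, title, json.loads(reactions_json)["total_count"])
```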