home / github / issues

Menu
  • GraphQL API
  • Search all tables

issues: 365678022

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
365678022 MDU6SXNzdWUzNjU2NzgwMjI= 2452 DataArray.sel extremely slow 5308236 closed 0     5 2018-10-01T23:09:47Z 2018-10-02T16:15:00Z 2018-10-02T15:58:21Z NONE      

Problem description

.sel is an xarray method I use a lot and I would have expected it to fairly efficient. However, even on tiny DataArrays, it takes seconds.

Code Sample, a copy-pastable example if possible

```python import timeit

setup = """ import itertools import numpy as np import xarray as xr import string

a = list(string.printable) b = list(string.ascii_lowercase) d = xr.DataArray(np.random.rand(len(a), len(b)), coords={'a': a, 'b': b}, dims=['a', 'b']) d.load() """

run = """ for _a, _b in itertools.product(a, b): d.sel(a=_a, b=_b) """ running_times = timeit.repeat(run, setup, repeat=3, number=10) print("xarray", running_times) # e.g. [14.792144000064582, 15.19372400001157, 15.345327000017278] ```

Expected Output

I would have expected the above code to run in milliseconds. However, it takes over 10 seconds! Adding an additional d = d.stack(aa=['a'], bb=['b']) makes it even slower, about twice as slow.

For reference, a naive dict-indexing implementation in Python takes 0.01 seconds: ```python setup = """ import itertools import numpy as np import string

a = list(string.printable) b = list(string.ascii_lowercase)

d = np.random.rand(len(a), len(b)) indexers = {'a': {coord: index for (index, coord) in enumerate(a)}, 'b': {coord: index for (index, coord) in enumerate(b)}} """

run = """ for _a, _b in itertools.product(a, b): index_a, index_b = indexers['a'][_a], indexers['b'][_b] item = d[index_a][index_b] """ running_times = timeit.repeat(run, setup, repeat=3, number=10) print("dicts", running_times) # e.g. [0.015355999930761755, 0.01466800004709512, 0.014295000000856817] ```

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.0.final.0 python-bits: 64 OS: Linux OS-release: 4.4.0-17134-Microsoft machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: None LOCALE: en_US.UTF-8 xarray: 0.10.8 pandas: 0.23.4 numpy: 1.15.1 scipy: 1.1.0 netCDF4: 1.4.1 h5netcdf: None h5py: None Nio: None zarr: None bottleneck: None cyordereddict: None dask: None distributed: None matplotlib: 2.2.3 cartopy: None seaborn: None setuptools: 40.2.0 pip: 10.0.1 conda: None pytest: 3.7.4 IPython: 6.5.0 sphinx: None

this is a follow-up from #2438

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2452/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed 13221727 issue

Links from other tables

  • 0 rows from issues_id in issues_labels
  • 5 rows from issue in issue_comments
Powered by Datasette · Queries took 0.666ms · About: xarray-datasette