issues: 1987770706
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1987770706 | I_kwDOAMm_X852evlS | 8440 | nD integer indexing on dask data is very slow | 13662783 | closed | 0 | 2 | 2023-11-10T14:47:08Z | 2023-11-12T04:56:23Z | 2023-11-12T04:56:22Z | CONTRIBUTOR | What happened?

I ran into a situation where I was indexing with a 2D integer array into some chunked netCDF data. This indexing operation is extremely slow. Using a flat 1D index instead is as fast as expected.

What did you expect to happen?

I would expect indexing on dask data to be very quick since the work is delayed, and indeed it is so in the 1D case. However, the 2D case is very slow -- slower than actually doing all the work with numpy arrays!

Minimal Complete Verifiable Example

```Python
import dask.array
import numpy as np
import xarray as xr

# %%
# In-memory (numpy-backed) array: 100 time steps by 1,000,000 points.
da = xr.DataArray(
    data=np.random.rand(100, 1_000_000),
    dims=("time", "x"),
)
# Same data, dask-backed, with one chunk per time step.
dask_da = xr.DataArray(
    data=dask.array.from_array(da.to_numpy(), chunks=(1, 1_000_000)),
    dims=("time", "x"),
)
# 1D integer indexer, and the same indices reshaped to 2D with dims ("a", "b").
indexer = np.random.randint(0, 1_000_000, size=100_000)
indexer2d = xr.DataArray(
    data=indexer.reshape((4, -1)),
    dims=("a", "b"),
)

# %%
%timeit da.isel(x=indexer)  # 162 ms
%timeit da.isel(x=indexer2d)  # 164 ms
%timeit dask_da.isel(x=indexer)  # 5.3 ms
%timeit dask_da.isel(x=indexer2d)  # 860 ms according to timeit, but 6 to 14 (!) seconds in interactive use
```

MVCE confirmation
Relevant log output

No response

Anything else we need to know?

No response

Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:34:57) [MSC v.1936 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_Netherlands', '1252')
libhdf5: 1.14.2
libnetcdf: 4.9.2
xarray: 2023.10.2.dev31+ge5d163a8.d20231110
pandas: 2.1.2
numpy: 1.26.0
scipy: 1.11.3
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.10.1
distributed: 2023.10.1
matplotlib: 3.8.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.10.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.1.2
pip: 23.2.1
conda: 23.3.1
pytest: None
mypy: None
IPython: 8.17.2
sphinx: None
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8440/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |
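The report observes that a flat 1D indexer is fast while the equivalent 2D indexer is slow. One possible workaround, sketched here as an assumption rather than as the fix adopted in xarray, is to index with the flattened indices and then reshape the result back to the indexer's shape. The helper name `take_nd_via_flat` is hypothetical, and the sketch is only demonstrated on a small numpy array; it relies solely on basic fancy indexing and `reshape`, which dask arrays also provide, but whether it is actually faster on dask-backed data is not verified here.

```python
import numpy as np


def take_nd_via_flat(data, indexer, axis=-1):
    """Index `data` along `axis` with an N-D integer array via a flat 1D index.

    Hypothetical helper for illustration; not part of xarray or dask.
    """
    indexer = np.asarray(indexer)
    axis = axis % data.ndim
    # Step 1: 1D integer indexing along the chosen axis (the fast case in the report).
    key = (slice(None),) * axis + (indexer.ravel(),)
    flat = data[key]
    # Step 2: replace the indexed axis with the indexer's own dimensions.
    new_shape = data.shape[:axis] + indexer.shape + data.shape[axis + 1:]
    return flat.reshape(new_shape)


# Small numpy demonstration: 3 "time" steps, 4 points along "x".
data = np.arange(12).reshape(3, 4)
indexer2d = np.array([[0, 3], [2, 2]])
result = take_nd_via_flat(data, indexer2d)
```

Here `result` has shape `(3, 2, 2)` and matches direct 2D indexing with `data[:, indexer2d]`. With xarray objects, the dimension names of the indexer would still need to be reattached to the reshaped result.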