issues: 560860376
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
560860376 | MDU6SXNzdWU1NjA4NjAzNzY= | 3755 | Performance problem when doing computation between two arrays with discontinuous indexes | 33070178 | open | 0 | 0 | 2020-02-06T08:44:40Z | 2020-12-03T18:16:02Z | NONE | MCVE Code Sample
```python
import xarray as xr
import numpy as np

# Creating array
ds = xr.Dataset()
ds["longitude"] = np.arange(0.1, 2000.1, 1)
ds["latitude"] = range(3000)
ds["step"] = range(50)
ds["field"] = (("step", "latitude", "longitude"), np.random.randn(50, 3000, 2000))
ds.to_netcdf("big_array.nc")

# Create another array
ds = xr.Dataset()
ds["longitude"] = np.arange(500.1, 600.1, 1)  # Coordinates are a continuous subset of the first array
ds["latitude"] = np.arange(510, 660)
ds["id"] = range(50)
ds["field"] = (("longitude", "latitude", "id"), np.random.randn(100, 150, 50))
ds.to_netcdf("slicing.nc")

# Create another array with "discontinuity" in longitude dimension
ds = xr.Dataset()
ds["longitude"] = list(np.arange(500.1, 598.1, 1)) + [622.1, 640.1]
ds["latitude"] = range(510, 660)
ds["id"] = range(10)
ds["mask"] = (("longitude", "latitude", "id"), np.random.randn(100, 150, 10))
ds.to_netcdf("no_slicing.nc")

# Load the three arrays
da = xr.open_dataset("big_array.nc")
db = xr.open_dataset("slicing.nc").isel(id=0)
dc = xr.open_dataarray("no_slicing.nc").isel(id=0)

%timeit da*db
# 32.3 ms ± 5.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit da*dc
# 2.13 s ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

def slicing_operation(dc, da):
    """
    Slicing when knowing that dc is a subpart of da
    """
    import operator
    min_lat = np.max([dc.latitude.values.min(), da.latitude.values.min()])
    max_lat = np.min([dc.latitude.values.max(), da.latitude.values.max()])
    index_lat_field = operator.and_(da.latitude >= min_lat, da.latitude <= max_lat)
    min_lon = np.max([dc.longitude.values.min(), da.longitude.values.min()])
    max_lon = np.min([dc.longitude.values.max(), da.longitude.values.max()])
    index_lon_field = operator.and_(da.longitude >= min_lon, da.longitude <= max_lon)

%timeit slicing_operation(dc,da)
# 43.9 ms ± 1.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Checking if we get the same results
res_0 = dc*da
res = slicing_operation(dc, da)
print(abs(res_0 - res).max())
```
Problem Description
A performance problem occurred when performing an operation between da and dc.
Computing da*dc takes about 2.13 s, whereas da*db takes about 32 ms. Indeed, when extending dc (such that its longitudes are continuous with respect to those of da), the computation time is only ~44 ms.
For me it seems that when doing Wall time: 1.89 s
Output of
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3755/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
13221727 | issue |
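The workaround described in the issue body (restrict the large array to the coordinate range covered by the small one, then multiply) can be sketched on small in-memory arrays as follows. This is only an illustration under stated assumptions: the array sizes, the helper name `multiply_on_bounding_box`, and the final multiplication step are not taken from the issue, whose own function is cut off in this export. Whether the cropped version is actually faster will depend on the xarray version and on the array sizes.

```python
import numpy as np
import xarray as xr

# Small stand-in for the large array (sizes are assumptions, chosen to run quickly).
da = xr.DataArray(
    np.random.randn(5, 60, 80),
    dims=("step", "latitude", "longitude"),
    coords={
        "step": np.arange(5),
        "latitude": np.arange(60),
        "longitude": np.arange(80) + 0.1,
    },
    name="field",
)

# Small array covering a sub-region of da, with a gap in longitude.
# Coordinates are taken from da itself so the float labels match exactly.
lon_c = da.longitude.values[list(range(30, 40)) + [50, 55]]
dc = xr.DataArray(
    np.random.randn(len(lon_c), 20),
    dims=("longitude", "latitude"),
    coords={"longitude": lon_c, "latitude": np.arange(10, 30)},
    name="mask",
)


def multiply_on_bounding_box(small, big):
    """Hypothetical helper: crop `big` to the coordinate range of `small`, then multiply."""
    lat_min = max(float(small.latitude.min()), float(big.latitude.min()))
    lat_max = min(float(small.latitude.max()), float(big.latitude.max()))
    lon_min = max(float(small.longitude.min()), float(big.longitude.min()))
    lon_max = min(float(small.longitude.max()), float(big.longitude.max()))

    # Boolean masks selecting the part of `big` inside the bounding box of `small`.
    lat_in = (big.latitude >= lat_min) & (big.latitude <= lat_max)
    lon_in = (big.longitude >= lon_min) & (big.longitude <= lon_max)

    cropped = big.isel(latitude=lat_in.values, longitude=lon_in.values)
    # Assumed final step: the default inner join keeps only the shared labels.
    return cropped * small


direct = da * dc                             # alignment over the full arrays
cropped = multiply_on_bounding_box(dc, da)   # alignment over the cropped region only
max_abs_diff = float(abs(direct - cropped).max())
print(max_abs_diff)
```

In an IPython session, the two variants can be compared with `%timeit da * dc` and `%timeit multiply_on_bounding_box(dc, da)`.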