id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 305702311,MDU6SXNzdWUzMDU3MDIzMTE=,1993,DataArray.rolling().mean() is way slower than it should be,1217238,closed,0,,,5,2018-03-15T20:10:22Z,2018-03-18T08:56:27Z,2018-03-18T08:56:27Z,MEMBER,,,,"#### Code Sample, a copy-pastable example if possible From @RayPalmerTech in https://github.com/kwgoodman/bottleneck/issues/186: ```python import numpy as np import pandas as pd import time import bottleneck as bn import xarray import matplotlib.pyplot as plt N = 30000200 # Number of datapoints Fs = 30000 # sample rate T=1/Fs # sample period duration = N/Fs # duration in s t = np.arange(0,duration,T) # time vector DATA = np.random.randn(N,)+5*np.sin(2*np.pi*0.01*t) # Example noisy sine data and window size w = 330000 def using_bottleneck_mean(data,width): return bn.move_mean(a=data,window=width,min_count = 1) def using_pandas_rolling_mean(data,width): return np.asarray(pd.DataFrame(data).rolling(window=width,center=True,min_periods=1).mean()).ravel() def using_xarray_mean(data,width): return xarray.DataArray(data,dims='x').rolling(x=width,min_periods=1, center=True).mean() start=time.time() A = using_bottleneck_mean(DATA,w) print('Bottleneck: ', time.time()-start, 's') start=time.time() B = using_pandas_rolling_mean(DATA,w) print('Pandas: ',time.time()-start,'s') start=time.time() C = using_xarray_mean(DATA,w) print('Xarray: ',time.time()-start,'s') ``` This results in: ``` Bottleneck: 0.0867006778717041 s Pandas: 0.563546895980835 s Xarray: 25.133142709732056 s ``` Somehow xarray is way slower than pandas and bottleneck, even though it's using bottleneck under the hood! #### Problem description Profiling shows that the majority of time is spent in `xarray.core.rolling.DataArrayRolling._setup_windows`. Monkey-patching that method with a dummy rectifies the issue: ``` xarray.core.rolling.DataArrayRolling._setup_windows = lambda *args: None ``` Now we obtain: ``` Bottleneck: 0.06775331497192383 s Pandas: 0.48262882232666016 s Xarray: 0.1723031997680664 s ``` The solution is to make setting up windows done lazily (in `__iter__`), instead of doing it in the constructor. #### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: None python: 3.6.3.final.0 python-bits: 64 OS: Linux OS-release: 4.4.96+ machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 xarray: 0.10.2 pandas: 0.22.0 numpy: 1.14.2 scipy: 0.19.1 netCDF4: None h5netcdf: None h5py: 2.7.1 Nio: None zarr: None bottleneck: 1.2.1 cyordereddict: None dask: None distributed: None matplotlib: 2.1.2 cartopy: None seaborn: 0.7.1 setuptools: 36.2.7 pip: 9.0.1 conda: None pytest: None IPython: 5.5.0 sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1993/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue