issues: 718436141
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
718436141 | MDU6SXNzdWU3MTg0MzYxNDE= | 4498 | Resample is ~100x slower than Pandas resample; Speed is related to resample period (unlike Pandas) | 145117 | closed | 0 | 7 | 2020-10-09T21:37:20Z | 2022-05-15T02:38:29Z | 2022-05-15T02:38:29Z | CONTRIBUTOR | What happened: I have a 10 minute frequency time series. When I resample to hourly it is slow. When I resample to daily it is fast. If I drop to Pandas and resample the speeds are ~100x faster than xarray, and also the same time regardless of the resample period. I've posted this to SO: https://stackoverflow.com/questions/64282393/
What you expected to happen: I expect xarray to be within an order of magnitude speed of Pandas, not > 2 orders of magnitude slower.
Minimal Complete Verifiable Example:
```python
import numpy as np
import xarray as xr
import pandas as pd
import time

size = 10000
times = pd.date_range('2000-01-01', periods=size, freq="10Min")
da = xr.DataArray(data = np.random.random(size), dims = ['time'], coords = {'time': times}, name='foo')

start = time.time()
da_ = da.resample({'time':"1H"}).mean()
print("1H", 'xr', str(time.time() - start))

start = time.time()
da_ = da.to_dataframe().resample("1H").mean()
print("1H", 'pd', str(time.time() - start), "\n")

start = time.time()
da_ = da.resample({'time':"1D"}).mean()
print("1D", 'xr', str(time.time() - start))

start = time.time()
da_ = da.to_dataframe().resample("1D").mean()
print("1D", 'pd', str(time.time() - start))
```
Output/timings
Anything else we need to know?: Environment: Output of xr.show_versions(): INSTALLED VERSIONS ------------------ commit: None python: 3.8.5 | packaged by conda-forge | (default, Aug 21 2020, 18:21:27) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 5.4.0-48-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.0 pandas: 1.1.1 numpy: 1.19.1 scipy: 1.5.2 netCDF4: 1.5.4 pydap: None h5netcdf: 0.8.1 h5py: 2.10.0 Nio: None zarr: None cftime: 1.2.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.1.5 cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.3.1 cartopy: None seaborn: None numbagg: None pint: 0.15 setuptools: 49.6.0.post20200814 pip: 20.2.2 conda: None pytest: None IPython: 7.17.0 sphinx: None |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4498/reactions", "total_count": 4, "+1": 4, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |
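
The issue body notes that dropping to Pandas for the resample is much faster. A minimal sketch of that round trip for a 1-D series is below. The helper name `resample_mean_via_pandas` is hypothetical (it is not part of xarray or Pandas), and the sketch assumes a single datetime dimension; it illustrates the workaround, not how xarray resolved the issue.

```python
import numpy as np
import pandas as pd
import xarray as xr


def resample_mean_via_pandas(da, freq, dim="time"):
    """Hypothetical helper: resample a 1-D DataArray by going through pandas.

    Assumes `da` has exactly one dimension, indexed by datetimes.
    """
    # Convert to a pandas Series indexed by the time coordinate.
    series = da.to_series()
    # Let pandas do the resampling and the reduction.
    resampled = series.resample(freq).mean()
    # Wrap the result back into a DataArray with the new time coordinate.
    return xr.DataArray(
        resampled.to_numpy(),
        dims=[dim],
        coords={dim: resampled.index},
        name=da.name,
        attrs=da.attrs,
    )


# Same shape of data as in the issue: 10000 samples at 10-minute spacing.
size = 10000
times = pd.date_range("2000-01-01", periods=size, freq="10min")
rng = np.random.default_rng(0)
da = xr.DataArray(rng.random(size), dims=["time"], coords={"time": times}, name="foo")

hourly = resample_mean_via_pandas(da, "1h")
```

Each hourly value is the mean of the six 10-minute samples that fall in that hour. The helper drops any non-index coordinates, so it is only a stand-in for `da.resample(...).mean()` in simple cases like the one in the issue.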