issues: 718436141

id: 718436141
node_id: MDU6SXNzdWU3MTg0MzYxNDE=
number: 4498
title: Resample is ~100x slower than Pandas resample; Speed is related to resample period (unlike Pandas)
user: 145117
state: closed
locked: 0
comments: 7
created_at: 2020-10-09T21:37:20Z
updated_at: 2022-05-15T02:38:29Z
closed_at: 2022-05-15T02:38:29Z
author_association: CONTRIBUTOR
assignee, milestone, active_lock_reason, draft, pull_request, performed_via_github_app: (empty)

The remaining columns (body, reactions, state_reason, repo, type) follow.

What happened:

I have a time series sampled every 10 minutes. Resampling it to hourly is slow, while resampling it to daily is fast. If I drop to Pandas and resample there, it runs ~100x faster than xarray and takes about the same time regardless of the resample period. I've posted this to SO: https://stackoverflow.com/questions/64282393/

What you expected to happen:

I expect xarray to be within an order of magnitude of Pandas' speed, not more than two orders of magnitude slower.

Minimal Complete Verifiable Example:

```python
import numpy as np
import xarray as xr
import pandas as pd
import time

size = 10000
times = pd.date_range('2000-01-01', periods=size, freq="10Min")
da = xr.DataArray(data=np.random.random(size), dims=['time'],
                  coords={'time': times}, name='foo')

# Hourly resample: xarray vs. pandas
start = time.time()
da_ = da.resample({'time': "1H"}).mean()
print("1H", 'xr', str(time.time() - start))

start = time.time()
da_ = da.to_dataframe().resample("1H").mean()
print("1H", 'pd', str(time.time() - start), "\n")

# Daily resample: xarray vs. pandas
start = time.time()
da_ = da.resample({'time': "1D"}).mean()
print("1D", 'xr', str(time.time() - start))

start = time.time()
da_ = da.to_dataframe().resample("1D").mean()
print("1D", 'pd', str(time.time() - start))
```
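The example above times each call once with `time.time()`, so the first measurement can include one-off overhead. A minimal sketch of a repeated measurement using the standard library's `timeit` follows. The helper name `best_of` and the stand-in `workload` function are hypothetical and not part of the original report.

```python
import timeit


def best_of(func, repeat=5, number=10):
    """Return the best per-call wall time in seconds over `repeat` rounds.

    `best_of` is a hypothetical helper name, not an xarray or pandas API.
    """
    # timeit.repeat accepts a zero-argument callable and returns one
    # total time per round (each round runs the callable `number` times).
    rounds = timeit.repeat(func, repeat=repeat, number=number)
    return min(rounds) / number


def workload():
    # Stand-in for the call being measured, so the sketch runs on its own.
    return sum(i * i for i in range(1000))


elapsed = best_of(workload)
print(f"workload: {elapsed:.6f} s per call")
```

With the objects from the example above, the same helper could be called as `best_of(lambda: da.resample({'time': "1H"}).mean())` and `best_of(lambda: da.to_dataframe().resample("1H").mean())` to compare the two libraries on equal terms.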

Output/timings

    1H xr 0.1761918067932129
    1H pd 0.0021948814392089844

    1D xr 0.00958395004272461
    1D pd 0.001646280288696289
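The ratios implied by these timings can be worked out directly. The sketch below uses only the numbers reported above, plus the number of output bins implied by 10000 samples at 10-minute spacing. The per-bin figure is an illustrative reading of the data, not a confirmed explanation of where xarray spends its time.

```python
# Timings copied from the report above, in seconds.
timings = {
    ("1H", "xr"): 0.1761918067932129,
    ("1H", "pd"): 0.0021948814392089844,
    ("1D", "xr"): 0.00958395004272461,
    ("1D", "pd"): 0.001646280288696289,
}

size = 10000          # number of samples in the example
step_minutes = 10     # sampling interval in the example

# Minutes between the first and last sample.
span_minutes = (size - 1) * step_minutes

# Number of output bins, given that the series starts at midnight.
bins_1h = span_minutes // 60 + 1
bins_1d = span_minutes // (24 * 60) + 1

# How much slower xarray is than pandas for each period.
speedup_1h = timings[("1H", "xr")] / timings[("1H", "pd")]
speedup_1d = timings[("1D", "xr")] / timings[("1D", "pd")]

# xarray time divided by bin count (assumption: cost scales with bins).
per_bin_1h = timings[("1H", "xr")] / bins_1h
per_bin_1d = timings[("1D", "xr")] / bins_1d

print(f"1H: {bins_1h} bins, pandas is {speedup_1h:.1f}x faster")
print(f"1D: {bins_1d} bins, pandas is {speedup_1d:.1f}x faster")
print(f"xarray seconds per bin: 1H {per_bin_1h:.2e}, 1D {per_bin_1d:.2e}")
```

For this single run the gap is roughly 80x for the hourly case and roughly 6x for the daily case. The xarray time per output bin comes out at the same order of magnitude for both periods, which would be consistent with a cost that grows with the number of groups, though one run is not enough to establish that.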

Anything else we need to know?:

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 | packaged by conda-forge | (default, Aug 21 2020, 18:21:27) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-48-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.0
pandas: 1.1.1
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.4
pydap: None
h5netcdf: 0.8.1
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.5
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.3.1
cartopy: None
seaborn: None
numbagg: None
pint: 0.15
setuptools: 49.6.0.post20200814
pip: 20.2.2
conda: None
pytest: None
IPython: 7.17.0
sphinx: None

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4498/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue