issue_comments: 706548513

html_url: https://github.com/pydata/xarray/issues/4498#issuecomment-706548513
issue_url: https://api.github.com/repos/pydata/xarray/issues/4498
id: 706548513
node_id: MDEyOklzc3VlQ29tbWVudDcwNjU0ODUxMw==
user: 145117
created_at: 2020-10-10T13:21:19Z
updated_at: 2020-10-10T13:21:19Z
author_association: CONTRIBUTOR

"performance" is a good tag. My actual use case is a dataset with 500,000 timestamps and 15 variables (10 minute weather station for a decade).

In this case, pandas takes 0.03 seconds and xarray takes 200 seconds, a gap of almost four orders of magnitude. Should I change the title to reflect the larger difference in performance? Here is that MWE:

```python
import numpy as np
import xarray as xr
import pandas as pd
import time

size = 500000
times = pd.date_range('2000-01-01', periods=size, freq="10Min")
ds = xr.Dataset({
    'foo': xr.DataArray(
        data=np.random.random(size),
        dims=['time'],
        coords={'time': times}
    )})
for v in 'abcdefghijelm':
    ds[v] = (('time'), np.random.random(size))

start = time.time()
ds_r = ds.resample({'time': "1H"}).mean()
print('xr', str(time.time() - start))

start = time.time()
ds_r = ds.to_dataframe().resample("1H").mean()
print('pd', str(time.time() - start))
```

Result:

```
xr 202.2967929840088
pd 0.03381085395812988
```
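
As an aside, since the pandas path is so much faster, one possible mitigation is to do the resampling in pandas and convert the result back to xarray. This is only a minimal sketch, not something measured above, and it assumes the dataset is indexed by `time` alone (as in the MWE), so `xr.Dataset.from_dataframe` can rebuild the coordinates:

```python
import xarray as xr

# Minimal workaround sketch: resample via pandas, then convert back to an
# xarray Dataset. Assumes `ds` from the MWE above, which has only a 'time'
# dimension/index, so no other coordinates need to be restored.
df_hourly = ds.to_dataframe().resample("1H").mean()
ds_hourly = xr.Dataset.from_dataframe(df_hourly)
```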

The strange thing here is that if I drop the `.mean()` calls, most of the time I see what you see:

```
xr 0.03333306312561035
pd 0.020237445831298828
```

But every 4th or 5th time that I run this, I get this:

```
xr 0.8518760204315186
pd 0.02686452865600586
```

This is repeatable. I've run this code hundreds of times now, and every 4th or 5th run takes roughly 10x longer. Nothing else is going on on my computer.
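
To make the intermittent slowdown easier to observe, the timing can be wrapped in a loop. A minimal sketch, assuming the `ds` built in the MWE above is already in scope:

```python
import time

# Minimal sketch for observing the intermittent slowdown. Assumes `ds` from
# the MWE above is already defined; only the xarray call is timed here.
for i in range(20):
    start = time.time()
    ds.resample({'time': "1H"})  # no .mean(), matching the observation above
    print(f"run {i + 1:2d}: {time.time() - start:.3f} s")
```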

reactions: total_count 0 (no reactions of any type)
issue: 718436141