issue_comments: 706548513
html_url: https://github.com/pydata/xarray/issues/4498#issuecomment-706548513
issue_url: https://api.github.com/repos/pydata/xarray/issues/4498
id: 706548513
node_id: MDEyOklzc3VlQ29tbWVudDcwNjU0ODUxMw==
user: 145117
created_at: 2020-10-10T13:21:19Z
updated_at: 2020-10-10T13:21:19Z
author_association: CONTRIBUTOR

body:

"performance" is a good tag. My actual use case is a dataset with 500,000 timestamps and 15 variables (a decade of 10-minute weather-station data). There, pandas takes 0.03 seconds and xarray takes 200 seconds, a difference of about 4 orders of magnitude. Should I change the title to reflect the larger difference in performance?

Here is that MWE:

```python
import numpy as np
import xarray as xr
import pandas as pd
import time

size = 500000
times = pd.date_range('2000-01-01', periods=size, freq="10Min")

# One DataArray plus 13 more random variables, all on the 'time' dimension.
ds = xr.Dataset({
    'foo': xr.DataArray(
        data=np.random.random(size),
        dims=['time'],
        coords={'time': times},
    )})
for v in 'abcdefghijklm':
    ds[v] = ('time', np.random.random(size))

# Time the same hourly-mean resample in xarray and in pandas.
start = time.time()
ds_r = ds.resample({'time': "1H"}).mean()
print('xr', str(time.time() - start))

start = time.time()
ds_r = ds.to_dataframe().resample("1H").mean()
print('pd', str(time.time() - start))
```

Result:
The strange thing here is if I drop the
But every 4th or 5th time that I run this, I get this:
This is repeatable. I've run this code hundreds of times now, and every 4th or 5th run takes about 10x as long. Nothing else is going on on my computer. (A possible workaround is sketched below, after this record.)
reactions:
{
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
}
performed_via_github_app:
issue: 718436141
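
A possible follow-up, not from the original thread: because every variable in the MWE hangs off the single `time` dimension, the Dataset round-trips losslessly through a pandas DataFrame, so the resample can be done on the pandas side and converted back. The sketch below is an assumption-laden illustration: the `resample_via_pandas` helper is a hypothetical name, and the timing uses `timeit.repeat`, which should give a steadier number than a single `time.time()` pair when chasing the occasional 10x runs.

```python
import timeit

import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the dataset from the MWE above: 13 random variables
# sharing one 'time' dimension.
size = 500000
times = pd.date_range('2000-01-01', periods=size, freq="10Min")
ds = xr.Dataset(
    {v: ('time', np.random.random(size)) for v in 'abcdefghijklm'},
    coords={'time': times},
)

def resample_via_pandas(ds):
    # Resample on the pandas side, then convert back to xarray.
    # This relies on the Dataset having only the 'time' dimension,
    # so to_dataframe()/to_xarray() is a clean round-trip.
    return ds.to_dataframe().resample("1H").mean().to_xarray()

# min() over several repeats is less noisy than a single time.time()
# pair, which helps separate a systematic slowdown from the occasional
# 10x outlier runs described in the comment.
best = min(timeit.repeat(lambda: resample_via_pandas(ds), number=1, repeat=5))
print(f"pandas round-trip, best of 5: {best:.3f} s")
```

Whether the round-trip result matches `ds.resample({'time': "1H"}).mean()` exactly is worth verifying, e.g. with `xarray.testing.assert_allclose`.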