issue_comments: 706548513
html_url: https://github.com/pydata/xarray/issues/4498#issuecomment-706548513
issue_url: https://api.github.com/repos/pydata/xarray/issues/4498
id: 706548513
node_id: MDEyOklzc3VlQ29tbWVudDcwNjU0ODUxMw==
user: 145117
created_at: 2020-10-10T13:21:19Z
updated_at: 2020-10-10T13:21:19Z
author_association: CONTRIBUTOR
body:

"performance" is a good tag. My actual use case is a dataset with 500,000 timestamps and 15 variables (a 10-minute weather station record spanning a decade). In this case, pandas takes 0.03 seconds and xarray takes 200 seconds: close to 4 orders of magnitude. Should I change the title to reflect the larger difference in performance?

Here is that MWE:

```python
import numpy as np
import xarray as xr
import pandas as pd
import time

size = 500000
times = pd.date_range('2000-01-01', periods=size, freq="10Min")

# one Dataset with a single time dimension and several random variables
ds = xr.Dataset({
    'foo': xr.DataArray(
        data=np.random.random(size),
        dims=['time'],
        coords={'time': times},
    )})
for v in 'abcdefghijelm':
    ds[v] = (('time'), np.random.random(size))

# resample to hourly means with xarray...
start = time.time()
ds_r = ds.resample({'time': "1H"}).mean()
print('xr', str(time.time() - start))

# ...and with pandas on the same data
start = time.time()
ds_r = ds.to_dataframe().resample("1H").mean()
print('pd', str(time.time() - start))
```

Result:
The strange thing here is if I drop the
But every 4th or 5th time that I run this, I get this:
This is repeatable. I've run this code hundreds of times now, and every 4th or 5th run takes about 10x longer. Nothing else is going on on my computer.
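To quantify that run-to-run variance, one could repeat the xarray resample under `timeit`. This is a minimal sketch, not from the original report, and it assumes the `ds` built by the MWE above is in scope:

```python
import timeit

# time the xarray resample 10 times, one call per measurement;
# the comment above suggests roughly every 4th or 5th run is ~10x slower
runs = timeit.repeat(
    stmt='ds.resample({"time": "1H"}).mean()',
    globals={"ds": ds},  # assumes `ds` from the MWE above
    number=1,
    repeat=10,
)
print(["%.2fs" % t for t in runs])  # look for the occasional ~10x outlier
```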
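Separately, until the performance gap itself is closed, a possible workaround is to round-trip through pandas, whose resample is the fast path measured above. A sketch only, assuming the Dataset is indexed solely by `time` and a mean reduction is wanted; `resample_via_pandas` is a hypothetical helper, not an xarray API:

```python
import xarray as xr

def resample_via_pandas(ds: xr.Dataset, freq: str = "1H") -> xr.Dataset:
    # to_dataframe() is cheap for a purely time-indexed Dataset,
    # and pandas' resample handles the same reduction quickly
    df = ds.to_dataframe().resample(freq).mean()
    # DataFrame.to_xarray() rebuilds a Dataset with 'time' as the dimension
    return df.to_xarray()

# usage: ds_hourly = resample_via_pandas(ds, "1H")
```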
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
718436141 |