issue_comments
4 rows where author_association = "CONTRIBUTOR", issue = 718436141 and user = 145117 sorted by updated_at descending
706688398 · mankoff (145117) · CONTRIBUTOR · created 2020-10-11T11:11:47Z · updated 2020-10-11T11:19:56Z
https://github.com/pydata/xarray/issues/4498#issuecomment-706688398

Thanks for the clarification that this is a real issue and not just a problem in my code, and for the suggestion to pursue this elsewhere. For now I just use the fast Pandas version with this code:

Reactions: +1: 1 · Issue: Resample is ~100x slower than Pandas resample; Speed is related to resample period (unlike Pandas) (718436141)
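
The snippet this comment refers to is not shown above. A minimal sketch of the pandas workaround it describes, assuming it resamples through pandas and converts back with `xr.Dataset.from_dataframe` (both assumptions, not the author's preserved code):

```python
import xarray as xr

# Hypothetical reconstruction of the workaround described above:
# let pandas do the (fast) resample, then restore an xarray Dataset.
def resample_via_pandas(ds: xr.Dataset, freq: str = "1H") -> xr.Dataset:
    # to_dataframe() flattens the Dataset onto its 'time' index;
    # from_dataframe() rebuilds a Dataset from the resampled frame.
    return xr.Dataset.from_dataframe(ds.to_dataframe().resample(freq).mean())
```

This trades a round-trip conversion for pandas' grouping speed; at the dataset sizes discussed below, the conversion cost should be small next to the reported slowdown.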

706688498 · mankoff (145117) · CONTRIBUTOR · created 2020-10-11T11:12:47Z · updated 2020-10-11T11:12:47Z
https://github.com/pydata/xarray/issues/4498#issuecomment-706688498

The linked issues refer to

Reactions: none · Issue: Resample is ~100x slower than Pandas resample; Speed is related to resample period (unlike Pandas) (718436141)

706548763 · mankoff (145117) · CONTRIBUTOR · created 2020-10-10T13:23:24Z · updated 2020-10-10T13:23:24Z
https://github.com/pydata/xarray/issues/4498#issuecomment-706548763

The every-4th-or-5th-run lag is not in the dataset creation; it's in the `resample` call itself:

```python
for i in np.arange(25):
    start = time.time()
    ds_r = ds.resample({'time': "1H"})
    print('xr', str(time.time() - start))
```

Results:

```
xr 0.04479050636291504
xr 0.047682762145996094
xr 0.8904871940612793
xr 0.05605506896972656
xr 0.0452876091003418
xr 0.0467374324798584
xr 0.8709239959716797
xr 0.05595755577087402
xr 0.046492576599121094
xr 0.04648017883300781
xr 0.045223236083984375
xr 0.8187246322631836
xr 0.05060911178588867
xr 0.04763054847717285
xr 0.8156075477600098
xr 0.055490970611572266
xr 0.047312259674072266
xr 0.04651069641113281
xr 0.8001837730407715
xr 0.05546212196350098
xr 0.04549074172973633
xr 0.04680013656616211
xr 0.04383039474487305
xr 0.7662224769592285
xr 0.04914355278015137
```

Reactions: none · Issue: Resample is ~100x slower than Pandas resample; Speed is related to resample period (unlike Pandas) (718436141)
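
One hedged way to probe a repeatable every-4th-or-5th-run spike (not something tried in the thread itself): rerun the loop with Python's garbage collector disabled, since a cyclic-GC pause is one plausible source of a periodic ~10x hit. `ds`, `np`, and `time` are as in the MWE in the next comment.

```python
import gc

# Diagnostic sketch (assumption: GC might explain the periodic spike).
# If the ~0.8 s outliers vanish with collection disabled, GC is the cause.
gc.disable()
try:
    for i in np.arange(25):
        start = time.time()
        ds_r = ds.resample({'time': "1H"})
        print('xr (gc off)', time.time() - start)
finally:
    gc.enable()
```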

706548513 · mankoff (145117) · CONTRIBUTOR · created 2020-10-10T13:21:19Z · updated 2020-10-10T13:21:19Z
https://github.com/pydata/xarray/issues/4498#issuecomment-706548513

"performance" is a good tag. My actual use case is a dataset with 500,000 timestamps and 15 variables (a 10-minute weather station record spanning a decade). In this case, pandas takes 0.03 seconds and xarray takes 200 seconds: nearly 4 orders of magnitude. Should I change the title to reflect the larger difference in performance? Here is that MWE:

```python
import numpy as np
import xarray as xr
import pandas as pd
import time

size = 500000
times = pd.date_range('2000-01-01', periods=size, freq="10Min")
ds = xr.Dataset({
    'foo': xr.DataArray(
        data=np.random.random(size),
        dims=['time'],
        coords={'time': times}
    )})
for v in 'abcdefghijklm':
    ds[v] = (('time'), np.random.random(size))

start = time.time()
ds_r = ds.resample({'time': "1H"}).mean()
print('xr', str(time.time() - start))

start = time.time()
ds_r = ds.to_dataframe().resample("1H").mean()
print('pd', str(time.time() - start))
```

Result:

The strange thing here is if I drop the

But every 4th or 5th time that I run this, I get this:

This is repeatable. I've run this code hundreds of times now, and every 4th or 5th run it takes 10x as long. Nothing else is going on on my computer.

Reactions: none · Issue: Resample is ~100x slower than Pandas resample; Speed is related to resample period (unlike Pandas) (718436141)
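
The issue title also says the slowdown scales with the resample period, unlike pandas. A small sketch to check that claim, reusing `ds`, `time`, and the imports from the MWE above (the periods here are chosen arbitrarily):

```python
# Sketch: compare xarray and pandas resample across several periods.
df = ds.to_dataframe()
for period in ["10Min", "1H", "6H", "1D"]:
    start = time.time()
    ds.resample({'time': period}).mean()
    t_xr = time.time() - start

    start = time.time()
    df.resample(period).mean()
    t_pd = time.time() - start
    print(period, 'xr', round(t_xr, 3), 'pd', round(t_pd, 3))
```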

```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```
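
For reference, the filter at the top of this page can be reproduced against this schema with Python's sqlite3; the database filename `github.db` is an assumption:

```python
import sqlite3

# 'github.db' is an assumed name for the SQLite file behind this page.
con = sqlite3.connect("github.db")
rows = con.execute(
    """
    SELECT id, created_at, updated_at, body
    FROM issue_comments
    WHERE author_association = 'CONTRIBUTOR'
      AND issue = 718436141
      AND user = 145117
    ORDER BY updated_at DESC
    """
).fetchall()
con.close()
for row in rows:
    print(row[0], row[2])
```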