github: issue_comments: 4 rows where author_association = "CONTRIBUTOR" and issue = 718436141 sorted by updated

4 rows where author_association = "CONTRIBUTOR" and issue = 718436141 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at	author_association	body	reactions	issue
706688398	https://github.com/pydata/xarray/issues/4498#issuecomment-706688398	https://api.github.com/repos/pydata/xarray/issues/4498	MDEyOklzc3VlQ29tbWVudDcwNjY4ODM5OA==	mankoff 145117	2020-10-11T11:11:47Z	2020-10-11T11:19:56Z	CONTRIBUTOR	Thanks for the clarification that this is a real issue not due to just my coding, and the suggestion to solve this elsewhere. For now I just use the fast Pandas version with this code: `python df_h = ds.to_dataframe().resample("1H").mean() # what we want (quickly), but in Pandas form vals = [xr.DataArray(data=df_h[c], dims=['time'], coords={'time':df_h.index}, attrs=ds[c].attrs) for c in df_h.columns] ds_h = xr.Dataset(dict(zip(df_h.columns,vals)), attrs=ds.attrs)`	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Resample is ~100x slower than Pandas resample; Speed is related to resample period (unlike Pandas) 718436141
706688498	https://github.com/pydata/xarray/issues/4498#issuecomment-706688498	https://api.github.com/repos/pydata/xarray/issues/4498	MDEyOklzc3VlQ29tbWVudDcwNjY4ODQ5OA==	mankoff 145117	2020-10-11T11:12:47Z	2020-10-11T11:12:47Z	CONTRIBUTOR	The linked issues refer to `groupby` not `resample` so this could stay open or be closed as a duplicate - I leave it to you to decide. Thank you for the assistance.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Resample is ~100x slower than Pandas resample; Speed is related to resample period (unlike Pandas) 718436141
706548763	https://github.com/pydata/xarray/issues/4498#issuecomment-706548763	https://api.github.com/repos/pydata/xarray/issues/4498	MDEyOklzc3VlQ29tbWVudDcwNjU0ODc2Mw==	mankoff 145117	2020-10-10T13:23:24Z	2020-10-10T13:23:24Z	CONTRIBUTOR	The every 4th or 5th lag is not in the creation, it's in the `resample`: ```` +BEGIN_SRC jupyter-python :kernel ds :session bugreport for i in np.arange(25): start = time.time() ds_r = ds.resample({'time':"1H"}) print('xr', str(time.time() - start)) +END_SRC +RESULTS: +begin_example xr 0.04479050636291504 xr 0.047682762145996094 xr 0.8904871940612793 xr 0.05605506896972656 xr 0.0452876091003418 xr 0.0467374324798584 xr 0.8709239959716797 xr 0.05595755577087402 xr 0.046492576599121094 xr 0.04648017883300781 xr 0.045223236083984375 xr 0.8187246322631836 xr 0.05060911178588867 xr 0.04763054847717285 xr 0.8156075477600098 xr 0.055490970611572266 xr 0.047312259674072266 xr 0.04651069641113281 xr 0.8001837730407715 xr 0.05546212196350098 xr 0.04549074172973633 xr 0.04680013656616211 xr 0.04383039474487305 xr 0.7662224769592285 xr 0.04914355278015137 +end_example ````	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Resample is ~100x slower than Pandas resample; Speed is related to resample period (unlike Pandas) 718436141
706548513	https://github.com/pydata/xarray/issues/4498#issuecomment-706548513	https://api.github.com/repos/pydata/xarray/issues/4498	MDEyOklzc3VlQ29tbWVudDcwNjU0ODUxMw==	mankoff 145117	2020-10-10T13:21:19Z	2020-10-10T13:21:19Z	CONTRIBUTOR	"performance" is a good tag. My actual use case is a dataset with 500,000 timestamps and 15 variables (10 minute weather station for a decade). In this case, pandas takes 0.03 seconds, and xarray takes 200 seconds. 4 orders of magnitude. Should I change the title to reflect the larger difference in performance? Here is that MWE: ```python import numpy as np import xarray as xr import pandas as pd import time size = 500000 times = pd.date_range('2000-01-01', periods=size, freq="10Min") ds = xr.Dataset({ 'foo': xr.DataArray( data = np.random.random(size), dims = ['time'], coords = {'time': times} )}) for v in 'abcdefghijelm': ds[v] = (('time'), np.random.random(size)) start = time.time() ds_r = ds.resample({'time':"1H"}).mean() print('xr', str(time.time() - start)) start = time.time() ds_r = ds.to_dataframe().resample("1H").mean() print('pd', str(time.time() - start)) ``` Result: `xr 202.2967929840088 pd 0.03381085395812988` The strange thing here is if I drop the `.mean()`'s, most of the time I see what you see. `: xr 0.03333306312561035 : pd 0.020237445831298828` But every 4th or 5th time that I run this, I get this: `: xr 0.8518760204315186 : pd 0.02686452865600586` This is repeatable. I've Run this code 100s of times now, and every 4th or 5th run it takes 10x. Nothing else is going on on my computer.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Resample is ~100x slower than Pandas resample; Speed is related to resample period (unlike Pandas) 718436141
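
The comments above report that roughly every 4th or 5th call is about 10x slower than the rest. One way to make that pattern easier to see is to time each call separately and flag the runs that exceed a multiple of the median. The following is a minimal sketch using only the standard library; the helper names `time_repeated` and `slow_runs` and the `factor=5.0` threshold are illustrative assumptions, not part of xarray or of the original report.

```python
import statistics
import time


def time_repeated(fn, repeats=25):
    """Call fn() `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def slow_runs(durations, factor=5.0):
    """Return the indices of runs slower than `factor` times the median duration."""
    threshold = factor * statistics.median(durations)
    return [i for i, d in enumerate(durations) if d > threshold]


# First seven timings from the resample loop above, rounded to three decimals.
reported = [0.045, 0.048, 0.890, 0.056, 0.045, 0.047, 0.871]
print(slow_runs(reported))  # [2, 6]
```

To apply this to the case in the comments, something like `time_repeated(lambda: ds.resample({'time': "1H"}))` could be passed in, with `ds` built as in the MWE. The median is used rather than the mean so that the slow runs themselves do not inflate the baseline.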

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
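
As a rough illustration of how the filter described at the top of this page ("author_association = "CONTRIBUTOR" and issue = 718436141 sorted by updated_at descending") maps onto this schema, here is a sketch using Python's built-in `sqlite3` module and an in-memory database. The four `id` and `updated_at` values are taken from the rows above; the extra `MEMBER` row and the helper name `contributor_comment_ids` are hypothetical and exist only to show the filter excluding something.

```python
import sqlite3

SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
"""


def contributor_comment_ids(conn, issue_id):
    """Return comment ids for one issue, CONTRIBUTOR only, newest update first."""
    rows = conn.execute(
        "SELECT id FROM issue_comments "
        "WHERE author_association = ? AND issue = ? "
        "ORDER BY updated_at DESC",
        ("CONTRIBUTOR", issue_id),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
# Foreign keys are not enforced by default, so the missing users/issues
# tables do not prevent creating or filling issue_comments here.
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO issue_comments (id, updated_at, author_association, issue) "
    "VALUES (?, ?, ?, ?)",
    [
        (706548513, "2020-10-10T13:21:19Z", "CONTRIBUTOR", 718436141),
        (706548763, "2020-10-10T13:23:24Z", "CONTRIBUTOR", 718436141),
        (706688498, "2020-10-11T11:12:47Z", "CONTRIBUTOR", 718436141),
        (706688398, "2020-10-11T11:19:56Z", "CONTRIBUTOR", 718436141),
        # Hypothetical row, not from the source data: shows the filter at work.
        (1, "2020-10-12T00:00:00Z", "MEMBER", 718436141),
    ],
)
ids = contributor_comment_ids(conn, 718436141)
print(ids)  # [706688398, 706688498, 706548763, 706548513]
```

Because `updated_at` is stored as ISO 8601 text in a single time zone, plain string ordering matches chronological ordering, which is why `ORDER BY updated_at DESC` works without any date parsing.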
