
issue_comments


4 rows where author_association = "CONTRIBUTOR" and issue = 718436141 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
706688398 https://github.com/pydata/xarray/issues/4498#issuecomment-706688398 https://api.github.com/repos/pydata/xarray/issues/4498 MDEyOklzc3VlQ29tbWVudDcwNjY4ODM5OA== mankoff 145117 2020-10-11T11:11:47Z 2020-10-11T11:19:56Z CONTRIBUTOR

Thanks for the clarification that this is a real issue and not just due to my coding, and for the suggestion to solve this elsewhere. For now I use the fast Pandas version with this code:

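```python
import xarray as xr

# `ds` is the Dataset to resample (all variables along a single 'time' dimension)
df_h = ds.to_dataframe().resample("1H").mean()  # what we want (quickly), but in Pandas form
# Convert each pandas column back to a DataArray, keeping the original attributes
vals = [xr.DataArray(data=df_h[c], dims=['time'], coords={'time': df_h.index}, attrs=ds[c].attrs)
        for c in df_h.columns]
ds_h = xr.Dataset(dict(zip(df_h.columns, vals)), attrs=ds.attrs)
```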
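The pandas workaround can be packaged as a reusable function. This is a minimal sketch, not an xarray or pandas API: `resample_mean_via_pandas` is a hypothetical name, and it assumes every data variable depends only on a single `time` dimension and that a mean is the desired aggregation. The lowercase `"1h"` alias is used because recent pandas versions deprecate the uppercase `"H"`.

```python
import pandas as pd
import xarray as xr


def resample_mean_via_pandas(ds: xr.Dataset, freq: str) -> xr.Dataset:
    """Hypothetical helper: mean-resample a time-only Dataset through pandas.

    Assumes every data variable has the single dimension 'time'.
    """
    # Let pandas do the (fast) resampling on a DataFrame indexed by time.
    df = ds.to_dataframe().resample(freq).mean()
    # Rebuild one DataArray per column, carrying over each variable's attrs.
    data_vars = {
        name: xr.DataArray(
            df[name].to_numpy(),
            dims=["time"],
            coords={"time": df.index},
            attrs=ds[name].attrs,
        )
        for name in df.columns
    }
    # Dataset-level attributes are copied over as well.
    return xr.Dataset(data_vars, attrs=ds.attrs)


# Tiny demo: two hours of 10-minute samples with attributes attached.
demo = xr.Dataset(
    {"foo": ("time", [float(i) for i in range(12)], {"units": "K"})},
    coords={"time": pd.date_range("2000-01-01", periods=12, freq="10min")},
    attrs={"source": "demo"},
)
hourly = resample_mean_via_pandas(demo, "1h")
print(hourly["foo"].values)
```

For a dataset like the one in this issue, `resample_mean_via_pandas(ds, "1h")` would stand in for `ds.resample(time="1h").mean()`; datasets with extra dimensions or non-index coordinates would need more handling than this sketch provides.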

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Resample is ~100x slower than Pandas resample; Speed is related to resample period (unlike Pandas) 718436141
706688498 https://github.com/pydata/xarray/issues/4498#issuecomment-706688498 https://api.github.com/repos/pydata/xarray/issues/4498 MDEyOklzc3VlQ29tbWVudDcwNjY4ODQ5OA== mankoff 145117 2020-10-11T11:12:47Z 2020-10-11T11:12:47Z CONTRIBUTOR

The linked issues refer to groupby, not resample, so this could stay open or be closed as a duplicate; I leave it to you to decide. Thank you for the assistance.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Resample is ~100x slower than Pandas resample; Speed is related to resample period (unlike Pandas) 718436141
706548763 https://github.com/pydata/xarray/issues/4498#issuecomment-706548763 https://api.github.com/repos/pydata/xarray/issues/4498 MDEyOklzc3VlQ29tbWVudDcwNjU0ODc2Mw== mankoff 145117 2020-10-10T13:23:24Z 2020-10-10T13:23:24Z CONTRIBUTOR

The lag on every 4th or 5th run is not in the creation of the dataset; it's in the resample itself:

````
#+BEGIN_SRC jupyter-python :kernel ds :session bugreport
for i in np.arange(25):
    start = time.time()
    ds_r = ds.resample({'time':"1H"})
    print('xr', str(time.time() - start))
#+END_SRC

#+RESULTS:
#+begin_example
xr 0.04479050636291504
xr 0.047682762145996094
xr 0.8904871940612793
xr 0.05605506896972656
xr 0.0452876091003418
xr 0.0467374324798584
xr 0.8709239959716797
xr 0.05595755577087402
xr 0.046492576599121094
xr 0.04648017883300781
xr 0.045223236083984375
xr 0.8187246322631836
xr 0.05060911178588867
xr 0.04763054847717285
xr 0.8156075477600098
xr 0.055490970611572266
xr 0.047312259674072266
xr 0.04651069641113281
xr 0.8001837730407715
xr 0.05546212196350098
xr 0.04549074172973633
xr 0.04680013656616211
xr 0.04383039474487305
xr 0.7662224769592285
xr 0.04914355278015137
#+end_example
````
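To check how regularly such slow runs recur, one can time each call separately and flag the ones much slower than the median. This is a standard-library sketch; `time_runs`, `find_outliers` and the 5x-median threshold are hypothetical choices, not part of xarray. With the session's `ds`, one would call `time_runs(lambda: ds.resample({'time': "1H"}))`; here the 25 timings reported in this comment are fed in directly.

```python
import statistics
import time
from typing import Callable, List, Tuple


def time_runs(fn: Callable[[], object], n: int = 25) -> List[float]:
    """Call fn() n times and return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()  # high-resolution clock intended for timing
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def find_outliers(durations: List[float], factor: float = 5.0) -> List[Tuple[int, float]]:
    """Return (0-based run index, duration) for runs slower than factor x the median."""
    median = statistics.median(durations)
    return [(i, d) for i, d in enumerate(durations) if d > factor * median]


# The 25 timings reported above, in order.
published = [
    0.04479050636291504, 0.047682762145996094, 0.8904871940612793,
    0.05605506896972656, 0.0452876091003418, 0.0467374324798584,
    0.8709239959716797, 0.05595755577087402, 0.046492576599121094,
    0.04648017883300781, 0.045223236083984375, 0.8187246322631836,
    0.05060911178588867, 0.04763054847717285, 0.8156075477600098,
    0.055490970611572266, 0.047312259674072266, 0.04651069641113281,
    0.8001837730407715, 0.05546212196350098, 0.04549074172973633,
    0.04680013656616211, 0.04383039474487305, 0.7662224769592285,
    0.04914355278015137,
]
print([i for i, _ in find_outliers(published)])
```

On these numbers it flags runs 2, 6, 11, 14, 18 and 23 (0-based), i.e. slow calls spaced 3 to 5 runs apart.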

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Resample is ~100x slower than Pandas resample; Speed is related to resample period (unlike Pandas) 718436141
706548513 https://github.com/pydata/xarray/issues/4498#issuecomment-706548513 https://api.github.com/repos/pydata/xarray/issues/4498 MDEyOklzc3VlQ29tbWVudDcwNjU0ODUxMw== mankoff 145117 2020-10-10T13:21:19Z 2020-10-10T13:21:19Z CONTRIBUTOR

"performance" is a good tag. My actual use case is a dataset with 500,000 timestamps and 15 variables (10-minute weather-station data spanning a decade).

In this case, pandas takes 0.03 seconds and xarray takes 200 seconds, a factor of roughly 6,000, or nearly 4 orders of magnitude. Should I change the title to reflect the larger difference in performance? Here is that MWE:

```python
import numpy as np
import xarray as xr
import pandas as pd
import time

size = 500000
times = pd.date_range('2000-01-01', periods=size, freq="10Min")
ds = xr.Dataset({
    'foo': xr.DataArray(
        data=np.random.random(size),
        dims=['time'],
        coords={'time': times}
    )})
for v in 'abcdefghijelm':
    ds[v] = (('time'), np.random.random(size))

start = time.time()
ds_r = ds.resample({'time': "1H"}).mean()
print('xr', str(time.time() - start))

start = time.time()
ds_r = ds.to_dataframe().resample("1H").mean()
print('pd', str(time.time() - start))
```

Result:

xr 202.2967929840088
pd 0.03381085395812988

The strange thing is that if I drop the .mean() calls, most of the time I see what you see:

xr 0.03333306312561035
pd 0.020237445831298828

But every 4th or 5th time that I run this, I get this:

xr 0.8518760204315186
pd 0.02686452865600586

This is repeatable. I've run this code hundreds of times now, and every 4th or 5th run the xarray step takes more than 10x as long. Nothing else is running on my computer.
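Before swapping the pandas round trip in for xarray's resample in the MWE, it may be worth confirming that both paths produce the same hourly means. This sketch checks correctness only, not speed, and uses an assumed, much smaller stand-in for the MWE (one week of 10-minute samples, four random variables); it converts the pandas result back with `xr.Dataset.from_dataframe`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Assumed small stand-in for the MWE: one week of 10-minute samples
# (1,008 steps instead of 500,000) and four random variables.
size = 6 * 24 * 7
rng = np.random.default_rng(0)
times = pd.date_range("2000-01-01", periods=size, freq="10min")
ds = xr.Dataset(
    {name: ("time", rng.random(size)) for name in ["foo", "a", "b", "c"]},
    coords={"time": times},
)

# Path 1: xarray's own resample, as in the MWE.
xr_hourly = ds.resample(time="1h").mean()

# Path 2: round trip through pandas, converted back with Dataset.from_dataframe.
pd_hourly = xr.Dataset.from_dataframe(ds.to_dataframe().resample("1h").mean())

# Same bin labels and, up to floating-point rounding, the same means.
assert (xr_hourly["time"].values == pd_hourly["time"].values).all()
for name in ds.data_vars:
    np.testing.assert_allclose(xr_hourly[name].values, pd_hourly[name].values)
print("hourly bins:", xr_hourly.sizes["time"])
```

`from_dataframe` is a shorter alternative to building each `DataArray` by hand, but unlike the manual loop it does not carry over variable or dataset attributes.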

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Resample is ~100x slower than Pandas resample; Speed is related to resample period (unlike Pandas) 718436141

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette