issue_comments

3 rows where author_association = "MEMBER" and issue = 718436141 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
706640332 https://github.com/pydata/xarray/issues/4498#issuecomment-706640332 https://api.github.com/repos/pydata/xarray/issues/4498 MDEyOklzc3VlQ29tbWVudDcwNjY0MDMzMg== shoyer 1217238 2020-10-11T02:34:47Z 2020-10-11T02:34:47Z MEMBER

I might add that this is something I've wanted to speed up in xarray since the very early days. But until I noticed the numpy-groupies package, it seemed like a pretty challenging task.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Resample is ~100x slower than Pandas resample; Speed is related to resample period (unlike Pandas) 718436141
706640151 https://github.com/pydata/xarray/issues/4498#issuecomment-706640151 https://api.github.com/repos/pydata/xarray/issues/4498 MDEyOklzc3VlQ29tbWVudDcwNjY0MDE1MQ== shoyer 1217238 2020-10-11T02:32:25Z 2020-10-11T02:32:36Z MEMBER

resample uses the same machinery in xarray as other grouped aggregations.

Right now, grouped aggregations are very slow when there are many groups (like in resample) because we use a Python loop over groups.

Probably the most obvious way to speed this up would be to wrap the "numpy-groupies" package in xarray: https://github.com/pydata/xarray/issues/4473
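To illustrate why a Python loop over groups becomes slow when there are many groups, here is a minimal NumPy sketch. It is not xarray's actual implementation, and it does not use numpy-groupies itself: the function names, the synthetic data, and the use of `np.bincount` as a stand-in for a vectorized grouped aggregation are all assumptions made for illustration.

```python
import numpy as np


def grouped_mean_loop(values, codes, n_groups):
    """Grouped mean with one Python-level iteration per group.

    Each iteration builds a boolean mask over the whole array, so the
    cost grows with the number of groups.
    """
    out = np.empty(n_groups)
    for g in range(n_groups):
        out[g] = values[codes == g].mean()
    return out


def grouped_mean_vectorized(values, codes, n_groups):
    """Grouped mean in a single pass: per-group sums divided by counts."""
    sums = np.bincount(codes, weights=values, minlength=n_groups)
    counts = np.bincount(codes, minlength=n_groups)
    return sums / counts


# Synthetic data (hypothetical): 10,000 samples in 1,000 groups of 10.
rng = np.random.default_rng(0)
values = rng.random(10_000)
codes = np.arange(values.size) // 10
n_groups = int(codes.max()) + 1

loop_result = grouped_mean_loop(values, codes, n_groups)
vec_result = grouped_mean_vectorized(values, codes, n_groups)
```

Both functions return the same means, but the loop version does work proportional to the number of groups times the array length, which matches the observation that a shorter resample period (more groups) is slower.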

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Resample is ~100x slower than Pandas resample; Speed is related to resample period (unlike Pandas) 718436141
706435642 https://github.com/pydata/xarray/issues/4498#issuecomment-706435642 https://api.github.com/repos/pydata/xarray/issues/4498 MDEyOklzc3VlQ29tbWVudDcwNjQzNTY0Mg== dcherian 2448579 2020-10-09T22:57:48Z 2020-10-09T22:57:55Z MEMBER

@mankoff (hi!) This is interesting. If I comment out `.mean()` I get

```
1H xr 0.003030538558959961
1H pd 0.0014064311981201172
1D xr 0.0026717185974121094
1D pd 0.0013244152069091797
```

i.e. we are 2x slower just on factorizing.
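"Factorizing" here refers to turning the time coordinate into integer group labels before any aggregation runs. The sketch below shows that step on its own, assuming fixed-width bins anchored at the first timestamp. That anchoring, the function name `factorize_fixed_bins`, and the one-sample-per-minute input are simplifications for illustration, not how pandas or xarray compute resample bins.

```python
import numpy as np

# Hypothetical input: one sample per minute for two days.
times = np.datetime64("2020-01-01T00:00") + np.arange(2 * 24 * 60) * np.timedelta64(1, "m")


def factorize_fixed_bins(times, bin_width):
    """Map each timestamp to an integer bin code.

    Bins have a fixed width and start at times[0]. Returns the codes
    and the start time of every bin.
    """
    codes = ((times - times[0]) // bin_width).astype(np.int64)
    n_bins = int(codes.max()) + 1
    bin_starts = times[0] + np.arange(n_bins) * bin_width
    return codes, bin_starts


hourly_codes, hourly_bins = factorize_fixed_bins(times, np.timedelta64(1, "h"))
daily_codes, daily_bins = factorize_fixed_bins(times, np.timedelta64(1, "D"))
```

The cost of this step is independent of the bin width, which is consistent with the 1H and 1D timings above being nearly identical once `.mean()` is removed.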

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Resample is ~100x slower than Pandas resample; Speed is related to resample period (unlike Pandas) 718436141

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
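
As a rough sketch of how the filtered view on this page could be reproduced from the schema, the following uses Python's built-in `sqlite3` module with an in-memory database. Only the columns needed for the filter are populated, using the ids, user ids, and timestamps shown in the rows above. The exact SQL that Datasette generates is an assumption here, and the referenced `users` and `issues` tables are not created (SQLite does not enforce foreign keys by default).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema copied from the page; the indexes are included for completeness.
conn.executescript(
    """
    CREATE TABLE [issue_comments] (
       [html_url] TEXT,
       [issue_url] TEXT,
       [id] INTEGER PRIMARY KEY,
       [node_id] TEXT,
       [user] INTEGER REFERENCES [users]([id]),
       [created_at] TEXT,
       [updated_at] TEXT,
       [author_association] TEXT,
       [body] TEXT,
       [reactions] TEXT,
       [performed_via_github_app] TEXT,
       [issue] INTEGER REFERENCES [issues]([id])
    );
    CREATE INDEX [idx_issue_comments_issue]
        ON [issue_comments] ([issue]);
    CREATE INDEX [idx_issue_comments_user]
        ON [issue_comments] ([user]);
    """
)

# The three rows shown on this page (subset of columns).
rows = [
    (706435642, 2448579, "2020-10-09T22:57:55Z", "MEMBER", 718436141),
    (706640151, 1217238, "2020-10-11T02:32:36Z", "MEMBER", 718436141),
    (706640332, 1217238, "2020-10-11T02:34:47Z", "MEMBER", 718436141),
]
conn.executemany(
    "INSERT INTO issue_comments (id, user, updated_at, author_association, issue) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# Same filter and ordering as the page description.
query = """
    SELECT id FROM issue_comments
    WHERE author_association = ? AND issue = ?
    ORDER BY updated_at DESC
"""
ids = [row[0] for row in conn.execute(query, ("MEMBER", 718436141))]
```

Because `updated_at` is stored as ISO 8601 text in UTC, ordering it as a string yields chronological order.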
Powered by Datasette · Queries took 13.663ms · About: xarray-datasette