issue_comments

3 rows where issue = 1295939038 and user = 2448579 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1176918900 https://github.com/pydata/xarray/issues/6758#issuecomment-1176918900 https://api.github.com/repos/pydata/xarray/issues/6758 IC_kwDOAMm_X85GJlt0 dcherian 2448579 2022-07-07T01:03:17Z 2022-07-07T01:03:17Z MEMBER

the IDL histogram function but in numpy.

Apparently not as awesome!

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  simple groupby_bins 10x slower than numpy 1295939038
1176730713 https://github.com/pydata/xarray/issues/6758#issuecomment-1176730713 https://api.github.com/repos/pydata/xarray/issues/6758 IC_kwDOAMm_X85GI3xZ dcherian 2448579 2022-07-06T20:54:23Z 2022-07-06T20:54:23Z MEMBER

Yes that's right.

For this simple problem you could combine np.digitize and np.bincount to do it much quicker.

```python
group_idx = np.digitize(latitude, bins)
sums = np.bincount(group_idx, weights=array)
```

And then wrap this using apply_ufunc. See https://github.com/ml31415/numpy-groupies/blob/412be938dcdfd74c6d673dd29012d18dc25dc94f/numpy_groupies/aggregate_numpy.py#L8-L28 for inspiration.
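As a rough, NumPy-only sketch of the `np.digitize` + `np.bincount` idea: the helper name `binned_sum` and the random test data are hypothetical and not part of the original comment. Note that `np.digitize` with its default `right=False` uses half-open bins `[a, b)`, so edge handling may differ from `groupby_bins`; passing `right=True` is one way to close bins on the right instead.

```python
import numpy as np


def binned_sum(values, coords, bins):
    """Sum `values` grouped by the bin of `bins` that each `coords` entry falls in.

    Hypothetical helper for illustration only.
    """
    # np.digitize returns 0 for coords below bins[0] and len(bins) for
    # coords at or above bins[-1]; in-range values get 1 .. len(bins) - 1.
    group_idx = np.digitize(coords, bins)
    # minlength guarantees one slot per possible index, even for empty bins.
    sums = np.bincount(group_idx, weights=values, minlength=len(bins) + 1)
    # Drop the two out-of-range slots, leaving len(bins) - 1 per-bin sums.
    return sums[1:-1]


# Assumed example data, shaped like the benchmark in this thread.
rng = np.random.default_rng(0)
latitude = 20 * rng.standard_normal(3728)
data = 100 * rng.standard_normal(3728)
bins = np.arange(-40, 40, 0.1)

sums = binned_sum(data, latitude, bins)
```

A function of this shape is the kind of thing one might then pass to `xr.apply_ufunc`; the exact arguments depend on the dimensions involved and are not shown here.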

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  simple groupby_bins 10x slower than numpy 1295939038
1176441916 https://github.com/pydata/xarray/issues/6758#issuecomment-1176441916 https://api.github.com/repos/pydata/xarray/issues/6758 IC_kwDOAMm_X85GHxQ8 dcherian 2448579 2022-07-06T16:40:47Z 2022-07-06T16:40:47Z MEMBER

On xarray main with flox installed:

```python
import numpy as np
import xarray as xr

display(xr.__version__)

N = 3728
ds = xr.Dataset()
ds["latitude"] = ("x", 0 + 20 * np.random.standard_normal(N))
ds["data"] = ("x", 0 + 100 * np.random.standard_normal(N))

%timeit ds.groupby_bins("latitude", np.arange(-40, 40, 0.1)).sum()
```

50.3 ms ± 203 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

You could try it on our pre-release (https://docs.xarray.dev/en/latest/whats-new.html#v2022-06-0rc0-9-june-2022) or use xhistogram, which should be even faster.

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  simple groupby_bins 10x slower than numpy 1295939038

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
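
The row filter described at the top of this page (issue = 1295939038, user = 2448579, sorted by updated_at descending) roughly corresponds to a query like the one below. This is a minimal sketch using Python's standard `sqlite3` module against an in-memory database; the table is a reduced version of the schema above (an assumption for brevity), and the exact SQL that Datasette generates may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced schema: only the columns needed for the filter (assumption).
conn.execute(
    """
    CREATE TABLE [issue_comments] (
        [id] INTEGER PRIMARY KEY,
        [user] INTEGER,
        [updated_at] TEXT,
        [issue] INTEGER
    )
    """
)
conn.execute("CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue])")
conn.execute("CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user])")

# The three comment ids and timestamps listed on this page.
conn.executemany(
    "INSERT INTO [issue_comments] ([id], [user], [updated_at], [issue]) VALUES (?, ?, ?, ?)",
    [
        (1176441916, 2448579, "2022-07-06T16:40:47Z", 1295939038),
        (1176730713, 2448579, "2022-07-06T20:54:23Z", 1295939038),
        (1176918900, 2448579, "2022-07-07T01:03:17Z", 1295939038),
    ],
)

# ISO 8601 timestamps stored as TEXT sort correctly as strings.
rows = conn.execute(
    """
    SELECT [id], [updated_at]
    FROM [issue_comments]
    WHERE [issue] = ? AND [user] = ?
    ORDER BY [updated_at] DESC
    """,
    (1295939038, 2448579),
).fetchall()

for comment_id, updated_at in rows:
    print(comment_id, updated_at)
```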