
issue_comments


8 rows where issue = 1295939038 sorted by updated_at descending


user 3

  • vnoel 4
  • dcherian 3
  • kmuehlbauer 1

author_association 2

  • CONTRIBUTOR 4
  • MEMBER 4

issue 1

  • simple groupby_bins 10x slower than numpy · 8
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1177245770 https://github.com/pydata/xarray/issues/6758#issuecomment-1177245770 https://api.github.com/repos/pydata/xarray/issues/6758 IC_kwDOAMm_X85GK1hK vnoel 731499 2022-07-07T08:26:26Z 2022-07-07T08:26:26Z CONTRIBUTOR

@dcherian Just to be complete, I thought the following one-liner would work as well:

```python
sums, x = np.histogram(latitude, bins, weights=array)
```

but apparently it produces slightly different results for reasons I don't understand
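
One plausible reason for the discrepancy, not spelled out in the thread, is that np.histogram and the np.digitize/np.bincount approach follow different bin-edge and out-of-range conventions: np.histogram drops values outside the edges and closes its last bin on the right, while np.digitize routes out-of-range values to the first and last index. A rough sketch with made-up toy data (variable names only mirror the thread):

```python
import numpy as np

# Hypothetical toy data standing in for the issue's latitude/array variables.
rng = np.random.default_rng(0)
latitude = 20 * rng.standard_normal(1000)
array = 100 * rng.standard_normal(1000)
bins = np.arange(-40, 40, 0.1)

# np.histogram silently drops values outside [bins[0], bins[-1]]
# and closes the very last bin on the right.
hist_sums, _ = np.histogram(latitude, bins, weights=array)

# np.digitize/np.bincount instead sends out-of-range values to the
# first (index 0) and last (index len(bins)) slots and treats every
# bin as half-open [left, right).
group_idx = np.digitize(latitude, bins)
binc_sums = np.bincount(group_idx, weights=array, minlength=len(bins) + 1)

print(hist_sums.sum())        # excludes out-of-range weight
print(binc_sums[1:-1].sum())  # in-range slots: matches unless a value sits exactly on the last edge
print(binc_sums[[0, -1]])     # weight that np.histogram dropped
```

The out-of-range values and the closed last edge are the most likely sources of the "slightly different results".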

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  simple groupby_bins 10x slower than numpy 1295939038
1177163992 https://github.com/pydata/xarray/issues/6758#issuecomment-1177163992 https://api.github.com/repos/pydata/xarray/issues/6758 IC_kwDOAMm_X85GKhjY vnoel 731499 2022-07-07T06:53:52Z 2022-07-07T06:53:52Z CONTRIBUTOR

> the IDL histogram function but in numpy.
>
> Apparently not as awesome!

Yeah, the present solution is less general, but most of the time I'm just counting stuff, and this is much faster than what I was doing, so I'm happy ;-)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  simple groupby_bins 10x slower than numpy 1295939038
1177106457 https://github.com/pydata/xarray/issues/6758#issuecomment-1177106457 https://api.github.com/repos/pydata/xarray/issues/6758 IC_kwDOAMm_X85GKTgZ kmuehlbauer 5821660 2022-07-07T05:44:41Z 2022-07-07T05:44:41Z MEMBER

I'm getting a bit off topic now, but ...

> Apparently not as awesome!

@dcherian Thanks for bringing back fond memories of the past. I still have @davidwfanning's IDL books on the shelf. And for sure it was a great pleasure reading @jdtsmith's IDL tricks and trying to understand those helped a lot. Great stuff.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  simple groupby_bins 10x slower than numpy 1295939038
1176918900 https://github.com/pydata/xarray/issues/6758#issuecomment-1176918900 https://api.github.com/repos/pydata/xarray/issues/6758 IC_kwDOAMm_X85GJlt0 dcherian 2448579 2022-07-07T01:03:17Z 2022-07-07T01:03:17Z MEMBER

> the IDL histogram function but in numpy.

Apparently not as awesome!

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  simple groupby_bins 10x slower than numpy 1295939038
1176777842 https://github.com/pydata/xarray/issues/6758#issuecomment-1176777842 https://api.github.com/repos/pydata/xarray/issues/6758 IC_kwDOAMm_X85GJDRy vnoel 731499 2022-07-06T21:40:37Z 2022-07-06T21:40:37Z CONTRIBUTOR

@dcherian I just tested your numpy suggestions, and I'm getting 100x speedups compared to my naive numpy approach (~200µs vs ~20ms). Thankyouthankyouthankyou! I've been doing this for years, I can't believe I've never run into that particular solution. It's like the IDL histogram function but in numpy. I'm going to use this like crazy Thanks again

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  simple groupby_bins 10x slower than numpy 1295939038
1176730713 https://github.com/pydata/xarray/issues/6758#issuecomment-1176730713 https://api.github.com/repos/pydata/xarray/issues/6758 IC_kwDOAMm_X85GI3xZ dcherian 2448579 2022-07-06T20:54:23Z 2022-07-06T20:54:23Z MEMBER

Yes that's right.

For this simple problem you could combine np.digitize and np.bincount to do it much quicker.

```python
group_idx = np.digitize(latitude, bins)
sums = np.bincount(group_idx, weights=array)
```

And then wrap this using apply_ufunc. See https://github.com/ml31415/numpy-groupies/blob/412be938dcdfd74c6d673dd29012d18dc25dc94f/numpy_groupies/aggregate_numpy.py#L8-L28 for inspiration.
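
A rough sketch of what such an apply_ufunc wrapper could look like, assuming numpy-backed, 1-D latitude and data variables along a shared "x" dimension; the helper names and the "latitude_bins" output dimension are hypothetical, not taken from the thread or from numpy-groupies:

```python
import numpy as np
import xarray as xr

def _binned_sum(data, latitude, bins):
    # Bin index per point: 0 = below bins[0], len(bins) = at/above bins[-1].
    group_idx = np.digitize(latitude, bins)
    # minlength gives one slot per possible index so the output length is fixed.
    return np.bincount(group_idx, weights=data, minlength=len(bins) + 1)

def groupby_bins_sum(ds, bins):
    # Hypothetical wrapper; assumes numpy-backed data (dask would need
    # dask="parallelized" plus output_sizes for the new dimension).
    return xr.apply_ufunc(
        _binned_sum,
        ds["data"],
        ds["latitude"],
        kwargs={"bins": bins},
        input_core_dims=[["x"], ["x"]],
        output_core_dims=[["latitude_bins"]],
    )
```

Called as `groupby_bins_sum(ds, np.arange(-40, 40, 0.1))`, this returns per-bin sums with len(bins) + 1 entries; the first and last slots collect out-of-range values and would typically be dropped to match groupby_bins output.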

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  simple groupby_bins 10x slower than numpy 1295939038
1176701867 https://github.com/pydata/xarray/issues/6758#issuecomment-1176701867 https://api.github.com/repos/pydata/xarray/issues/6758 IC_kwDOAMm_X85GIwur vnoel 731499 2022-07-06T20:37:12Z 2022-07-06T20:37:12Z CONTRIBUTOR

@dcherian this means that xarray's groupby_bins will always be slow unless flox is installed, correct? I have unfortunately little or no say on what packages are installed on the system that runs my code.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  simple groupby_bins 10x slower than numpy 1295939038
1176441916 https://github.com/pydata/xarray/issues/6758#issuecomment-1176441916 https://api.github.com/repos/pydata/xarray/issues/6758 IC_kwDOAMm_X85GHxQ8 dcherian 2448579 2022-07-06T16:40:47Z 2022-07-06T16:40:47Z MEMBER

On xarray main with flox installed:

```python
import numpy as np
import xarray as xr
display(xr.__version__)

N = 3728
ds = xr.Dataset()
ds["latitude"] = ("x", 0 + 20 * np.random.standard_normal(N))
ds["data"] = ("x", 0 + 100 * np.random.standard_normal(N))

%timeit ds.groupby_bins("latitude", np.arange(-40, 40, 0.1)).sum()
```

50.3 ms ± 203 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

You could try it on our pre-release (https://docs.xarray.dev/en/latest/whats-new.html#v2022-06-0rc0-9-june-2022) or use xhistogram, which should be even faster.
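
For comparison, a weighted-binning call with xhistogram might look roughly like the sketch below; it assumes xhistogram's xarray-level histogram function with its bins/weights arguments, and the toy Dataset simply mirrors the timing snippet above:

```python
import numpy as np
import xarray as xr
from xhistogram.xarray import histogram

# Toy data matching the timing snippet above.
N = 3728
ds = xr.Dataset()
ds["latitude"] = ("x", 20 * np.random.standard_normal(N))
ds["data"] = ("x", 100 * np.random.standard_normal(N))

# Weighted histogram of latitude, i.e. per-bin sums of ds["data"];
# bins is a list with one edge array per binned variable.
binned = histogram(
    ds["latitude"],
    bins=[np.arange(-40, 40, 0.1)],
    weights=ds["data"],
)
```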

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  simple groupby_bins 10x slower than numpy 1295939038


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);