issue_comments

9 rows where issue = 206632333 ("PERF: Add benchmarking?", pydata/xarray#1257), sorted by updated_at descending

mrocklin (MEMBER) · 2017-06-13T11:12:26Z
https://github.com/pydata/xarray/issues/1257#issuecomment-308083807

@TomAugspurger has done some ASV work with Dask itself.

Reactions: none

jhamman (MEMBER) · 2017-06-13T04:21:48Z
https://github.com/pydata/xarray/issues/1257#issuecomment-308002808

@rabernat - great. I've set up an ASV project and am in the process of teaching myself how it all works. I'm just playing with some simple arithmetic benchmarks for now, but most of our interest will of course be in the I/O and dask arenas.

I'm wondering if @mrocklin has seen ASV used with any dask projects. We'll just need to make sure we choose the appropriate timer when profiling dask functions.

Reactions: none

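For readers unfamiliar with ASV, a benchmark suite is just a Python module of classes whose time_* methods ASV calls repeatedly, with setup excluded from the measurement. A minimal sketch of the kind of arithmetic benchmark described above, including the dask timer pitfall mentioned; the file name, class name, method names, and array sizes are illustrative, not taken from xarray's actual suite:

# benchmarks/arithmetic.py -- illustrative ASV benchmark sketch; names
# and sizes are hypothetical, not from xarray's real benchmark suite.
import numpy as np
import xarray as xr


class Arithmetic:
    def setup(self):
        # setup() runs before each timing and is excluded from the result.
        self.ds = xr.Dataset(
            {"temp": (("time", "x"), np.random.rand(1000, 1000))}
        )

    def time_add_scalar(self):
        # ASV times any method whose name starts with time_.
        self.ds + 1

    def time_chunked_mean(self):
        # dask is lazy: without .compute() this would time only graph
        # construction, which is the timer concern raised above.
        self.ds.chunk({"x": 100})["temp"].mean(dim="time").compute()
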
rabernat (MEMBER) · 2017-06-13T01:08:07Z
https://github.com/pydata/xarray/issues/1257#issuecomment-307977450

I am very interested. I have been doing a lot of benchmarking already wrt dask.distributed on my local cluster, focusing on performance with multi-terabyte datasets. At this scale, certain operations emerge as performance bottlenecks (e.g. index alignment of multi-file netcdf datasets, #1385).

I think this should probably be done in AWS or Google Cloud. That way we can establish a consistent test environment for benchmarking. I might be able to pay for that (especially if our proposal gets funded)!

Reactions: none

jhamman (MEMBER) · 2017-06-12T21:28:12Z
https://github.com/pydata/xarray/issues/1257#issuecomment-307934432

Is anyone interested in working on this with me over the next few months? Given the number of issues we've been seeing, I'd like to see this come together this summer. I think ASV is the natural starting point.

Reactions: none

pwolfram (CONTRIBUTOR) · 2017-02-10T16:15:22Z
https://github.com/pydata/xarray/issues/1257#issuecomment-278987344

We would also benefit from this, specifically for #1198 👍

Reactions: none

rabernat (MEMBER) · 2017-02-10T03:04:31Z
https://github.com/pydata/xarray/issues/1257#issuecomment-278845582

Another 👍 for benchmarking. Especially as we start to get deep into integrating dask.distributed, having robust performance benchmarks will be very useful. One challenge is where to deploy the benchmarks. TravisCI might not be ideal, since performance can vary depending on competition from other virtual machines on the same system.

Reactions: none

shoyer (MEMBER) · 2017-02-10T01:58:03Z
https://github.com/pydata/xarray/issues/1257#issuecomment-278836146

One issue is that unit tests are often not good benchmarks. Ideal unit tests are as fast as possible, whereas ideal benchmarks should be run on more typical inputs, which may be much slower.

Reactions: 👍 1

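The distinction is easy to see side by side: a unit test wants the smallest input that exercises the code path, while a benchmark wants an input sized like a typical workload. A sketch, with sizes and names that are purely illustrative:

import numpy as np
import xarray as xr


def test_mean():
    # Unit test: a tiny input keeps the test fast.
    da = xr.DataArray([1.0, 2.0, 3.0])
    assert float(da.mean()) == 2.0


class MeanSuite:
    def setup(self):
        # Benchmark: a realistically sized input, much slower to process.
        self.da = xr.DataArray(np.random.rand(5_000_000))

    def time_mean(self):
        self.da.mean()
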
max-sixty (MEMBER) · 2017-02-10T01:40:41Z
https://github.com/pydata/xarray/issues/1257#issuecomment-278833457

Yes, ASV is good. I'm surprised there isn't something you can ask to just "robustly time these tests", so it could bolt on without writing new code. Although maybe the overlap between test code and benchmark code isn't as great as I imagine.

Reactions: none

shoyer (MEMBER) · 2017-02-09T22:02:00Z
https://github.com/pydata/xarray/issues/1257#issuecomment-278788467

Yes, some sort of automated benchmarking could be valuable, especially for noticing and fixing regressions. I've done occasional benchmarks before to optimize bottlenecks (e.g., class constructors), but it's all been ad-hoc stuff with %timeit in IPython.

ASV seems like a pretty sane way to do this. pytest-benchmark can trigger test failures if performance goes below some set level, but I suspect performance is too subjective and stochastic for that to be reliable.

Reactions: none

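For comparison with ASV, a pytest-benchmark test is an ordinary pytest test that takes the benchmark fixture; the fixture runs the callable repeatedly and records timing statistics, and comparison flags such as --benchmark-compare-fail are what would turn a regression into the test failure described above. A minimal sketch; the operation benchmarked is illustrative:

# test_perf.py -- illustrative pytest-benchmark sketch.
import numpy as np
import xarray as xr


def test_mean_speed(benchmark):
    da = xr.DataArray(np.random.rand(1000, 1000))
    # benchmark(fn) calls fn repeatedly, timing each call, and returns
    # the result of one call so correctness can still be asserted.
    result = benchmark(da.mean)
    assert np.isclose(float(result), da.values.mean())
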

Table schema

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
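
Given that schema, this page's query can be reproduced against a local copy of the database with Python's sqlite3 module; a sketch, where the database file name is hypothetical:

import sqlite3

# "github.db" is a hypothetical local copy of the database behind this page.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, user, created_at, body
    FROM issue_comments
    WHERE issue = 206632333
    ORDER BY updated_at DESC
    """
).fetchall()
for comment_id, user_id, created_at, body in rows:
    print(comment_id, user_id, created_at, body[:60])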