
issue_comments


16 rows where issue = 236347050 sorted by updated_at descending




user 7

  • jhamman 4
  • shoyer 3
  • TomAugspurger 3
  • rabernat 2
  • Zac-HD 2
  • wesm 1
  • max-sixty 1

author_association 2

  • MEMBER 14
  • CONTRIBUTOR 2

issue 1

  • Feature/benchmark · 16
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
318468605 https://github.com/pydata/xarray/pull/1457#issuecomment-318468605 https://api.github.com/repos/pydata/xarray/issues/1457 MDEyOklzc3VlQ29tbWVudDMxODQ2ODYwNQ== jhamman 2443309 2017-07-27T19:54:01Z 2017-07-27T19:54:01Z MEMBER

Yes! Thanks @wesm and @TomAugspurger.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/benchmark 236347050
318451800 https://github.com/pydata/xarray/pull/1457#issuecomment-318451800 https://api.github.com/repos/pydata/xarray/issues/1457 MDEyOklzc3VlQ29tbWVudDMxODQ1MTgwMA== TomAugspurger 1312546 2017-07-27T18:45:36Z 2017-07-27T18:45:36Z MEMBER

Yep, thanks again for setting that up.

On Thu, Jul 27, 2017 at 11:39 AM, Wes McKinney notifications@github.com wrote:

cool, are these numbers coming off the pandabox?


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/benchmark 236347050
318417790 https://github.com/pydata/xarray/pull/1457#issuecomment-318417790 https://api.github.com/repos/pydata/xarray/issues/1457 MDEyOklzc3VlQ29tbWVudDMxODQxNzc5MA== wesm 329591 2017-07-27T16:39:34Z 2017-07-27T16:39:34Z MEMBER

cool, are these numbers coming off the pandabox?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/benchmark 236347050
318415555 https://github.com/pydata/xarray/pull/1457#issuecomment-318415555 https://api.github.com/repos/pydata/xarray/issues/1457 MDEyOklzc3VlQ29tbWVudDMxODQxNTU1NQ== shoyer 1217238 2017-07-27T16:31:14Z 2017-07-27T16:31:14Z MEMBER

Awesome, thanks @TomAugspurger !

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/benchmark 236347050
318376827 https://github.com/pydata/xarray/pull/1457#issuecomment-318376827 https://api.github.com/repos/pydata/xarray/issues/1457 MDEyOklzc3VlQ29tbWVudDMxODM3NjgyNw== TomAugspurger 1312546 2017-07-27T14:21:30Z 2017-07-27T14:21:30Z MEMBER

These are now being run and published to https://tomaugspurger.github.io/asv-collection/xarray/

I plan to find a more permanent home for publishing the results than my personal GitHub Pages site, but it may take a while before I can get to it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/benchmark 236347050
317758630 https://github.com/pydata/xarray/pull/1457#issuecomment-317758630 https://api.github.com/repos/pydata/xarray/issues/1457 MDEyOklzc3VlQ29tbWVudDMxNzc1ODYzMA== rabernat 1197350 2017-07-25T14:38:36Z 2017-07-25T14:38:36Z MEMBER

I will merge by the end of the day if no one has any more comments.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/benchmark 236347050
317091662 https://github.com/pydata/xarray/pull/1457#issuecomment-317091662 https://api.github.com/repos/pydata/xarray/issues/1457 MDEyOklzc3VlQ29tbWVudDMxNzA5MTY2Mg== jhamman 2443309 2017-07-21T19:27:49Z 2017-07-21T19:27:49Z MEMBER

Thanks @TomAugspurger - see https://github.com/TomAugspurger/asv-runner/issues/1.

All, I added a series of multi-file benchmarks. I think for a first PR, this is ready to fly and we can add more benchmarks as needed.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/benchmark 236347050
315402471 https://github.com/pydata/xarray/pull/1457#issuecomment-315402471 https://api.github.com/repos/pydata/xarray/issues/1457 MDEyOklzc3VlQ29tbWVudDMxNTQwMjQ3MQ== TomAugspurger 1312546 2017-07-14T16:21:29Z 2017-07-14T16:21:29Z MEMBER

Regarding hardware, we should be able to run these on the machine running the pandas benchmarks. Once this is merged, I should be able to add it easily to https://github.com/TomAugspurger/asv-runner/blob/master/tests/full.yml, and the benchmarks will be run and published (to https://tomaugspurger.github.io/asv-collection/ for now; not their permanent home).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/benchmark 236347050
315401470 https://github.com/pydata/xarray/pull/1457#issuecomment-315401470 https://api.github.com/repos/pydata/xarray/issues/1457 MDEyOklzc3VlQ29tbWVudDMxNTQwMTQ3MA== rabernat 1197350 2017-07-14T16:17:07Z 2017-07-14T16:17:07Z MEMBER

I think this is a great start!

I would really like to see a performance test for open_mfdataset, since this is my own personal bottleneck.

Regarding the dependence on hardware, I/O speeds, etc, we should be able to resolve this by running on specific instance types on a cloud platform. We could configure environments with local SSD storage, network storage, etc, in order to cover different scenarios.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/benchmark 236347050
315273074 https://github.com/pydata/xarray/pull/1457#issuecomment-315273074 https://api.github.com/repos/pydata/xarray/issues/1457 MDEyOklzc3VlQ29tbWVudDMxNTI3MzA3NA== shoyer 1217238 2017-07-14T05:24:04Z 2017-07-14T05:24:04Z MEMBER

We should do this to the extent that it is helpful in driving development. Even just a few realistic use cases can be helpful, especially for guarding against performance regressions.

On Thu, Jul 13, 2017 at 3:37 PM, Joe Hamman notifications@github.com wrote:

@rabernat https://github.com/rabernat - do you have any thoughts on this?

@pydata/xarray https://github.com/orgs/pydata/teams/xarray - I'm trying to decide if this is worth spending any more time on. What sort of coverage would we want before we merge this first PR?


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/benchmark 236347050
315220704 https://github.com/pydata/xarray/pull/1457#issuecomment-315220704 https://api.github.com/repos/pydata/xarray/issues/1457 MDEyOklzc3VlQ29tbWVudDMxNTIyMDcwNA== jhamman 2443309 2017-07-13T22:37:02Z 2017-07-13T22:37:02Z MEMBER

@rabernat - do you have any thoughts on this?

@pydata/xarray - I'm trying to decide if this is worth spending any more time on. What sort of coverage would we want before we merge this first PR?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/benchmark 236347050
308935684 https://github.com/pydata/xarray/pull/1457#issuecomment-308935684 https://api.github.com/repos/pydata/xarray/issues/1457 MDEyOklzc3VlQ29tbWVudDMwODkzNTY4NA== jhamman 2443309 2017-06-16T05:20:24Z 2017-06-16T05:20:24Z MEMBER

Keep the comments coming! I think we can distinguish between benchmarking for regressions and benchmarking for development and introspection.

The former will require some thought as to what machines we want to rely on and how to achieve consistency throughout the development track. It sounds like there are a number of options that we could pursue toward those ends.

The latter use of benchmarking is useful on a single machine with only a few commits of history. For the four benchmarks in my sample dataset_io.py, we get the following interesting results (for one environment):

[ 0.00%] Benchmarking conda-py2.7-bottleneck-dask-netcdf4-numpy-pandas-scipy
[ 3.12%] Running dataset_io.IOSingleNetCDF.time_load_dataset_netcdf4    134.34ms
[ 6.25%] Running dataset_io.IOSingleNetCDF.time_load_dataset_scipy       82.60ms
[ 9.38%] Running dataset_io.IOSingleNetCDF.time_write_dataset_netcdf4    57.71ms
[12.50%] Running dataset_io.IOSingleNetCDF.time_write_dataset_scipy     267.29ms

So the relative performance is useful information in deciding how to use and/or develop xarray. (Granted the exact factors will change depending on machine/architecture/dataset).
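A minimal, hypothetical sketch of what one of the dataset_io.py benchmark classes named in the output above might look like. asv discovers plain classes whose methods start with `time_`, so no asv import is needed; the class and method names mirror the output, but the bodies here use in-memory stand-in data rather than real netCDF I/O.

```python
class IOSingleNetCDF:
    """Hypothetical asv benchmark class; names taken from the output above."""

    def setup(self):
        # asv calls setup() before timing each benchmark repetition.
        # A real benchmark would write a temporary netCDF file here.
        self.records = [{"temp": i * 0.1} for i in range(10_000)]

    def time_load_dataset_netcdf4(self):
        # The timed body; a real benchmark would instead call something like
        # xr.open_dataset(path, engine="netcdf4").load().
        total = 0.0
        for rec in self.records:
            total += rec["temp"]
```

Running `asv run` against a module containing such a class would produce lines like the `Running dataset_io.IOSingleNetCDF.time_load_dataset_netcdf4` output quoted above.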

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/benchmark 236347050
308932098 https://github.com/pydata/xarray/pull/1457#issuecomment-308932098 https://api.github.com/repos/pydata/xarray/issues/1457 MDEyOklzc3VlQ29tbWVudDMwODkzMjA5OA== max-sixty 5635139 2017-06-16T04:45:49Z 2017-06-16T04:51:14Z MEMBER

This is a great start! Thanks @jhamman !

Our most common performance problems come from handling pandas 'oddities', like non-standard indexes: generally, an operation that is normally vectorized becomes un-vectorized and starts looping in Python. But that's probably not a big use case for most.

In what instances have others seen performance issues? Are there ever issues with the standard transform operations, such as merge?

(Addendum, I just saw the comments above): I think there's some real benefit in benchmarks that ensure we don't add code that slows operations down by an order of magnitude, i.e. outside the bounds of reasonable error. That's broader than optimizing around them, particularly since xarray is all Python and shouldn't be doing performance-intensive work internally.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/benchmark 236347050
308926818 https://github.com/pydata/xarray/pull/1457#issuecomment-308926818 https://api.github.com/repos/pydata/xarray/issues/1457 MDEyOklzc3VlQ29tbWVudDMwODkyNjgxOA== Zac-HD 12229877 2017-06-16T03:57:57Z 2017-06-16T03:57:57Z CONTRIBUTOR

The tests for Hypothesis take almost twice as long to run on Travis at certain times of day, so I certainly wouldn't use it for benchmarking anything!

I'm also concerned that a dedicated benchmarking machine may lead to software being (accidentally!) optimized for a particular architecture or balance of machine resources without due consideration. Maybe @wesm could investigate fault injection to (e.g.) slow down disk access or add latency for some sets of benchmarks?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/benchmark 236347050
308925978 https://github.com/pydata/xarray/pull/1457#issuecomment-308925978 https://api.github.com/repos/pydata/xarray/issues/1457 MDEyOklzc3VlQ29tbWVudDMwODkyNTk3OA== shoyer 1217238 2017-06-16T03:50:33Z 2017-06-16T03:50:33Z MEMBER

@wesm just set up a machine for dedicated benchmarking of pandas and possibly other pydata/scipy projects (if there's extra capacity, as expected). @TomAugspurger has been working on getting it set up. So that's potentially an option, at least for single-machine benchmarks.

The lore I've heard is that benchmarking on shared cloud resources (e.g., Travis-CI) can have reproducibility issues due to resource contention and/or jobs getting scheduled on slightly different machine types. I don't know how true this still is, or whether there are good workarounds for particular cloud platforms. I suspect this should be solvable, though. I can certainly make an internal inquiry about benchmarking on GCP if we can't find answers on our own.
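One common, purely illustrative mitigation for the contention noise described above (not part of the thread's tooling) is to repeat a measurement and keep the minimum, which is less sensitive to scheduling and contention noise than a single run or a mean:

```python
import timeit

# Repeat the same measurement several times and keep the minimum: the
# minimum is less sensitive to noise from resource contention (the
# reproducibility concern with shared CI machines) than a mean would be.
timings = timeit.repeat(stmt="sum(range(10_000))", repeat=5, number=100)
best = min(timings)  # best-case wall time for 100 executions, in seconds
```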

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/benchmark 236347050
308923548 https://github.com/pydata/xarray/pull/1457#issuecomment-308923548 https://api.github.com/repos/pydata/xarray/issues/1457 MDEyOklzc3VlQ29tbWVudDMwODkyMzU0OA== Zac-HD 12229877 2017-06-16T03:29:12Z 2017-06-16T03:29:12Z CONTRIBUTOR

I like the idea of benchmarks, but have some serious concerns. For Dask and IO-bound work in general, benchmark results will vary widely depending on the hardware and (if relevant) network properties. Results will be noncomparable between SSD and HDD, local and remote network access, and in general depend heavily on the specific IO patterns and storage/compute relationship of the computer.

This isn't a reason not to benchmark, though, just a call for very cautious interpretation; it's clearly useful to catch some of the subtle-but-pathological performance problems that have cropped up. In short, I think benchmarks should have a very clear warnings section in the documentation, and no decision should be taken to change code without benchmarking on a variety of computers (SSD/HDD, PC/cluster, local/remote data...).

Also, JSON cannot include comments, and there are a number of entries you need to update, but that's a passing concern.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/benchmark 236347050


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
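The page above lists rows where issue = 236347050 sorted by updated_at descending. A sketch of the equivalent query against this schema using Python's sqlite3, with the table abridged to the columns used and the inserted rows reduced to two sample comments from the thread:

```python
import sqlite3

# Abridged issue_comments table from the schema above, with the query
# this page shows: rows for one issue, newest update first.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE issue_comments ("
    " id INTEGER PRIMARY KEY, [user] INTEGER, updated_at TEXT,"
    " body TEXT, issue INTEGER)"
)
conn.executemany(
    "INSERT INTO issue_comments VALUES (?, ?, ?, ?, ?)",
    [
        (318468605, 2443309, "2017-07-27T19:54:01Z", "Yes! Thanks ...", 236347050),
        (308923548, 12229877, "2017-06-16T03:29:12Z", "I like the idea ...", 236347050),
    ],
)
rows = conn.execute(
    "SELECT id, updated_at FROM issue_comments"
    " WHERE issue = ? ORDER BY updated_at DESC",
    (236347050,),
).fetchall()
# ISO-8601 timestamps sort lexicographically, so DESC puts the most
# recently updated comment (318468605) first, matching the page order.
```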
Powered by Datasette · Queries took 12.705ms · About: xarray-datasette