home / github / issue_comments

Menu
  • Search all tables
  • GraphQL API

issue_comments: 469869298

This data as json

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/2799#issuecomment-469869298 https://api.github.com/repos/pydata/xarray/issues/2799 469869298 MDEyOklzc3VlQ29tbWVudDQ2OTg2OTI5OA== 1217238 2019-03-05T21:43:18Z 2019-03-05T21:43:32Z MEMBER

Cython + memoryviews isn't quite the right comparison here. I'm sure ordering here is correct, but relative magnitude of the performance difference should be smaller.

Xarray's core is bottlenecked on: 1. Overhead of abstraction with normal Python operations (e.g., function calls) in non-numeric code (all the heavy numerics is offloaded to NumPy or pandas). 2. The dynamic nature of our APIs, which means we need to do lots of type checking. Notice how high up builtins.isinstance appears in that performance profile!

C++ offers very low-cost abstraction but dynamism is still slow. Even then, compilers are much better at speeding up tight numeric loops than complex domain logic.

As a point of reference, it would be interesting to see these performance numbers running pypy, which I think should be able to handle everything in xarray. You'll note that pypy is something like 7x faster than CPython in their benchmark suite, which I suspect is closer to what we'd see if we wrote xarray's core in a language like C++, e.g., as Python interface to xframe.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  416962458
Powered by Datasette · Queries took 0.745ms · About: xarray-datasette