issue_comments: 469439957


html_url: https://github.com/pydata/xarray/issues/2799#issuecomment-469439957
issue_url: https://api.github.com/repos/pydata/xarray/issues/2799
id: 469439957
node_id: MDEyOklzc3VlQ29tbWVudDQ2OTQzOTk1Nw==
user: 1217238
created_at: 2019-03-04T22:03:37Z
updated_at: 2019-03-04T22:16:49Z
author_association: MEMBER
issue: 416962458

> While Python will always be slower than C when iterating over an array in this fashion, I would hope that xarray could be nearly as fast as NumPy. I am not sure what the best way to improve this is, though.

I'm sure it's possible to optimize this significantly, but short of rewriting this logic in a lower-level language, it's pretty much impossible to match the speed of NumPy.

This benchmark might give some useful context:

```python
def dummy_isel(*args, **kwargs):
    pass

def index_dummy(named_indices, arr):
    for named_index in named_indices:
        dummy_isel(arr, **named_index)
```

```
%%timeit -n 10
index_dummy(named_indices, arr)
```

On my machine, this is already twice as slow as your NumPy benchmark (497 µs vs 251 µs), and all it's doing is parsing `*args` and `**kwargs`! Every Python function/method call involving keyword arguments adds about 0.5 µs of overhead, because building the (highly optimized) keyword dict is still slow compared to passing positional arguments. In my experience it is almost impossible to get the overhead of a Python function call below a few microseconds.
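To make the keyword-argument overhead concrete, here is a minimal sketch (not from the original comment) that times the same trivial function called with positional versus keyword arguments; absolute numbers will vary by machine and Python version:

```python
import timeit

def f(a, b, c):
    # Trivial body so the measurement is dominated by call overhead.
    pass

n = 1_000_000
t_pos = timeit.timeit("f(1, 2, 3)", globals=globals(), number=n)
t_kw = timeit.timeit("f(a=1, b=2, c=3)", globals=globals(), number=n)

# Report per-call cost in nanoseconds; the keyword version is measurably slower.
print(f"positional: {t_pos / n * 1e9:.0f} ns/call")
print(f"keyword:    {t_kw / n * 1e9:.0f} ns/call")
```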

Right now we're at about 130 µs per indexing operation. In the best case, we might make this 10x faster, but even that would be quite challenging; consider, for example, that merely creating a DataArray takes about 20 µs.
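For reference, a rough way to reproduce these per-operation figures yourself (a sketch, assuming xarray and NumPy are installed; the array shape here is arbitrary, and the numbers depend on hardware and library versions):

```python
import timeit

import numpy as np
import xarray as xr

data = np.random.rand(100, 100)
arr = xr.DataArray(data, dims=("x", "y"))

n = 1_000
# Time bare DataArray construction and a single isel() indexing call.
t_create = timeit.timeit(lambda: xr.DataArray(data, dims=("x", "y")), number=n)
t_isel = timeit.timeit(lambda: arr.isel(x=0, y=0), number=n)

print(f"DataArray creation: {t_create / n * 1e6:.1f} µs/op")
print(f"arr.isel(x=0, y=0): {t_isel / n * 1e6:.1f} µs/op")
```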

reactions:

```json
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
```