issues: 735199603
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
735199603 | MDExOlB1bGxSZXF1ZXN0NTE0NjMzODkx | 4560 | Optimize slice_slice for faster isel of huge datasets | 11994217 | closed | 0 |  |  | 5 | 2020-11-03T10:26:38Z | 2020-11-05T19:45:44Z | 2020-11-05T19:07:24Z | CONTRIBUTOR |  | 0 | pydata/xarray/pulls/4560 | I noticed that reading small slices of huge datasets (>1e8 rows) was very slow, even if they were properly chunked. I traced the issue back to `slice_slice`. You can see the issue in this gist: https://gist.github.com/dionhaefner/a3e97bae0a4e28f0d39294074419a683 I took the liberty of optimizing the function by computing the resulting slice arithmetically (see the sketch below the table). With this in place, reading from disk is now the bottleneck, as it should be. I saw performance increases of roughly a factor of 10, but this obviously varies with dimension size, slice size, and chunk size. | { "url": "https://api.github.com/repos/pydata/xarray/issues/4560/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |  |  | 13221727 | pull |
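
The PR body above describes computing the composed slice arithmetically instead of materializing an index array. Below is a minimal sketch of that idea, using Python's O(1) `range` slicing; `compose_slices` is a hypothetical helper written for illustration, not the actual `slice_slice` implementation from the PR.

```python
import numpy as np

def compose_slices(outer: slice, inner: slice, size: int) -> slice:
    """Return one slice equivalent to applying `inner` after `outer` on an
    axis of length `size`, without building an index array.

    Hypothetical helper illustrating the idea; not xarray's `slice_slice`.
    """
    # Slicing a range is O(1), so no array of 1e8 indices is ever allocated.
    composed = range(size)[outer][inner]
    if len(composed) == 0:
        return slice(0, 0, 1)
    stop = composed.stop
    # For a backwards range that reaches index 0, range.stop is negative;
    # as a slice bound that would wrap around, so express it as None.
    if stop < 0:
        stop = None
    return slice(composed.start, stop, composed.step)

# Quick check against the naive array-based composition:
size = 10_000
outer, inner = slice(10, 5_000, 7), slice(20, 400, 3)
fast = np.arange(size)[compose_slices(outer, inner, size)]
slow = np.arange(size)[outer][inner]
assert np.array_equal(fast, slow)
```

Composing slices this way costs a handful of integer operations regardless of axis length, which is consistent with the speedup reported in the PR body, where the naive approach scaled with the size of the dimension rather than the size of the selection.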