pull_requests: 514633891
field | value
---|---
id | 514633891
node_id | MDExOlB1bGxSZXF1ZXN0NTE0NjMzODkx
number | 4560
state | closed
locked | 0
title | Optimize slice_slice for faster isel of huge datasets
user | 11994217
body | I noticed that reading small slices of huge datasets (>1e8 rows) was very slow, even if they were properly chunked. I traced the issue back to `xarray.core.indexing.slice_slice`, which essentially calls `np.arange(ds_size)` to compute a slice. This is obviously `O(ds_size)`, even if the actual slice to be read is tiny. You can see the issue in this gist: https://gist.github.com/dionhaefner/a3e97bae0a4e28f0d39294074419a683 I took the liberty of optimizing the function by computing the resulting slice arithmetically. With this in place, reading from disk is now the bottleneck, as it should be. I saw performance increases of about a factor of 10, but this obviously varies with dimension size, slice size, and chunk size. --- - [x] Passes `isort . && black . && mypy . && flake8`
created_at | 2020-11-03T10:26:38Z
updated_at | 2020-11-05T19:45:44Z
closed_at | 2020-11-05T19:07:24Z
merged_at | 2020-11-05T19:07:23Z
merge_commit_sha | 235b2e5bcec253ca6a85762323121d28c3b06038
assignee |
milestone |
draft | 0
head | 86c56ca4b9e8d01136a7eed90160723e7535f0d2
base | 83884a1c6dac4b5f6309dfea530414facc100bc8
author_association | CONTRIBUTOR
auto_merge |
repo | 13221727
url | https://github.com/pydata/xarray/pull/4560
merged_by |
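The PR body above describes replacing an `np.arange(ds_size)`-based slice composition with a purely arithmetic one. The following is a minimal, self-contained sketch of that technique, not necessarily the code that was merged: the `slice_slice` name comes from the PR, while the `_normalize_slice` helper and the exact edge-case handling here are assumptions.

```python
import numpy as np


def _normalize_slice(sl: slice, size: int) -> slice:
    """Resolve None and negative start/stop/step against a known dimension size."""
    return slice(*sl.indices(size))


def slice_slice(old_slice: slice, applied_slice: slice, size: int) -> slice:
    """Return one slice equivalent to applying old_slice, then applied_slice,
    to a dimension of length `size`.

    Computed with a handful of integer operations (O(1)) instead of
    materializing np.arange(size) and indexing it twice (O(size)).
    """
    old_slice = _normalize_slice(old_slice, size)

    size_after_old = len(range(old_slice.start, old_slice.stop, old_slice.step))
    if size_after_old == 0:
        return slice(0)  # the first slice already selects nothing

    applied_slice = _normalize_slice(applied_slice, size_after_old)

    start = old_slice.start + applied_slice.start * old_slice.step
    if start < 0:
        # can only happen for negative old_slice.step; nothing is selected
        return slice(0)

    stop = old_slice.start + applied_slice.stop * old_slice.step
    if stop < 0:
        stop = None  # a negative stop would wrap around; None means "to the end"

    return slice(start, stop, old_slice.step * applied_slice.step)


# Sanity check against the O(size) reference behavior (indexing an arange twice):
size = 1_000
for old in (slice(10, 900, 3), slice(None, None, -2)):
    for new in (slice(5, 50), slice(None, None, -1)):
        composed = slice_slice(old, new, size)
        expected = np.arange(size)[old][new]
        assert (np.arange(size)[composed] == expected).all()
```

The design point is that the composition of two slices is itself a slice, so the combined start, stop, and step can be derived directly from the inputs, independent of the dimension size; this is why the cost of a tiny `isel` no longer scales with the number of rows in the dataset.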
Links from other tables
- 0 rows from pull_requests_id in labels_pull_requests