
pull_requests: 514633891

id: 514633891
node_id: MDExOlB1bGxSZXF1ZXN0NTE0NjMzODkx
number: 4560
state: closed
locked: 0
title: Optimize slice_slice for faster isel of huge datasets
user: 11994217
created_at: 2020-11-03T10:26:38Z
updated_at: 2020-11-05T19:45:44Z
closed_at: 2020-11-05T19:07:24Z
merged_at: 2020-11-05T19:07:23Z
merge_commit_sha: 235b2e5bcec253ca6a85762323121d28c3b06038
assignee: (none)
milestone: (none)
draft: 0
head: 86c56ca4b9e8d01136a7eed90160723e7535f0d2
base: 83884a1c6dac4b5f6309dfea530414facc100bc8
author_association: CONTRIBUTOR
auto_merge: (none)
repo: 13221727
url: https://github.com/pydata/xarray/pull/4560
merged_by: (none)

body:

I noticed that reading small slices of huge datasets (>1e8 rows) was very slow, even when they were properly chunked. I traced the issue back to `xarray.core.indexing.slice_slice`, which essentially calls `np.arange(ds_size)` to compute a slice. This is obviously `O(ds_size)`, even if the actual slice to be read is tiny. You can see the issue in this gist: https://gist.github.com/dionhaefner/a3e97bae0a4e28f0d39294074419a683

I took the liberty of optimizing the function by computing the resulting slice arithmetically (see the sketch below). With this in place, reading from disk is now the bottleneck, as it should be. I saw performance increases of roughly a factor of 10, but this obviously varies with dimension size, slice size, and chunk size.

---

- [x] Passes `isort . && black . && mypy . && flake8`
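The arithmetic composition is only described in prose above. The following is a minimal sketch of the idea under stated assumptions, not the code merged in the PR: it normalizes both slices with Python's built-in `slice.indices()` and maps the second slice's endpoints through the first, and the helper name `compose_slices` is hypothetical.

```python
import numpy as np


def compose_slices(old: slice, applied: slice, size: int) -> slice:
    """Return a slice equivalent to applying `old` to a dimension of
    length `size` and then `applied` to the result, in O(1) time."""
    # slice.indices() resolves None and negative bounds to concrete values.
    o_start, o_stop, o_step = old.indices(size)

    # Length of the intermediate view produced by the first slice.
    n = len(range(o_start, o_stop, o_step))
    if n == 0:
        return slice(0)  # first slice already selects nothing

    a_start, a_stop, a_step = applied.indices(n)

    # Map the second slice's endpoints through the first slice.
    start = o_start + a_start * o_step
    stop = o_start + a_stop * o_step
    step = o_step * a_step

    if start < 0:
        return slice(0)  # only possible when o_step < 0 and nothing is left
    if stop < 0:
        stop = None  # a negative stop would wrap around; None means "to the end"
    return slice(start, stop, step)


# Spot-check against the arange-based approach on a small dimension:
size = 1000
a, b = slice(100, 900, 3), slice(5, 40, 2)
assert np.array_equal(
    np.arange(size)[a][b],
    np.arange(size)[compose_slices(a, b, size)],
)
```

Composed this way, the cost is constant regardless of dimension size, whereas the `np.arange(ds_size)` approach allocates and indexes an array with one element per row of the dataset.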
