
issues: 735199603


id: 735199603
node_id: MDExOlB1bGxSZXF1ZXN0NTE0NjMzODkx
number: 4560
title: Optimize slice_slice for faster isel of huge datasets
user: 11994217
state: closed
locked: 0
assignee: (none)
milestone: (none)
comments: 5
created_at: 2020-11-03T10:26:38Z
updated_at: 2020-11-05T19:45:44Z
closed_at: 2020-11-05T19:07:24Z
author_association: CONTRIBUTOR
active_lock_reason: (none)
draft: 0
pull_request: pydata/xarray/pulls/4560

body:

I noticed that reading small slices of huge datasets (>1e8 rows) was very slow, even if they were properly chunked. I traced the issue back to xarray.core.indexing.slice_slice, which essentially calls np.arange(ds_size) to compute a slice. This is obviously O(ds_size), even if the actual slice to be read is tiny.
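
Conceptually, the slow path boils down to materializing the full index array just to compose two slices. A minimal sketch of that pattern (not the exact xarray code, only the shape of the cost):

```python
import numpy as np

def compose_slices_naive(old_slice, applied_slice, size):
    # Build an index array for the whole dimension and slice it twice.
    # This is O(size) in time and memory, even when the final selection
    # contains only a handful of elements.
    return np.arange(size)[old_slice][applied_slice]
```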

You can see the issue in this gist:

https://gist.github.com/dionhaefner/a3e97bae0a4e28f0d39294074419a683

I took the liberty of optimizing the function by computing the resulting slice arithmetically. With this in place, reading from disk is now the bottleneck, as it should be. I saw performance improvements of about a factor of 10, but this obviously varies with dimension size, slice size, and chunk size.
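
For illustration, the arithmetic composition can be sketched as below; the function name and details here are mine, not necessarily the merged code. Given the outer slice, the slice applied to its result, and the dimension size, an equivalent single slice is computed in O(1):

```python
def compose_slices(old_slice, applied_slice, size):
    """Return one slice equivalent to applying old_slice, then applied_slice."""
    # Normalize the outer slice to explicit start/stop/step for this size.
    old_start, old_stop, old_step = old_slice.indices(size)
    old_len = len(range(old_start, old_stop, old_step))  # O(1), no array built

    # Normalize the inner slice against the length of the first view.
    app_start, app_stop, app_step = applied_slice.indices(old_len)
    if len(range(app_start, app_stop, app_step)) == 0:
        return slice(0, 0)  # composed selection is empty

    # Map the inner slice back onto the original axis arithmetically.
    start = old_start + app_start * old_step
    stop = old_start + app_stop * old_step
    step = old_step * app_step
    if step < 0 and stop < 0:
        stop = None  # a negative stop would wrap around; None means "down to index 0"
    return slice(start, stop, step)
```

A quick consistency check such as np.testing.assert_array_equal(np.arange(size)[compose_slices(s1, s2, size)], np.arange(size)[s1][s2]) passes for ordinary slice combinations, and none of the work above scales with the dimension size.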


  • [x] Passes isort . && black . && mypy . && flake8
reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4560/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
performed_via_github_app: (none)
state_reason: (none)
repo: 13221727
type: pull
