issues: 188517316

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
188517316 MDU6SXNzdWUxODg1MTczMTY= 1103 add dask optimization tips to docs 1197350 closed 0     0 2016-11-10T14:08:39Z 2016-11-10T16:49:06Z 2016-11-10T16:49:06Z MEMBER      

We should add to the docs the optimization tips that @shoyer gave @karenamckinnon in this mailing list thread:

https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/xarray/11lDGSeza78/lR1uj9yWDAAJ

Specific things to try (we should add similar guidelines to xarray's docs):

  1. Do your spatial and temporal indexing with .sel() earlier in the pipeline, specifically before you resample. Resample triggers some computation on all the blocks, which in theory should commute with indexing, but we haven't implemented this optimization in dask yet: https://github.com/dask/dask/issues/746
  2. Save the temporal mean to disk as a netCDF file (and then load it again with open_dataset) before subtracting it. Again, in theory, dask should be able to do the computation in a streaming fashion, but in practice this is a fail case for the dask scheduler, because it tries to keep every chunk of an array that it computes in memory: https://github.com/dask/dask/issues/874
  3. Specify smaller chunks across space when using open_mfdataset, e.g., chunks={'latitude': 10, 'longitude': 10}. This makes spatial subsetting cheaper, because a small selection is less likely to pull in large chunks that mostly cover other regions (probably not necessary if you follow suggestion 1).
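
The three tips above can be sketched roughly as follows. This is a minimal illustration, not the workflow from the mailing list thread: the synthetic dataset, the variable name `temperature`, the file name `temporal_mean.nc`, and the helper names `make_demo_dataset` and `monthly_anomalies` are all hypothetical. It also assumes a recent xarray (the `resample(time=...)` call style postdates this issue), dask, and a netCDF backend such as netCDF4 or scipy. `Dataset.chunk` stands in here for passing `chunks=` to `open_mfdataset`.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import xarray as xr


def make_demo_dataset():
    """Small synthetic stand-in for real data (hypothetical names and sizes)."""
    time = pd.date_range("2000-01-01", periods=120, freq="D")
    lat = np.arange(-10.0, 10.0, 1.0)
    lon = np.arange(0.0, 20.0, 1.0)
    rng = np.random.default_rng(0)
    data = rng.standard_normal((time.size, lat.size, lon.size))
    return xr.Dataset(
        {"temperature": (("time", "latitude", "longitude"), data)},
        coords={"time": time, "latitude": lat, "longitude": lon},
    )


def monthly_anomalies(ds, workdir):
    """Apply the three tips in order and return monthly anomalies."""
    # Tip 3: small spatial chunks. With files on disk this would be
    # something like xr.open_mfdataset(<paths>, chunks={...}) instead.
    ds = ds.chunk({"latitude": 10, "longitude": 10})

    # Tip 1: index in space and time with .sel() *before* resampling,
    # so resample only touches the blocks that are actually needed.
    subset = ds.sel(
        latitude=slice(-5, 5),
        longitude=slice(0, 10),
        time=slice("2000-01-01", "2000-03-31"),
    )
    monthly = subset.resample(time="MS").mean()

    # Tip 2: write the temporal mean to a netCDF file and read it back
    # with open_dataset before subtracting it.
    mean_path = os.path.join(workdir, "temporal_mean.nc")
    monthly.mean("time").to_netcdf(mean_path)
    with xr.open_dataset(mean_path) as mean_ds:
        mean = mean_ds.load()

    return (monthly - mean).compute()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = monthly_anomalies(make_demo_dataset(), tmp)
    print(dict(result["temperature"].sizes))
```

On a dataset this small the ordering makes no measurable difference; the point is only to show where each step sits in the pipeline.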
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1103/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed 13221727 issue
