
issues: 662982199


id: 662982199
node_id: MDU6SXNzdWU2NjI5ODIxOTk=
number: 4241
title: Parallel tasks on subsets of a dask array wrapped in an xarray Dataset
user: 41797673
state: closed
locked: 0
comments: 5
created_at: 2020-07-21T12:47:41Z
updated_at: 2020-07-27T08:18:13Z
closed_at: 2020-07-27T08:18:13Z
author_association: NONE

I have a large xarray.Dataset stored as a zarr. I want to perform some custom operations on it that cannot be expressed with the numpy-like functions that a Dask cluster parallelizes automatically. Therefore, I partition the dataset into small subsets and, for each subset, submit a task of the following form to my Dask cluster:

```python
import xarray


def my_task(zarr_path, subset_index):
    ds = xarray.open_zarr(zarr_path)  # this returns an xarray.Dataset containing a dask.array
    sel = ds.sel(subset_index)  # subset_index: dict of dimension name -> labels or slice
    sel = sel.load()  # I want to get the data into memory
    # then do my custom operations ...
```

However, I have noticed this creates a "task within a task": when a worker receives "my_task", it in turn submits tasks to the cluster to load the relevant part of the dataset. To avoid this and ensure that the full task is executed within the worker, I am instead submitting the task:

```python
import dask


def my_task_2(zarr_path, subset_index):
    # Use dask's local threaded scheduler rather than the distributed cluster
    with dask.config.set(scheduler="threading"):
        my_task(zarr_path, subset_index)
```

Is this the best way to do this? What's the best practice for this kind of situation?
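How the subsets are chosen also matters: if each subset covers whole zarr chunks, separate tasks do not need to read the same chunk. Below is a minimal, stdlib-only sketch of one way to build the `subset_index` arguments. It assumes positional (`isel`-style) blocks and a known chunk length per dimension that matches the on-disk zarr chunks. `chunk_aligned_indexers` is a hypothetical helper, not part of xarray or dask.

```python
from itertools import product


def chunk_aligned_indexers(sizes, chunks):
    """Yield one isel-style indexer dict per block of the array grid.

    Hypothetical helper (assumption): not an xarray or dask API.
    sizes:  mapping of dimension name -> length, e.g. dict(ds.sizes)
    chunks: mapping of dimension name -> chunk length (assumed to match the
            zarr chunks); dimensions missing from it are kept whole.
    """
    per_dim = []
    for dim, size in sizes.items():
        step = chunks.get(dim, size)
        # Slices [0, step), [step, 2*step), ...; the last one may be shorter.
        per_dim.append(
            [(dim, slice(start, min(start + step, size))) for start in range(0, size, step)]
        )
    # The Cartesian product over dimensions enumerates every block of the grid.
    for combo in product(*per_dim):
        yield dict(combo)
```

For example, `list(chunk_aligned_indexers({"time": 10, "x": 4}, {"time": 4}))` yields three indexers whose `time` slices are 0:4, 4:8 and 8:10, each spanning all of `x`. Inside the task, such a dict would be applied with `ds.isel(indexer)` rather than the label-based `ds.sel(...)`.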

I have already posted this on stackoverflow but did not get any answer, so I am adding this here hoping it increases visibility. Apologies if this is considered "pollution". https://stackoverflow.com/questions/62874267/parallel-tasks-on-subsets-of-a-dask-array-wrapped-in-an-xarray-dataset

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4241/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
