id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 1308715638,I_kwDOAMm_X85OAWp2,6807,Alternative parallel execution frameworks in xarray,35968931,closed,0,,,12,2022-07-18T21:48:10Z,2023-05-18T17:34:33Z,2023-05-18T17:34:33Z,MEMBER,,,,"### Is your feature request related to a problem? Since early on the project xarray has supported wrapping `dask.array` objects in a first-class manner. However recent work on flexible array wrapping has made it possible to wrap all sorts of array types (and with #6804 we should support wrapping any array that conforms to the [array API standard](https://data-apis.org/array-api/latest/index.html)). Currently though the only way to parallelize array operations with xarray ""automatically"" is to use dask. (You could use [xarray-beam](https://github.com/google/xarray-beam) or other options too but they don't ""automatically"" generate the computation for you like dask does.) When dask is the only type of parallel framework exposing an array-like API then there is no need for flexibility, but now we have nascent projects like [cubed](https://github.com/tomwhite/cubed) to consider too. @tomwhite ### Describe the solution you'd like Refactor the internals so that dask is one option among many, and that any newer options can plug in in an extensible way. In particular cubed deliberately uses the same API as `dask.array`, exposing: 1) the methods needed to conform to the array API standard 2) a `.chunk` and `.compute` method, which we could dispatch to 3) dask-like functions to create computation graphs including [`blockwise`](https://github.com/tomwhite/cubed/blob/400dc9adcf21c8b468fce9f24e8d4b8cb9ef2f11/cubed/core/ops.py#L43), [`map_blocks`](https://github.com/tomwhite/cubed/blob/400dc9adcf21c8b468fce9f24e8d4b8cb9ef2f11/cubed/core/ops.py#L221), and [`rechunk`](https://github.com/tomwhite/cubed/blob/main/cubed/primitive/rechunk.py) I would like to see xarray able to wrap any array-like object which offers this set of methods / functions, and call the corresponding version of that method for the correct library (i.e. dask vs cubed) automatically. That way users could try different parallel execution frameworks simply via a switch like ```python ds.chunk(**chunk_pattern, manager=""dask"") ``` and see which one works best for their particular problem. ### Describe alternatives you've considered If we leave it the way it is now then xarray will not be truly flexible in this respect. Any library can wrap (or subclass if they are really brave) xarray objects to provide parallelism but that's not the same level of flexibility. ### Additional context [cubed repo](https://github.com/tomwhite/cubed) [PR](https://github.com/pydata/xarray/pull/6804) about making xarray able to wrap objects conforming to the new [array API standard](https://data-apis.org/array-api/latest/index.html) cc @shoyer @rabernat @dcherian @keewis ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6807/reactions"", ""total_count"": 6, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 3, ""rocket"": 2, ""eyes"": 1}",,completed,13221727,issue