issues: 142498006
| field | value |
|---|---|
| id | 142498006 |
| node_id | MDU6SXNzdWUxNDI0OTgwMDY= |
| number | 798 |
| title | Integration with dask/distributed (xarray backend design) |
| user | 4295853 |
| state | closed |
| locked | 0 |
| assignee | |
| milestone | |
| comments | 59 |
| created_at | 2016-03-21T23:18:02Z |
| updated_at | 2019-01-13T04:12:32Z |
| closed_at | 2019-01-13T04:12:32Z |
| author_association | CONTRIBUTOR |
| reactions | { "url": "https://api.github.com/repos/pydata/xarray/issues/798/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
| state_reason | completed |
| repo | 13221727 |
| type | issue |

body:

Dask (https://github.com/dask/dask) currently provides on-node parallelism for medium-size data problems. However, analyzing large climate data sets will require multiple-node parallelism, since they constitute a big data problem. A likely solution is the integration of distributed (https://github.com/dask/distributed) with dask. Distributed is now integrated with dask, and its benefits are already starting to be realized, e.g., see http://matthewrocklin.com/blog/work/2016/02/26/dask-distributed-part-3. Thus, this issue is designed to identify, at a high level, the steps needed to perform this integration. As stated by @shoyer, it will

Thus, we have the chance to make xarray big-data capable as well as provide improvements to the backend. To this end, I'm starting this issue to help begin the design process, following the xarray mailing list discussion some of us have been having (@shoyer, @mrocklin, @rabernat).

Task To Do List:

- [x] Verify asynchronous access error for
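For context, the multi-node workflow this issue anticipates can be sketched with today's dask/distributed and xarray APIs. This is a minimal illustration, not part of the original issue: it assumes `dask`, `distributed`, and `xarray` are installed, uses a local in-process cluster as a stand-in for a real multi-node deployment, and builds a toy in-memory dataset where a real analysis would use `xr.open_mfdataset(..., chunks={...})`.

```python
# Illustrative sketch (not from the issue): xarray computations backed by
# dask, scheduled through a distributed Client.
import numpy as np
import xarray as xr
from dask.distributed import Client

# A local, in-process "cluster"; a real deployment would point the Client
# at a scheduler running across many nodes.
client = Client(processes=False)

# A toy chunked (dask-backed) dataset; chunking is what lets the graph be
# split across workers.
ds = xr.Dataset({"temp": (("time",), np.arange(10.0))}).chunk({"time": 5})

# Reductions build dask graphs that the distributed scheduler executes.
result = float(ds["temp"].mean().compute())
print(result)  # 4.5

client.close()
```

Once a `Client` is active, dask-backed xarray operations are routed through it automatically, which is the "xarray big-data capable" outcome the issue describes.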