html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/798#issuecomment-262214999,https://api.github.com/repos/pydata/xarray/issues/798,262214999,MDEyOklzc3VlQ29tbWVudDI2MjIxNDk5OQ==,346079,2016-11-22T11:18:56Z,2016-11-22T11:18:56Z,NONE,"When using xarray with the `dask.distributed` scheduler it would be useful to be able to persist intermediate `DataArray`s / `Dataset`s on remote workers.
There could be a `persist` method analogous to the `compute` method introduced in #1024 (a rough usage sketch follows the list below). Potential issues with this approach are:
1. What are the semantics of this operation in the general case where dask or distributed is not used?
2. Is it justified to add an operation which is rather specific to the distributed scheduler?
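
A minimal sketch of how such a method might be used, assuming `persist` takes the same form as `compute` and returns a Dataset backed by chunks held in distributed memory (the scheduler address and file paths are made up):

```python
import xarray as xr
from dask.distributed import Client

client = Client('scheduler-address:8786')  # hypothetical scheduler address

# Open lazily with dask-backed arrays
ds = xr.open_mfdataset('data/*.nc', chunks={'time': 100})

# An intermediate result we want to keep around on the workers
monthly = ds.groupby('time.month').mean(dim='time')

# Proposed: trigger computation on the workers but keep the results there,
# instead of pulling everything back to the client as `compute` / `load` would.
monthly = monthly.persist()
```
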
(Could create a separate issue for this if preferred).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,142498006
https://github.com/pydata/xarray/issues/798#issuecomment-259277067,https://api.github.com/repos/pydata/xarray/issues/798,259277067,MDEyOklzc3VlQ29tbWVudDI1OTI3NzA2Nw==,346079,2016-11-08T22:17:14Z,2016-11-08T22:17:14Z,NONE,"Great to see this moving! I take it the workshop was productive?
How does #1095 work in the scenario of a distributed scheduler with remote workers? Do I understand correctly that all workers and the client would need to see the same shared filesystem from which the NetCDF files are read?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,142498006
https://github.com/pydata/xarray/issues/798#issuecomment-256038226,https://api.github.com/repos/pydata/xarray/issues/798,256038226,MDEyOklzc3VlQ29tbWVudDI1NjAzODIyNg==,346079,2016-10-25T13:43:32Z,2016-10-25T13:43:32Z,NONE,"For the case where NetCDF / HDF5 files are only available on the distributed workers and not directly accessible from the client, how would you get the necessary metadata (coords, dims etc.) to construct the `xarray.Dataset`?
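
One possible workaround, sketched below under the assumption that the workers can see the files and the client cannot (the path and scheduler address are invented): run a small task on a worker that opens the file and ships back only the lightweight metadata needed to describe the `Dataset` on the client.

```python
from dask.distributed import Client

client = Client('scheduler-address:8786')  # hypothetical scheduler address

def read_metadata(path):
    # Executed on a worker that can see the file; returns only small objects,
    # not the data itself.
    import xarray as xr
    with xr.open_dataset(path) as ds:
        return {
            'dims': dict(ds.dims),
            'coords': {name: ds[name].values for name in ds.coords},
            'data_vars': list(ds.data_vars),
        }

meta = client.submit(read_metadata, '/data/only-on-workers.nc').result()
```
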
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,142498006
https://github.com/pydata/xarray/issues/798#issuecomment-255207705,https://api.github.com/repos/pydata/xarray/issues/798,255207705,MDEyOklzc3VlQ29tbWVudDI1NTIwNzcwNQ==,346079,2016-10-20T19:42:41Z,2016-10-20T19:42:41Z,NONE,"I'm probably not familiar enough with either the xarray or dask / distributed codebases to provide much input, but I would be happy to contribute if / where it makes sense. Would also be happy to be part of some real-time discussion if feasible (based in the UK, so wouldn't be able to attend the workshop).
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,142498006
https://github.com/pydata/xarray/issues/798#issuecomment-255184991,https://api.github.com/repos/pydata/xarray/issues/798,255184991,MDEyOklzc3VlQ29tbWVudDI1NTE4NDk5MQ==,346079,2016-10-20T18:14:38Z,2016-10-20T18:14:38Z,NONE,"Has this issue progressed since?
Being able to distribute the loading of files across a dask cluster and to compose an xarray `Dataset` from data on remote workers would be a great feature.
Is @mrocklin's [blog post](http://matthewrocklin.com/blog/work/2016/02/26/dask-distributed-part-3) from Feb 2016 still the reference for remote data loading on a cluster? Adapting it to load xarray Datasets rather than plain arrays is not straightforward, since there is no way to combine futures representing Datasets out of the box.
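
A rough sketch of the pattern I have in mind, assuming the workers can see the files (the paths and scheduler address are invented). Because the combination step itself has to be submitted as another task, this is not nearly as convenient as the plain-array case in the blog post:

```python
import xarray as xr
from dask.distributed import Client

client = Client('scheduler-address:8786')  # hypothetical scheduler address
paths = ['/data/part-0.nc', '/data/part-1.nc']  # hypothetical worker-visible paths

def load(path):
    # Runs on a worker; loads one file fully into that worker's memory.
    import xarray as xr
    return xr.open_dataset(path).load()

# Each future wraps a Dataset living in a worker's memory.
futures = client.map(load, paths)

# There is no built-in way to combine these futures on the client, so the
# concatenation itself is submitted as a task (distributed resolves the
# futures in the list before calling xr.concat on a worker).
combined_future = client.submit(xr.concat, futures, dim='time')
combined = combined_future.result()  # pulls the combined Dataset to the client
```
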
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,142498006