issue_comments: 572311400
html_url: https://github.com/pydata/xarray/issues/3668#issuecomment-572311400
issue_url: https://api.github.com/repos/pydata/xarray/issues/3668
id: 572311400
node_id: MDEyOklzc3VlQ29tbWVudDU3MjMxMTQwMA==
user: 3922329
created_at: 2020-01-08T23:41:22Z
updated_at: 2020-01-08T23:45:59Z
author_association: NONE

body:

@rabernat Each Dask worker is running on its own machine. The data I am trying to work with is distributed among the workers, but all of it is accessible from any individual worker via cross-mounted NFS shares, so it basically acts as shared data storage. None of that data is available on the client. For now, I'm trying to open just a single zarr store; I have only mentioned …

@dcherian You mean this code?

```python
def modify(ds):
    # modify ds here
    return ds

# this is basically what open_mfdataset does
open_kwargs = dict(decode_cf=True, decode_times=False)
open_tasks = [dask.delayed(xr.open_dataset)(f, **open_kwargs) for f in file_names]
tasks = [dask.delayed(modify)(task) for task in open_tasks]
datasets = dask.compute(tasks)  # get a list of xarray.Datasets
combined = xr.combine_nested(datasets)  # or some combination of concat, merge
```

In case of a single data source, I think it can be condensed into this:
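Something along these lines is what I have in mind (a rough sketch only; `file_name` is a placeholder path, and the kwargs simply mirror the ones above):

```python
import dask
import xarray as xr

# placeholder path to a single file on the cross-mounted NFS share
file_name = "/nfs/data/example.nc"
open_kwargs = dict(decode_cf=True, decode_times=False)

# open the dataset inside a single delayed task and bring the result back to the client
ds = dask.delayed(xr.open_dataset)(file_name, **open_kwargs).compute()
```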
I get … on the client. Only if I wrap it in …

So, this approach is not fully equivalent to what … If I add …

Now, back to zarr: … so I don't even get a dataset object (roughly the kind of call sketched at the end of this comment). It seems that something is quite different in the zarr backend implementation. I haven't had the chance to look at the code carefully yet, but I will do so in the next few days.

Sorry for this long-winded explanation; I hope it clarifies what I'm trying to achieve here.
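For reference, the zarr attempt is roughly the analogous call below (again just an illustration; `store_path` is a placeholder, and I've left out any extra keyword arguments):

```python
import dask
import xarray as xr

# placeholder path to the zarr store on the cross-mounted NFS share
store_path = "/nfs/data/example.zarr"

# analogous delayed open for a zarr store
ds = dask.delayed(xr.open_zarr)(store_path).compute()
```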
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
546562676 |