issue_comments: 525965607
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/pydata/xarray/pull/3258#issuecomment-525965607 | https://api.github.com/repos/pydata/xarray/issues/3258 | 525965607 | MDEyOklzc3VlQ29tbWVudDUyNTk2NTYwNw== | 2448579 | 2019-08-28T23:51:43Z | 2019-08-28T23:53:46Z | MEMBER | I started prototyping a Dataset version. Here's what I have:

``` python
import dask
import numpy as np
import xarray as xr

darray = xr.DataArray(
    np.ones((10, 20)),
    dims=['x', 'y'],
    coords={'x': np.arange(10), 'y': np.arange(100, 120)},
)
dset = darray.to_dataset(name='a')
dset['b'] = dset.a + 50
dset['c'] = (dset.x + 20)
dset = dset.chunk({'x': 4, 'y': 5})
```

The function I'm applying takes a dataset and returns a DataArray because that's easy to test without figuring out how to assemble everything back into a dataset.

``` python
import itertools

# function takes dataset and returns dataarray so that I can check
# that things work without reconstructing a dataset
def function(ds):
    return ds.a + 10

dataset_dims = list(dset.dims)

graph = {}
gname = 'dsnew'

# map dims to list of chunk indexes
# If different variables have different chunking along the same dim
# the call to .chunks will raise an error.
ichunk = {dim: range(len(dset.chunks[dim])) for dim in dataset_dims}

# iterate over all possible chunk combinations
for v in itertools.product(*ichunk.values()):
    chunk_index_dict = dict(zip(dataset_dims, v))
    data_vars = {}
    for name, variable in dset.data_vars.items():
        # why does __dask_keys__ have an extra level?
        # the [0] is not required for dataarrays
        var_dask_keys = variable.__dask_keys__()[0]

final_graph = dask.highlevelgraph.HighLevelGraph.from_collections(name, graph, dependencies=[dset])
```

Elements of the graph look like
This doesn't work because dask doesn't replace the keys by numpy arrays when the
The graph is "disconnected":
I'm not sure what I'm doing wrong here. An equivalent version for DataArrays works perfectly. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
484752930 |
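
The chunk-combination loop in the comment above can be tried on its own, without dask or xarray. The following is only a minimal sketch, not the author's implementation: the `chunks` mapping is written out by hand to match what `dset.chunk({'x': 4, 'y': 5})` would produce for a 10 x 20 array, and `chunk_bounds` and `blocks` are hypothetical names introduced here for illustration.

```python
import itertools

# Assumption: chunk sizes for a 10 x 20 array chunked with {'x': 4, 'y': 5},
# written out by hand so the sketch needs neither dask nor xarray.
chunks = {"x": (4, 4, 2), "y": (5, 5, 5, 5)}
dims = list(chunks)

# map each dim to its range of chunk indexes
ichunk = {dim: range(len(chunks[dim])) for dim in dims}


def chunk_bounds(sizes):
    """Return the (start, stop) offsets of each chunk along one dimension."""
    stops = list(itertools.accumulate(sizes))
    starts = [0] + stops[:-1]
    return list(zip(starts, stops))


bounds = {dim: chunk_bounds(chunks[dim]) for dim in dims}

# iterate over all possible chunk combinations and record, for each
# chunk-index tuple, the slice it covers along every dimension
blocks = {}
for v in itertools.product(*ichunk.values()):
    chunk_index_dict = dict(zip(dims, v))
    blocks[v] = {
        dim: slice(*bounds[dim][i]) for dim, i in chunk_index_dict.items()
    }
```

Each key of `blocks` is one chunk-index tuple such as `(2, 3)`, which is the same tuple the loop in the comment uses to pick out one block per variable.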