html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/3258#issuecomment-529168271,https://api.github.com/repos/pydata/xarray/issues/3258,529168271,MDEyOklzc3VlQ29tbWVudDUyOTE2ODI3MQ==,2448579,2019-09-08T04:20:19Z,2019-09-08T04:20:19Z,MEMBER,Closing in favour of #3276,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,484752930
https://github.com/pydata/xarray/pull/3258#issuecomment-527186872,https://api.github.com/repos/pydata/xarray/issues/3258,527186872,MDEyOklzc3VlQ29tbWVudDUyNzE4Njg3Mg==,2448579,2019-09-02T15:34:21Z,2019-09-02T15:34:21Z,MEMBER,Thanks. That worked. I have a new version up in #3276 that works with both DataArrays and Datasets.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,484752930
https://github.com/pydata/xarray/pull/3258#issuecomment-526751676,https://api.github.com/repos/pydata/xarray/issues/3258,526751676,MDEyOklzc3VlQ29tbWVudDUyNjc1MTY3Ng==,2448579,2019-08-30T21:11:28Z,2019-08-30T21:11:28Z,MEMBER,"Thanks @mrocklin. Unfortunately that doesn't work with the Dataset constructor. When given a list, the constructor treats it as array-like:
```
The following notations are accepted:
- mapping {var name: DataArray}
- mapping {var name: Variable}
- mapping {var name: (dimension name, array-like)}
- mapping {var name: (tuple of dimension names, array-like)}
- mapping {dimension name: array-like}
  (it will be automatically moved to coords, see below)
```
Unless @shoyer has another idea, I guess I can insert tasks that create DataArrays into the graph and then refer to those keys in the Dataset constructor.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,484752930
https://github.com/pydata/xarray/pull/3258#issuecomment-525965607,https://api.github.com/repos/pydata/xarray/issues/3258,525965607,MDEyOklzc3VlQ29tbWVudDUyNTk2NTYwNw==,2448579,2019-08-28T23:51:43Z,2019-08-28T23:53:46Z,MEMBER,"I started prototyping a Dataset version. Here's what I have:
``` python
import dask
import numpy as np
import xarray as xr

darray = xr.DataArray(np.ones((10, 20)),
                      dims=['x', 'y'],
                      coords={'x': np.arange(10), 'y': np.arange(100, 120)})
dset = darray.to_dataset(name='a')
dset['b'] = dset.a + 50
dset['c'] = (dset.x + 20)
dset = dset.chunk({'x': 4, 'y': 5})
```
The function I'm applying takes a dataset and returns a DataArray because that's easy to test without figuring out how to assemble everything back into a dataset.
``` python
import itertools


# function takes dataset and returns dataarray so that I can check that things work without reconstructing a dataset
def function(ds):
    return ds.a + 10


dataset_dims = list(dset.dims)

graph = {}
gname = 'dsnew'

# map dims to list of chunk indexes
# If different variables have different chunking along the same dim
# the call to .chunks will raise an error.
ichunk = {dim: range(len(dset.chunks[dim])) for dim in dataset_dims}

# iterate over all possible chunk combinations
for v in itertools.product(*ichunk.values()):
    chunk_index_dict = dict(zip(dataset_dims, v))
    data_vars = {}
    for name, variable in dset.data_vars.items():
        # why does dask_keys have an extra level?
        # the [0] is not required for dataarrays
        var_dask_keys = variable.__dask_keys__()[0]

        # recursively index into dask_keys nested list
        chunk = var_dask_keys
        for dim in variable.dims:
            chunk = chunk[chunk_index_dict[dim]]

        # I have key corresponding to chunk
        # this tuple is in a dictionary passed to xr.Dataset()
        # dask doesn't seem to replace this with a numpy array at execution time.
        data_vars[name] = (variable.dims, chunk)

    graph[(gname, ) + v] = (function, (xr.Dataset, data_vars))

# use the new layer's name (gname), not the loop variable `name`
final_graph = dask.highlevelgraph.HighLevelGraph.from_collections(gname, graph, dependencies=[dset])
```
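To make the chunk-indexing step concrete, here is a minimal sketch that uses plain nested lists in place of real dask keys. The names `nested_keys`, `keys_1d` and `pick_chunk` are made up for illustration; only `itertools.product` is a real API here.

``` python
import itertools

# Hypothetical stand-in for the nested key list of a 2D variable
# with 2 chunks along 'x' and 3 chunks along 'y'.
nested_keys = [[('a', i, j) for j in range(3)] for i in range(2)]
ichunk = {'x': range(2), 'y': range(3)}


def pick_chunk(keys, var_dims, chunk_index):
    # Walk one nesting level per dimension to reach a single key.
    chunk = keys
    for dim in var_dims:
        chunk = chunk[chunk_index[dim]]
    return chunk


# One entry per combination of chunk indexes, as in the loop above.
picked = {}
for v in itertools.product(*ichunk.values()):
    chunk_index = dict(zip(ichunk, v))
    picked[v] = pick_chunk(nested_keys, ('x', 'y'), chunk_index)

# A 1D variable along 'x' only consults the 'x' entry of the index.
keys_1d = [('c', i) for i in range(2)]
picked_1d = pick_chunk(keys_1d, ('x',), {'x': 1, 'y': 2})
```

Because each variable is indexed only along its own dims, a 1D variable like `c` ends up with a shorter key than the 2D variables for the same chunk combination.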
Elements of the graph look like
```
('dsnew', 0, 0): (function,
  (xarray.core.dataset.Dataset,
   {'a': (('x', 'y'), ('xarray-a-f178df193efafa67203f3862b3f9f0f4', 0, 0)),
    'b': (('x', 'y'), ('xarray-b-e2d8d06bb9e5c1f351671a94816bd331', 0, 0)),
    'c': (('x',), ('xarray-c-d90f8b2af715b53f4c170be391239655', 0))}))
```
This doesn't work because dask doesn't replace the keys nested inside the `data_vars` dict with numpy arrays when the `xr.Dataset` call is executed.
``` python
result = dask.array.Array(final_graph, name=gname, chunks=dset.a.data.chunks, meta=dset.a.data._meta)
dask.compute(result)
```
```
ValueError: Could not convert tuple of form (dims, data[, attrs, encoding]): (('x', 'y'), ('xarray-a-f178df193efafa67203f3862b3f9f0f4', 0, 0)) to Variable.
```
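A toy evaluator makes this failure mode easier to see. This is not dask's scheduler, just a minimal sketch under the assumption (suggested by the error above) that keys are substituted when they appear as direct arguments or inside lists, while a dict argument is passed through untouched. `evaluate` and `toy_get` are hypothetical names.

``` python
def evaluate(arg, graph, cache):
    # Task: a tuple whose first element is callable.
    if isinstance(arg, tuple) and arg and callable(arg[0]):
        func, *args = arg
        return func(*(evaluate(a, graph, cache) for a in args))
    # Lists are traversed element by element.
    if isinstance(arg, list):
        return [evaluate(a, graph, cache) for a in arg]
    # A hashable argument that names a graph key is replaced by its value.
    try:
        is_key = arg in graph
    except TypeError:
        is_key = False  # unhashable, e.g. a dict
    if is_key:
        if arg not in cache:
            cache[arg] = evaluate(graph[arg], graph, cache)
        return cache[arg]
    # Everything else, including dicts, is passed through as-is.
    return arg


def toy_get(graph, key):
    return evaluate(key, graph, {})


toy_graph = {
    ('x', 0): 1,
    ('x', 1): 2,
    'from_list': (sum, [('x', 0), ('x', 1)]),
    'from_dict': (dict, {'a': ('x', 0)}),
}

list_result = toy_get(toy_graph, 'from_list')  # keys inside the list are replaced
dict_result = toy_get(toy_graph, 'from_dict')  # key inside the dict is left alone
```

In this sketch `dict_result` still contains the raw key `('x', 0)`, which mirrors the key tuple that reaches `xr.Dataset` in the error above.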
The graph is ""disconnected"":

[image: task graph visualization]

I'm not sure what I'm doing wrong here. An equivalent version for DataArrays works perfectly.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,484752930