issue_comments: 525965607


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/pull/3258#issuecomment-525965607 https://api.github.com/repos/pydata/xarray/issues/3258 525965607 MDEyOklzc3VlQ29tbWVudDUyNTk2NTYwNw== 2448579 2019-08-28T23:51:43Z 2019-08-28T23:53:46Z MEMBER

I started prototyping a Dataset version. Here's what I have:

``` python
import dask
import dask.array
import numpy as np
import xarray as xr

darray = xr.DataArray(
    np.ones((10, 20)),
    dims=['x', 'y'],
    coords={'x': np.arange(10), 'y': np.arange(100, 120)},
)
dset = darray.to_dataset(name='a')
dset['b'] = dset.a + 50
dset['c'] = (dset.x + 20)
dset = dset.chunk({'x': 4, 'y': 5})
```

The function I'm applying takes a dataset and returns a DataArray because that's easy to test without figuring out how to assemble everything back into a dataset.

``` python
import itertools

# function takes dataset and returns dataarray so that I can check
# that things work without reconstructing a dataset
def function(ds):
    return ds.a + 10

dataset_dims = list(dset.dims)

graph = {}
gname = 'dsnew'

# map dims to list of chunk indexes
# If different variables have different chunking along the same dim
# the call to .chunks will raise an error.
ichunk = {dim: range(len(dset.chunks[dim])) for dim in dataset_dims}

# iterate over all possible chunk combinations
for v in itertools.product(*ichunk.values()):
    chunk_index_dict = dict(zip(dataset_dims, v))
    data_vars = {}
    for name, variable in dset.data_vars.items():
        # why does __dask_keys__ have an extra level?
        # the [0] is not required for dataarrays
        var_dask_keys = variable.__dask_keys__()[0]

        # recursively index into dask_keys nested list
        chunk = var_dask_keys
        for dim in variable.dims:
            chunk = chunk[chunk_index_dict[dim]]

        # I have key corresponding to chunk
        # this tuple is in a dictionary passed to xr.Dataset()
        # dask doesn't seem to replace this with a numpy array at execution time.
        data_vars[name] = (variable.dims, chunk)

    graph[(gname, ) + v] = (function, (xr.Dataset, data_vars))

final_graph = dask.highlevelgraph.HighLevelGraph.from_collections(gname, graph, dependencies=[dset])
```
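The chunk bookkeeping in the loop above can be checked without dask or xarray. This is a minimal sketch in plain Python: the chunk counts (3 along `x`, 4 along `y`), the nested key lists, and the `select_key` helper are all made up for illustration and only mimic the shape of the nested lists returned by `__dask_keys__`.

```python
import itertools

# Hypothetical chunk layout: 3 chunks along x, 4 along y.
nchunks = {"x": 3, "y": 4}
dims = list(nchunks)

# Hand-written nested key lists, shaped like the keys of a 2-D (x, y)
# variable and a 1-D (x,) variable. These are stand-ins, not real dask keys.
keys_2d = [[("a", i, j) for j in range(nchunks["y"])] for i in range(nchunks["x"])]
keys_1d = [("c", i) for i in range(nchunks["x"])]


def select_key(nested_keys, var_dims, chunk_index):
    """Walk one nesting level per dimension of the variable."""
    key = nested_keys
    for dim in var_dims:
        key = key[chunk_index[dim]]
    return key


picked = {}
# One entry per combination of chunk indexes, as in the loop over `ichunk`.
for v in itertools.product(*(range(n) for n in nchunks.values())):
    chunk_index = dict(zip(dims, v))
    picked[v] = (
        select_key(keys_2d, ("x", "y"), chunk_index),
        select_key(keys_1d, ("x",), chunk_index),
    )
```

Each output block index `v` ends up paired with one key per variable, and the 1-D variable only consumes the `x` part of the index.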

Elements of the graph look like:

    ('dsnew', 0, 0): (<function __main__.function(ds)>,
      (xarray.core.dataset.Dataset,
       {'a': (('x', 'y'), ('xarray-a-f178df193efafa67203f3862b3f9f0f4', 0, 0)),
        'b': (('x', 'y'), ('xarray-b-e2d8d06bb9e5c1f351671a94816bd331', 0, 0)),
        'c': (('x',), ('xarray-c-d90f8b2af715b53f4c170be391239655', 0))}))

This doesn't work because dask doesn't replace the keys by numpy arrays when the xr.Dataset call is executed.

    result = dask.array.Array(final_graph, name=gname, chunks=dset.a.data.chunks, meta=dset.a.data._meta)
    dask.compute(result)

ValueError: Could not convert tuple of form (dims, data[, attrs, encoding]): (('x', 'y'), ('xarray-a-f178df193efafa67203f3862b3f9f0f4', 0, 0)) to Variable.
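A toy model of what might be going on, under the assumption that key substitution only reaches direct task arguments and the contents of lists, and leaves dicts untouched. The `execute` and `istask` helpers below are hypothetical and are not dask's scheduler code; the `"BLOCK"` string stands in for a computed numpy chunk. The second task shows one possible way to spell the same dictionary using only tasks and lists.

```python
def istask(x):
    """A task here is a non-empty tuple whose first element is callable."""
    return type(x) is tuple and bool(x) and callable(x[0])


def execute(arg, cache):
    """Toy evaluator: recurse into lists and tasks, look up hashable keys."""
    if isinstance(arg, list):
        return [execute(a, cache) for a in arg]
    if istask(arg):
        func, args = arg[0], arg[1:]
        return func(*(execute(a, cache) for a in args))
    try:
        if arg in cache:
            return cache[arg]
    except TypeError:
        # Unhashable arguments such as dicts are passed through untouched.
        pass
    return arg


key = ("xarray-a", 0, 0)
cache = {key: "BLOCK"}  # placeholder for an already computed chunk

# Key hidden inside a dict: the toy evaluator never looks inside it.
dict_task = (dict, {"a": (("x", "y"), key)})
unresolved = execute(dict_task, cache)

# Same mapping written as nested lists plus a (tuple, [...]) task.
list_task = (dict, [["a", (tuple, [("x", "y"), key])]])
resolved = execute(list_task, cache)
```

In this model `unresolved["a"]` still holds the raw key tuple, which matches the `(dims, key)` pair seen in the `ValueError`, while `resolved["a"]` holds the substituted block.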

The graph is "disconnected" (image not included).

I'm not sure what I'm doing wrong here. An equivalent version for DataArrays works perfectly.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  484752930