Comments on pydata/xarray PR #3258 (issue 484752930), most recent first.

https://github.com/pydata/xarray/pull/3258#issuecomment-529168271 | user 2448579 (MEMBER) | 2019-09-08T04:20:19Z

Closing in favour of #3276

https://github.com/pydata/xarray/pull/3258#issuecomment-527187603 | user 306380 (MEMBER) | 2019-09-02T15:37:18Z

I'm glad to see progress here. FWIW, I think that many people would be quite happy with a version that just worked for DataArrays, in case that's faster to get in than the full solution with Datasets.

https://github.com/pydata/xarray/pull/3258#issuecomment-527186872 | user 2448579 (MEMBER) | 2019-09-02T15:34:21Z

Thanks. That worked. I have a new version up in #3276 that works with both DataArrays and Datasets.

https://github.com/pydata/xarray/pull/3258#issuecomment-526756738 | user 306380 (MEMBER) | 2019-08-30T21:31:49Z

Then you can construct a tuple as a task: `(1, 2, 3)` -> `(tuple, [1, 2, 3])`
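The `(tuple, [1, 2, 3])` trick relies on dask's graph conventions: a tuple whose first element is callable is a task, and keys are substituted inside lists but not inside plain tuples. A minimal sketch, not from the thread (the graph and key names here are purely illustrative):

```python
import dask

# Writing the value as a task (tuple, [...]) lets dask see and substitute
# the keys "x", "y", "z" before calling ``tuple`` on the resulting list,
# so the computed value is the tuple (1, 2, 3).
dsk = {
    "x": 1,
    "y": 2,
    "z": 3,
    "my-tuple": (tuple, ["x", "y", "z"]),
}

print(dask.get(dsk, "my-tuple"))  # (1, 2, 3)
```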
https://github.com/pydata/xarray/pull/3258#issuecomment-526751676 | user 2448579 (MEMBER) | 2019-08-30T21:11:28Z

Thanks @mrocklin. Unfortunately that doesn't work with the Dataset constructor. With a list it treats it as array-like:

```
The following notations are accepted:
- mapping {var name: DataArray}
- mapping {var name: Variable}
- mapping {var name: (dimension name, array-like)}
- mapping {var name: (tuple of dimension names, array-like)}
- mapping {dimension name: array-like} (it will be automatically moved to coords, see below)
```

Unless @shoyer has another idea, I guess I can insert creating a DataArray into the graph and then refer to those keys in the Dataset constructor.

https://github.com/pydata/xarray/pull/3258#issuecomment-525966384 | user 306380 (MEMBER) | 2019-08-28T23:54:48Z

Dask doesn't traverse through tuples to find possible keys, so the keys here are hidden from view:

```python
{'a': (('x', 'y'), ('xarray-a-f178df193efafa67203f3862b3f9f0f4', 0, 0)),
```

I recommend replacing the wrapping tuples with lists:

```diff
- {'a': (('x', 'y'), ('xarray-a-f178df193efafa67203f3862b3f9f0f4', 0, 0)),
+ {'a': [('x', 'y'), ('xarray-a-f178df193efafa67203f3862b3f9f0f4', 0, 0)],
```
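To make the "hidden keys" point concrete, here is a small sketch (my own, not from the thread) using `dask.core.get_dependencies`: a chunk key buried inside a plain tuple is invisible to dask, while the same structure written as a list is traversed.

```python
from dask.core import get_dependencies

# Illustrative graph: "chunk-key" stands in for a key like
# ('xarray-a-...', 0, 0) from the comments above.
dsk = {
    "chunk-key": 1,
    # plain tuple: not a task (first element isn't callable), so dask treats
    # it as a literal and never finds "chunk-key" inside it
    "hidden": (("x", "y"), "chunk-key"),
    # same content as a list: dask traverses it and finds the key
    "visible": [("x", "y"), "chunk-key"],
}

print(get_dependencies(dsk, "hidden"))   # set()
print(get_dependencies(dsk, "visible"))  # {'chunk-key'}
```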
https://github.com/pydata/xarray/pull/3258#issuecomment-525965607 | user 2448579 (MEMBER) | 2019-08-28T23:51:43Z

I started prototyping a Dataset version. Here's what I have:

```python
import dask
import numpy as np
import xarray as xr

darray = xr.DataArray(np.ones((10, 20)),
                      dims=['x', 'y'],
                      coords={'x': np.arange(10), 'y': np.arange(100, 120)})
dset = darray.to_dataset(name='a')
dset['b'] = dset.a + 50
dset['c'] = (dset.x + 20)
dset = dset.chunk({'x': 4, 'y': 5})
```

The function I'm applying takes a Dataset and returns a DataArray because that's easy to test without figuring out how to assemble everything back into a Dataset.

```python
import itertools

# function takes a Dataset and returns a DataArray so that I can check that
# things work without reconstructing a Dataset
def function(ds):
    return ds.a + 10

dataset_dims = list(dset.dims)

graph = {}
gname = 'dsnew'

# map dims to list of chunk indexes
# If different variables have different chunking along the same dim,
# the call to .chunks will raise an error.
ichunk = {dim: range(len(dset.chunks[dim])) for dim in dataset_dims}

# iterate over all possible chunk combinations
for v in itertools.product(*ichunk.values()):
    chunk_index_dict = dict(zip(dataset_dims, v))
    data_vars = {}
    for name, variable in dset.data_vars.items():
        # why does dask_keys have an extra level?
        # the [0] is not required for DataArrays
        var_dask_keys = variable.__dask_keys__()[0]

        # recursively index into the nested dask_keys list
        chunk = var_dask_keys
        for dim in variable.dims:
            chunk = chunk[chunk_index_dict[dim]]

        # I now have the key corresponding to this chunk.
        # This tuple goes into a dictionary passed to xr.Dataset();
        # dask doesn't seem to replace it with a numpy array at execution time.
        data_vars[name] = (variable.dims, chunk)

    graph[(gname,) + v] = (function, (xr.Dataset, data_vars))

final_graph = dask.highlevelgraph.HighLevelGraph.from_collections(name, graph, dependencies=[dset])
```

Elements of the graph look like

```
('dsnew', 0, 0): (<function>, (xarray.core.dataset.Dataset,
    {'a': (('x', 'y'), ('xarray-a-f178df193efafa67203f3862b3f9f0f4', 0, 0)),
     'b': (('x', 'y'), ('xarray-b-e2d8d06bb9e5c1f351671a94816bd331', 0, 0)),
     'c': (('x',), ('xarray-c-d90f8b2af715b53f4c170be391239655', 0))}))
```

This doesn't work because dask doesn't replace the keys by numpy arrays when the `xr.Dataset` call is executed.

```python
result = dask.array.Array(final_graph, name=gname, chunks=dset.a.data.chunks, meta=dset.a.data._meta)
dask.compute(result)
```

```
ValueError: Could not convert tuple of form (dims, data[, attrs, encoding]): (('x', 'y'), ('xarray-a-f178df193efafa67203f3862b3f9f0f4', 0, 0)) to Variable.
```

The graph is "disconnected":

![image](https://user-images.githubusercontent.com/2448579/63900034-c0780000-c9ee-11e9-9b40-22e88f5c6208.png)

I'm not sure what I'm doing wrong here. An equivalent version for DataArrays works perfectly.
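Reading this prototype together with the two suggestions earlier in the thread (keys must sit in a list for dask to see them, but `xr.Dataset` needs a real tuple, so build the tuple as a task), the offending line would presumably become something like the line below. This is my reconstruction, not code from the PR, though it is consistent with the "Thanks. That worked." reply near the top of the thread.

```python
# Presumed fix (reconstruction): build the (dims, data) tuple as a dask task,
# so the chunk key sits in a list that dask traverses, while xr.Dataset still
# receives a real (dims, array) tuple at execution time.
data_vars[name] = (tuple, [variable.dims, chunk])
```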
https://github.com/pydata/xarray/pull/3258#issuecomment-525427300 | user 1217238 (MEMBER) | 2019-08-27T18:30:34Z

> apply_ufunc is extremely powerful, and when you need to cope with all possible shape transformations, I suspect its verbosity is quite necessary.
> It's just that, when all you need to do is apply an elementwise, embarrassingly parallel function (80% of the time in my real-life experience), apply_ufunc is overkill.

Yes, 100% agreed! There is a real need for a simpler version of `apply_ufunc`.

> The thing I have against the name map_blocks is that backends other than dask have no notion of blocks...

I think the functionality in this PR is fundamentally dask-specific. We shouldn't make a habit of adding backend-specific features, but it makes sense in limited cases.

https://github.com/pydata/xarray/pull/3258#issuecomment-525425560 | user 6213168 (MEMBER) | 2019-08-27T18:26:17Z

@shoyer let me rephrase it: apply_ufunc is extremely powerful, and when you need to cope with all possible shape transformations, I suspect its verbosity is quite necessary. It's just that, when all you need to do is apply an elementwise, embarrassingly parallel function (80% of the time in my real-life experience), apply_ufunc is overkill. The thing I have against the name map_blocks is that backends other than dask have no notion of blocks...

https://github.com/pydata/xarray/pull/3258#issuecomment-525384446 | user 1217238 (MEMBER) | 2019-08-27T16:40:32Z

> * could we call it just "map"? It makes sense as this thing would be very useful for non-dask based arrays too. Working routinely with scipy (chiefly with scipy.stats transforms), I quickly tire of writing very verbose `xarray.apply_ufunc` calls.

I agree that `apply_ufunc` is overly verbose. See https://github.com/pydata/xarray/issues/1074 and https://github.com/pydata/xarray/issues/1618 (and issues linked therein) for discussion about alternative APIs.

I still think this particular set of functionality should be called `map_blocks`, because it works by applying functions over each block, very similar to dask's `map_blocks`.

https://github.com/pydata/xarray/pull/3258#issuecomment-525298264 | user 6213168 (MEMBER) | 2019-08-27T13:21:44Z

Hi,

A few design opinions:

1. Could we call it just "map"? It makes sense, as this thing would be very useful for non-dask based arrays too. Working routinely with scipy (chiefly with scipy.stats transforms), I quickly tire of writing very verbose `xarray.apply_ufunc` calls.
2. Could we have it as a method of DataArray and Dataset, to allow for method chaining? e.g.

```python
myarray.map(func1).chunk().map(func2).sum().compute()
```
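For context on the verbosity complaint, here is an illustrative comparison (my own sketch, not from the thread): applying an elementwise scipy function to a chunked DataArray through `apply_ufunc` today, versus the kind of one-liner this PR is aiming at (the name and signature were still under discussion at this point).

```python
import numpy as np
import scipy.stats
import xarray as xr

da = xr.DataArray(np.random.rand(10, 20), dims=["x", "y"]).chunk({"x": 5})

# Today: even a purely elementwise, embarrassingly parallel transform goes
# through apply_ufunc, with the dask mode and output dtype spelled out.
cdf = xr.apply_ufunc(
    scipy.stats.norm.cdf,
    da,
    dask="parallelized",
    output_dtypes=[float],
)

# The kind of call being proposed in this thread (hypothetical spelling;
# "map" vs "map_blocks" is exactly what the comments above are debating):
# cdf = da.map_blocks(scipy.stats.norm.cdf)
```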