html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2237#issuecomment-1126847735,https://api.github.com/repos/pydata/xarray/issues/2237,1126847735,IC_kwDOAMm_X85DKlT3,2448579,2022-05-15T02:44:06Z,2022-05-15T02:44:06Z,MEMBER,"Fixed on main with `ds.groupby(""year"").mean(method=""blockwise"")` ![image](https://user-images.githubusercontent.com/2448579/168454897-39769e31-020b-4a13-bbe8-81a53c0605d8.png) ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,333312849
https://github.com/pydata/xarray/issues/2237#issuecomment-789078512,https://api.github.com/repos/pydata/xarray/issues/2237,789078512,MDEyOklzc3VlQ29tbWVudDc4OTA3ODUxMg==,2448579,2021-03-02T17:29:51Z,2021-03-02T18:03:17Z,MEMBER,"I think the behaviour in Ryan's most recent comment is a consequence of groupby.mean being

``` python
results = []
for group_idx in group_indices:  # one group per year
    group = ds.isel(group_idx)  # (SPLIT)
    results.append(group.mean())  # (APPLY)
return xr.concat(results, dim=""year"")  # COMBINE results in one chunk per year (one chunk per element in results)
```

I think the fundamental question is: Is it really possible for dask to recognize that the chunk structure after the `combine` step could be consolidated with an arbitrary number of `apply` steps in the middle?

OR

When a computation maps a single chunk to many chunks, should dask consolidate the output chunks (using `array.chunk-size`)?
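
As a rough illustration of why the combine step ends up with one chunk per year, and what consolidating along `year` would change, here is a pure-Python sketch. It is not xarray's or dask's code: `groupby_mean_chunks`, `consolidate`, and the toy data are hypothetical names made up for this example, and plain lists stand in for dask chunks.

```python
# Hypothetical stand-in for the loop above: plain lists play the role of dask chunks.
from collections import defaultdict


def groupby_mean_chunks(years, values):
    # SPLIT: collect the values belonging to each year.
    groups = defaultdict(list)
    for year, value in zip(years, values):
        groups[year].append(value)
    # APPLY + COMBINE: one single-element chunk per year, in sorted year order.
    return [[sum(vals) / len(vals)] for _, vals in sorted(groups.items())]


def consolidate(chunks, size):
    # Merge neighbouring chunks so each output chunk holds up to `size` elements,
    # which is roughly what an explicit rechunk along 'year' asks for.
    flat = [x for chunk in chunks for x in chunk]
    return [flat[i:i + size] for i in range(0, len(flat), size)]


years = [2000 + i // 4 for i in range(40)]  # 10 years, 4 samples each
values = [float(i) for i in range(40)]
per_year = groupby_mean_chunks(years, values)
print(len(per_year))                  # number of chunks after combine
print(len(consolidate(per_year, 5)))  # number of chunks after consolidation
```

In this toy setup the ten per-year chunks collapse to two once they are consolidated in runs of five.
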
We can explicitly ask for consolidation of chunks by saying the output should be chunked `5` along `year`

``` python
import dask

dask.config.set({""optimization.fuse.ave-width"": 6})  # note > 5
(
    ds.foo.groupby(""year"")
    .mean(dim=""time"")
    .chunk({""year"": 5})  # really important, why and how would dask choose this automatically?
    .data.visualize(optimize_graph=False)
)
```

![image](https://user-images.githubusercontent.com/2448579/109686030-1fb45480-7b3f-11eb-876b-4a7b076e301a.png)

Then if we set `optimization.fuse.ave-width` appropriately, we get the graph we want after optimization

``` python
dask.config.set({""optimization.fuse.ave-width"": 6})
(
    ds.foo.groupby(""year"")
    .mean(dim=""time"")
    .chunk({""year"": 5})  # really important
    .data.visualize(optimize_graph=True)
)
```

![image](https://user-images.githubusercontent.com/2448579/109686164-3fe41380-7b3f-11eb-80ef-372b189356cd.png)

Can we make dask recognize that the 5 getitem tasks from input-chunk-0, at the bottom of each tower, can be fused to a single task? In that case, fuse the 5 getitem tasks and ""propagate"" that fusion up the tower.

I guess another failure here: when `fuse.ave-width` is 3 (< width of tower), why isn't dask fusing to make three ""sub-towers"" per tower? Even that would help reduce the number of tasks.

```
dask.config.set({""optimization.fuse.ave-width"": 3})
(
    ds.foo.groupby(""year"")
    .mean(dim=""time"")
    .chunk({""year"": 5})  # really important
    .data.visualize(optimize_graph=True)
)
```

![image](https://user-images.githubusercontent.com/2448579/109693420-dec03e00-7b46-11eb-84f3-55d85c8397fd.png)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,333312849
https://github.com/pydata/xarray/issues/2237#issuecomment-789090356,https://api.github.com/repos/pydata/xarray/issues/2237,789090356,MDEyOklzc3VlQ29tbWVudDc4OTA5MDM1Ng==,2448579,2021-03-02T17:48:01Z,2021-03-02T17:48:47Z,MEMBER,"Reading up on fusion, the [docstring](https://docs.dask.org/en/latest/optimize.html#dask.optimization.fuse) says

> This optimization applies to all reductions–tasks that have at most one dependent–so it may be viewed as fusing “multiple input, single output” groups of tasks into a single task.

So we need the opposite: fuse ""single input, multiple output"" to a single task when some appropriate heuristic is satisfied.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,333312849
https://github.com/pydata/xarray/issues/2237#issuecomment-482241098,https://api.github.com/repos/pydata/xarray/issues/2237,482241098,MDEyOklzc3VlQ29tbWVudDQ4MjI0MTA5OA==,2448579,2019-04-11T18:22:41Z,2019-04-11T18:22:41Z,MEMBER,Can this be closed or is there something to do on the xarray side now that dask/dask#3648 has been merged?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,333312849