html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2852#issuecomment-653016746,https://api.github.com/repos/pydata/xarray/issues/2852,653016746,MDEyOklzc3VlQ29tbWVudDY1MzAxNjc0Ng==,1197350,2020-07-02T13:48:39Z,2020-07-02T13:48:39Z,MEMBER,👀 cc @chiaral ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,425320466
https://github.com/pydata/xarray/issues/2852#issuecomment-476678007,https://api.github.com/repos/pydata/xarray/issues/2852,476678007,MDEyOklzc3VlQ29tbWVudDQ3NjY3ODAwNw==,1197350,2019-03-26T14:41:59Z,2019-03-26T14:41:59Z,MEMBER,"```
label (y, x) uint16 dask.array
...
geoms_ds.groupby('label')
```

It is very hard to make this sort of groupby lazy, because you are grouping over the variable `label` itself. Groupby uses a split-apply-combine paradigm to transform the data. The apply and combine steps can be lazy, but the split step cannot. Xarray uses the group variable to determine how to index the array, i.e. which items belong in which group. To do this, it needs to read the _whole variable_ into memory.

In this specific example, it sounds like what you want is to compute the histogram of labels. That could be accomplished without groupby. For example, you could use `apply_ufunc` together with [`dask.array.histogram`](http://docs.dask.org/en/latest/array-api.html#dask.array.histogram).

So my recommendation is to think of a way to accomplish what you want that does not involve groupby.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,425320466
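
The histogram suggestion in the last comment could look roughly like the sketch below. It is only an illustration under stated assumptions: the array `labels`, its shape and chunking, and the label count `n_labels` are made up here and are not taken from the issue. It calls `dask.array.histogram` directly on a dask array rather than going through `apply_ufunc`; with an xarray variable, the underlying dask array would typically be available as `.data`.

```python
import numpy as np
import dask.array as da

# Hypothetical stand-in for the `label` variable in the issue:
# a small chunked uint16 array of integer labels 0..4.
rng = np.random.default_rng(0)
labels_np = rng.integers(0, 5, size=(40, 60)).astype("uint16")
labels = da.from_array(labels_np, chunks=(10, 60))

# Assumption: the number of distinct labels is known in advance,
# so the bin edges can be fixed without reading the data.
n_labels = 5

# One bin per integer label: edges at 0, 1, ..., n_labels.
# This only builds a task graph; nothing is loaded yet.
counts_lazy, edges = da.histogram(
    labels.ravel(), bins=n_labels, range=(0, n_labels)
)

# Each chunk is histogrammed on its own and the per-chunk
# results are summed when the graph is computed.
counts = counts_lazy.compute()
print(counts)
```

Because the bin edges are fixed up front, each chunk can be counted independently, which is what lets this stay lazy where the split step of groupby cannot.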