# Means of zarr arrays cause a memory overload in dask workers (#6709)

**State:** closed (completed) · **Comments:** 17 · **Created:** 2022-06-20 · **Closed:** 2023-10-09

### What is your issue?

Hello everyone!

I am submitting this issue here, but it is not entirely clear whether my problem comes from xarray, dask, or zarr.

The goal is to compute means from the GCM anomalies of SSH. The following simple code creates an artificial dataset (a variable is about 90 GB) with the anomaly fields and computes the means of the cross products:

```python
import dask.array as da
import numpy as np
import xarray as xr

ds = xr.Dataset(
    dict(
        anom_u=(["time", "face", "j", "i"], da.ones((10311, 1, 987, 1920), chunks=(10, 1, 987, 1920))),
        anom_v=(["time", "face", "j", "i"], da.ones((10311, 1, 987, 1920), chunks=(10, 1, 987, 1920))),
    )
)

ds["anom_uu_mean"] = (["face", "j", "i"], np.mean(ds.anom_u.data**2, axis=0))
ds["anom_vv_mean"] = (["face", "j", "i"], np.mean(ds.anom_v.data**2, axis=0))
ds["anom_uv_mean"] = (["face", "j", "i"], np.mean(ds.anom_u.data * ds.anom_v.data, axis=0))

ds[["anom_uu_mean", "anom_vv_mean", "anom_uv_mean"]].compute()
```

I was expecting low memory usage, because once a single chunk of `anom_u` and `anom_v` has been used for one step of the mean, those two chunks can be forgotten. The following figure confirms that memory usage stays very low, so all is well.

![image](https://user-images.githubusercontent.com/74916839/174682620-b5c330d2-0fb2-43b7-b3fe-950ecdeec9a5.png)

The matter becomes more complicated when the dataset is opened from a zarr store. We simply dumped the previously generated artificial data to a temporary store and reloaded it:

```python
import dask.array as da
import numpy as np
import xarray as xr

ds = xr.Dataset(
    dict(
        anom_u=(["time", "face", "j", "i"], da.ones((10311, 1, 987, 1920), chunks=(10, 1, 987, 1920))),
        anom_v=(["time", "face", "j", "i"], da.ones((10311, 1, 987, 1920), chunks=(10, 1, 987, 1920))),
    )
)

store = "/work/scratch/test_zarr_graph"
ds.to_zarr(store, compute=False, mode="a")
ds = xr.open_zarr(store)

ds["anom_uu_mean"] = (["face", "j", "i"], np.mean(ds.anom_u.data**2, axis=0))
ds["anom_vv_mean"] = (["face", "j", "i"], np.mean(ds.anom_v.data**2, axis=0))
ds["anom_uv_mean"] = (["face", "j", "i"], np.mean(ds.anom_u.data * ds.anom_v.data, axis=0))

ds[["anom_uu_mean", "anom_vv_mean", "anom_uv_mean"]].compute()
```

![image](https://user-images.githubusercontent.com/74916839/174683111-dd6719b8-0d1b-467e-bdab-1c08bf8a1215.png)

I was expecting similar behavior between a dataset created from scratch and one created from a zarr store, but that does not seem to be the case. I tried passing `inline_array=True` to `xr.open_dataset` (roughly as in the sketch below), but to no avail. I also tried computing two of the variables instead of three, and that works properly, so the behavior seems strange to me. Do you see any reason why I am seeing such memory load on my workers?
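For completeness, this is roughly what the `inline_array` attempt looked like; the `engine` and `chunks` arguments shown here are my reconstruction rather than the exact call:

```python
import xarray as xr

# Sketch of the inline_array attempt (exact kwargs are approximate):
# open the same zarr store via open_dataset so dask inlines the array
# keys into the graph instead of referencing a separate input layer.
store = "/work/scratch/test_zarr_graph"
ds = xr.open_dataset(store, engine="zarr", chunks={}, inline_array=True)
```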
Here are the software versions I use:

- xarray: 2022.6.0rc0
- dask: 2022.04.1
- zarr: 2.11.1
- numpy: 1.21.6
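For reference, a quick way to report these versions (the printed layout is just for readability):

```python
import dask
import numpy as np
import zarr
import xarray as xr

print("xarray :", xr.__version__)
print("dask   :", dask.__version__)
print("zarr   :", zarr.__version__)
print("numpy  :", np.__version__)

# xr.show_versions() prints a fuller environment report if that helps.
```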