**Issue #6036**: `xarray.open_zarr()` takes too long to lazy load when the data arrays contain a large number of Dask chunks. *(state: closed, 10 comments, created 2021-12-01, closed 2023-10-26)*

**What happened**:

The aim is to lazy load a big Zarr dataset using `xarray.open_zarr()` and then compute only the necessary data. However, lazy loading takes too long, and if the dataset is large and contains a large number of Dask chunks, memory usage grows while the dataset is being lazily loaded until the process eventually crashes because it runs out of memory.

In `xarray.open_zarr()` the parameter `chunks` defaults to `'auto'`, which means it reads the on-disk chunk sizes and creates Dask chunks for each data array in the dataset. The Dask chunks are created in the `slices_from_chunks(chunks)` function in `https://github.com/dask/dask/blob/main/dask/array/core.py`. The most time-consuming part appears to be line 232, where all combinations of the Dask chunks are created (see https://github.com/dask/dask/blob/a5aecac8313fea30c5503f534c71f325b1775b9c/dask/array/core.py#L216-L232). In our use case we have millions of small chunks and several data arrays, which means that opening the Zarr dataset takes too long.

In the example below we create a dataset with 3 dimensions of sizes `{'x': 1000, 'y': 1000, 'z': 1000}`, one data array (`foo`), and a chunk size of `{'x': 5, 'y': 5, 'z': 5}`. This results in a runtime of approx. 20 seconds, which is not acceptable for lazy loading a dataset.

One workaround is to set the parameter `chunks=None` in `xarray.open_zarr()`: the dataset then lazy loads quickly, and we can proceed by computing only the necessary data in memory in a few milliseconds (a short sketch follows the example below). We also tried to use `xarray.Dataset.chunk()` to compute the chunks on the lazily loaded dataset, but it still takes too long.

Additionally, there seem to be two memory issues:

- Memory keeps increasing while lazy loading the dataset and is not freed. This may be related to pydata/xarray#6013.
- If the dataset has an even smaller chunk size, i.e. `{'x': 1, 'y': 1, 'z': 1}`, memory keeps increasing until the process crashes.

My questions are:

- Would it be possible to optimize the code so that the chunks are calculated in less than a second?
- Could all combinations of Dask chunks be saved as metadata when saving the Zarr dataset, so that they could be loaded from disk instead of being recalculated?
- Is there another option to load a Zarr dataset including the information of millions of Dask chunks?

**What you expected to happen**:

I expect to be able to lazy load a large dataset in a few milliseconds, including the Dask chunk information.

**Minimal Complete Verifiable Example**:

The following code is run in a JupyterLab notebook.

```python
import dask.array
import xarray as xr

%load_ext snakeviz

# Create an example dataset: one data array "foo" over dimensions
# (x, y, z) of size 1000 each, written to Zarr with 5x5x5 chunks.
chunks = (5, 5, 5)
ds = xr.Dataset(
    data_vars={
        "foo": (
            ('x', 'y', 'z'),
            dask.array.empty((1000, 1000, 1000), chunks=(1000, 1000, 1000)),
        )
    }
)
ds.to_zarr(store='data', group='ds.zarr', compute=False,
           encoding={'foo': {'chunks': chunks}})

ds_loaded = xr.open_zarr(store='data', group='ds.zarr')
```

Profiling the open in a separate notebook cell:

```python
%%snakeviz
ds_big_loaded = xr.open_zarr(store='data', group='ds.zarr')  # Runtime: 22 seconds!
```
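For comparison with the timing above, here is a minimal sketch of the `chunks=None` workaround described earlier. It assumes the same `'data'` store and `'ds.zarr'` group created in the example; the selected region and variable names are purely illustrative.

```python
import xarray as xr

# Open without creating Dask chunks: the variables are backed by xarray's
# lazy indexing machinery instead of a Dask graph, so no per-chunk slices
# have to be materialized up front.
ds_fast = xr.open_zarr(store='data', group='ds.zarr', chunks=None)  # near-instant

# Lazily select a small region, then load only that region into memory.
subset = ds_fast['foo'].isel(x=slice(0, 10), y=slice(0, 10), z=slice(0, 10))
values = subset.values  # reads only the Zarr chunks covering this region
```

The trade-off is that the loaded selection is a plain NumPy-backed array rather than a chunked Dask array, so any parallelism has to be reintroduced explicitly, for example by calling `.chunk()` on the (now much smaller) subset.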
**Snakeviz output**:

![snakeviz_0](https://user-images.githubusercontent.com/23300143/144213645-0202604a-71a7-474a-b9f3-4848c455bf91.PNG)

...

![snakeviz_1](https://user-images.githubusercontent.com/23300143/144213710-a5f30058-78d4-42cb-a28c-0a97daa627ea.PNG)

**Anything else we need to know?**:

The profiling shows that the following code from Dask is the most time-consuming part when opening a Zarr dataset. The code is from the `slices_from_chunks(chunks)` function in `https://github.com/dask/dask/blob/main/dask/array/core.py` and is run for each data array in the dataset.

```python
from itertools import product

import dask.array
from dask.array.slicing import cached_cumsum

dim_size = (10, 15_000, 15_000)
chunks = dask.array.empty(dim_size, chunks=(10, 10, 10)).chunks

# Cumulative chunk offsets along each dimension, then one slice per chunk
# per dimension, and finally the Cartesian product over all dimensions.
cumdims = [cached_cumsum(bds, initial_zero=True) for bds in chunks]
slices = [
    [slice(s, s + dim) for s, dim in zip(starts, shapes)]
    for starts, shapes in zip(cumdims, chunks)
]
slices = list(product(*slices))  # Runtime: 5.59 seconds for one data array!
```
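To put that runtime in perspective, the following short calculation (an illustration added here, not code from the issue) counts how many slice tuples `slices_from_chunks` has to build for a single data array of the shape used above:

```python
from math import prod

import dask.array

dim_size = (10, 15_000, 15_000)
chunks = dask.array.empty(dim_size, chunks=(10, 10, 10)).chunks

# One slice tuple is produced per chunk; count the chunks per dimension
# and in total.
n_chunks_per_dim = [len(c) for c in chunks]
print(n_chunks_per_dim)        # [1, 1500, 1500]
print(prod(n_chunks_per_dim))  # 2250000 slice tuples for this one array
```

Building and holding millions of these slice tuples (plus the corresponding task-graph entries) for every data array is consistent with both the long open times and the memory growth reported above.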
**Environment**:

Output of `xr.show_versions()`:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-89-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.8.1

xarray: 0.19.0
pandas: 1.3.2
numpy: 1.20.3
scipy: 1.7.1
netCDF4: 1.5.7
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.10.1
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.8
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.08.1
distributed: 2021.08.1
matplotlib: 3.4.2
cartopy: 0.20.0
seaborn: 0.11.2
numbagg: None
pint: None
setuptools: 58.0.4
pip: 21.2.4
conda: None
pytest: 6.2.4
IPython: 7.27.0
sphinx: None
```