id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
2117245042,I_kwDOAMm_X85-Mphy,8703,calling to_zarr inside map_blocks function results in missing values,23472459,closed,0,,,8,2024-02-04T18:21:40Z,2024-04-11T06:53:45Z,2024-04-11T06:53:45Z,NONE,,,,"### What happened?

I want to work with a huge dataset stored in HDF5 and loaded in chunks. Each chunk holds part of my data and must be saved to a specific region of a zarr store, following the original order of the chunks. `map_blocks` seemed a convenient way to do this. However, the final zarr store has missing values: some chunks, or parts of chunks, are never stored.

The simplified example below documents this behavior. A zarr store initialized with zeros should be completely overwritten with ones, but some parts always remain zero.

### What did you expect to happen?
_No response_

### Minimal Complete Verifiable Example

```Python
import os
import shutil

import xarray as xr
import numpy as np
import dask.array as da

xr.show_versions()

zarr_file = ""file.zarr""
if os.path.exists(zarr_file):
    shutil.rmtree(zarr_file)

chunk_size = 5
shape = (50, 32, 1000)
ones_dataset = xr.Dataset({""data"": xr.ones_like(xr.DataArray(np.empty(shape)))})
ones_dataset = ones_dataset.chunk({'dim_0': chunk_size})
chunk_indices = np.arange(len(ones_dataset.chunks['dim_0']))
chunk_ids = np.repeat(np.arange(ones_dataset.sizes[""dim_0""] // chunk_size), chunk_size)
chunk_ids_dask_array = da.from_array(chunk_ids, chunks=(chunk_size,))

# Append the chunk IDs Dask array as a new variable to the existing dataset
ones_dataset['chunk_id'] = (('dim_0',), chunk_ids_dask_array)

# Create a new dataset filled with zeros
zeros_dataset = xr.Dataset({""data"": xr.zeros_like(xr.DataArray(np.empty(shape)))})
zeros_dataset.to_zarr(zarr_file, compute=False)


def process_chunk(chunk_dataset):
    chunk_id = int(chunk_dataset[""chunk_id""][0])
    chunk_dataset_to_store = chunk_dataset.drop_vars(""chunk_id"")

    start_index = chunk_id * chunk_size
    end_index = chunk_id * chunk_size + chunk_size

    chunk_dataset_to_store.to_zarr(zarr_file, region={'dim_0': slice(start_index, end_index)})
    return chunk_dataset


ones_dataset.map_blocks(process_chunk, template=ones_dataset).compute()

# Load data stored in zarr
zarr_data = xr.open_zarr(zarr_file, chunks={'dim_0': chunk_size})

# Find differences
for var_name in zarr_data.variables:
    try:
        xr.testing.assert_equal(zarr_data[var_name], ones_dataset[var_name])
    except AssertionError:
        print(f""Differences in {var_name}:"")
        print(zarr_data[var_name].values)
        print(ones_dataset[var_name].values)
```

### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

### Relevant log output

_No response_

### Anything else we need to know?

_No response_

### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
python-bits: 64
OS: Linux
OS-release: 6.5.0-15-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development

xarray: 2024.1.1
pandas: 2.1.4
numpy: 1.26.3
scipy: 1.11.4
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.10.0
Nio: None
zarr: 2.16.1
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.3.7
dask: 2024.1.1
distributed: 2024.1.0
matplotlib: 3.8.2
cartopy: 0.22.0
seaborn: 0.13.1
numbagg: 0.6.8
fsspec: 2023.12.2
cupy: None
pint: None
sparse: None
flox: 0.8.9
numpy_groupies: 0.10.2
setuptools: 69.0.2
pip: 23.3.1
conda: None
pytest: 7.4.4
mypy: None
IPython: None
sphinx: None
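
### Possible cause

The following is an assumption, not a confirmed diagnosis. `zeros_dataset` is written without a `chunks` encoding, so zarr picks its own chunk shape. Nothing guarantees that this chunk length along `dim_0` is a multiple of `chunk_size`. The `to_zarr` documentation for `region` asks users to keep regions aligned with zarr chunk boundaries. The reason is that a partial-chunk write reads, modifies and rewrites the whole zarr chunk. Two tasks writing different rows of the same chunk in parallel can therefore overwrite each other's ones with stale zeros.

The pure-Python sketch below only counts which zarr chunks are shared between the 5-row regions used in the example. `chunks_touched` and `shared_chunks` are hypothetical helpers, not xarray or zarr API. The zarr chunk length of 13 is an illustrative value, not one read from the store.

```Python
# Hypothetical helpers for illustration; they are not part of xarray or zarr.


def chunks_touched(start, stop, zarr_chunk):
    # Indices of the zarr chunks (along one axis) that rows [start, stop) fall into.
    return set(range(start // zarr_chunk, (stop - 1) // zarr_chunk + 1))


def shared_chunks(regions, zarr_chunk):
    # Zarr chunks written by more than one region. A parallel read-modify-write
    # of such a chunk can silently lose one of the updates.
    seen, shared = set(), set()
    for start, stop in regions:
        touched = chunks_touched(start, stop, zarr_chunk)
        shared |= seen & touched
        seen |= touched
    return shared


chunk_size = 5  # rows per region write, as in the example
n_rows = 50     # length of dim_0 in the example
regions = [(i, i + chunk_size) for i in range(0, n_rows, chunk_size)]

# 13 is an assumed zarr chunk length along dim_0, chosen only for illustration.
print(sorted(shared_chunks(regions, zarr_chunk=13)))          # [0, 1, 2, 3]
print(sorted(shared_chunks(regions, zarr_chunk=chunk_size)))  # []
```

If misalignment is the cause, one option is to give the store explicit chunks that match the region writes. For example, pass `encoding={'data': {'chunks': (chunk_size, 32, 1000)}}` to `zeros_dataset.to_zarr`. Each region then maps to exactly one zarr chunk, and no two tasks write to the same chunk.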
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8703/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
2134451073,PR_kwDOAMm_X85m3ZGA,8746,User guide zarr file missing values note,23472459,open,0,,,2,2024-02-14T14:13:50Z,2024-02-18T04:52:28Z,,FIRST_TIME_CONTRIBUTOR,,0,pydata/xarray/pulls/8746," - [ ] Closes #8703 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8746/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull