# pydata/xarray #7672: `to_zarr` writes unexpected NaNs with `chunks=-1`

**State:** open · **Comments:** 5 · **Opened:** 2023-03-24 · **Last updated:** 2023-11-09

### What happened?

I'm running into some unexpected behavior with `ds.to_zarr` when my encoding includes `chunks=-1` and `ds` is a dataset that I created by operating on zarr files opened from disk. When I run the following example code, many of my values are `NaN`. When I run the same code, but with `ds.load()` before `ds.to_zarr()`, the correct, non-NaN values are saved.

### What did you expect to happen?

My data would be written the same regardless of whether I explicitly loaded the dataset. The documentation for `xarray.Dataset.load` includes the following:

> Normally, it should not be necessary to call this method in user code, because all xarray functions should either work on deferred data or load data automatically. However, this method can be necessary when working with many file objects on disk.

I encountered this situation when operating on datasets that had been loaded from disk (`.sel`, then `concat`), so this seems like a situation that the second sentence addresses, but I did not expect it to silently fail to write the correct data in the way that it did.
### Minimal Complete Verifiable Example

```python
import pandas as pd
import xarray as xr
import numpy as np


def create_dataset(time, site):
    temperature = 15 + 8 * np.random.randn(1, 3)
    precipitation = 10 * np.random.rand(1, 3)
    ds = xr.Dataset(
        data_vars=dict(
            temperature=(["site", "time"], temperature),
            precipitation=(["site", "time"], precipitation),
        ),
        coords=dict(
            site=site,
            time=time,
        ),
        attrs=dict(description="Weather related data."),
    )
    return ds


time_1 = pd.date_range("2014-09-06", periods=3)
time_2 = pd.date_range("2014-09-09", periods=3)

# create and save the first dataset as a zarr
ds_a = create_dataset(time_1, ["site_1"])
fname_a = '/tmp/ds_a.zarr'
ds_a.to_zarr(fname_a, mode='w')
ds_a_from_disk = xr.open_dataset(fname_a, engine='zarr', chunks={})

# create and save the second dataset as a zarr
ds_b = create_dataset(time_2, ["site_1"])
fname_b = '/tmp/ds_b.zarr'
ds_b.to_zarr(fname_b, mode='w')
ds_b_from_disk = xr.open_dataset(fname_b, engine='zarr', chunks={})

# concatenate the datasets
ds = xr.concat(
    [ds_a_from_disk.sel(site="site_1"), ds_b_from_disk.sel(site="site_1")],
    dim='time',
)

# save all data in one chunk
encoding = {var: {'chunks': -1} for var in list(ds) + list(ds.coords)}
fname = '/tmp/concated.zarr'

# Uncomment the following line to fix this issue
# ds.load()

# save the dataset
ds.to_zarr(fname, mode='w', encoding=encoding)

ds_from_disk = xr.open_dataset(fname, engine='zarr')
print(ds_from_disk.to_dataframe())
```

### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

### Relevant log output

_No response_

### Anything else we need to know?

Example output without `ds.load()`:

```
            precipitation    site  temperature
time
2014-09-06            NaN  site_1     9.805297
2014-09-07            NaN  site_1    16.119194
2014-09-08            NaN  site_1     4.226150
2014-09-09       7.275470  site_1          NaN
2014-09-10       2.899134  site_1          NaN
2014-09-11       5.777094  site_1          NaN
```

Example output with `ds.load()`:

```
            precipitation    site  temperature
time
2014-09-06       3.445305  site_1    18.144503
2014-09-07       7.708728  site_1    20.289742
2014-09-08       7.358939  site_1    19.996060
2014-09-09       6.211692  site_1     9.748291
2014-09-10       4.981796  site_1    -7.676436
2014-09-11       8.667885  site_1    31.934328
```

My hunch is that this has to do with a mismatch between the Dask chunks in the unloaded dataset and the chunks specified in `to_zarr`, but if they are incompatible I would expect an error to be surfaced.

### Environment
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.16 (main, Dec 7 2022, 01:11:51) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.19.0-35-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: None
xarray: 2023.3.0
pandas: 1.4.0
numpy: 1.22.4
scipy: 1.8.0
netCDF4: None
pydap: None
h5netcdf: None
...
pytest: 6.2.2
mypy: None
IPython: 8.3.0
sphinx: None
```
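The chunk-mismatch hunch above can be illustrated without touching disk. The following is a minimal sketch, not a confirmed diagnosis: it uses in-memory Dask-backed datasets as stand-ins for the two lazily opened zarr stores (so the file paths and the `to_zarr` call itself are omitted), and shows that after `concat` the dataset carries two Dask chunks along `time`, while `chunks: -1` in the encoding requests a single zarr chunk of length 6 — meaning two independent Dask tasks would each write into the same zarr chunk.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Stand-ins for the two datasets opened lazily from disk: each one is a
# single Dask chunk along time, as after open_dataset(..., chunks={}).
ds_a = xr.Dataset(
    {"temperature": ("time", 15 + 8 * np.random.randn(3))},
    coords={"time": pd.date_range("2014-09-06", periods=3)},
).chunk({"time": 3})
ds_b = xr.Dataset(
    {"temperature": ("time", 15 + 8 * np.random.randn(3))},
    coords={"time": pd.date_range("2014-09-09", periods=3)},
).chunk({"time": 3})

# concat preserves the per-source chunking: two Dask chunks of length 3
ds = xr.concat([ds_a, ds_b], dim="time")
print(dict(ds.chunks))  # {'time': (3, 3)}

# The encoding {'chunks': -1} asks for ONE zarr chunk of length 6, so two
# Dask tasks would target the same zarr chunk. Rechunking so the Dask
# layout matches the requested zarr layout removes the overlap:
ds = ds.chunk({"time": -1})
print(dict(ds.chunks))  # {'time': (6,)}
```

If this reading of the failure mode is right, `ds.chunk(-1)` before `to_zarr` should work as an alternative to the `ds.load()` workaround in the MVCE, since each zarr chunk then receives exactly one writer.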