# Memory issue merging NetCDF files using xarray.open_mfdataset and to_netcdf (#7397)

### What happened?

I have 5 NetCDF files (1 GiB each). They have 4 dimensions: time, depth, lat, lon. All the files have exactly the same depth, lat, and lon coordinates. The time axis has the same interval in every file, there are no gaps within any file, and the axis is continuous from one file to the next. All I am doing is merging the files along the time axis and saving the result to a new NetCDF file.

To run the script I allocated 185 GiB of memory (the maximum on my cluster). The program runs until the to_netcdf() call, where it fails with an error stating there is not enough memory.

### What did you expect to happen?

As the 5 files are 1 GiB each and I allocated 185 GiB (far more than 5² GiB), I expected the program to run without needing more than the allocated memory (after all, I gave it 37 times the combined size of the files).

### Minimal Complete Verifiable Example

```Python
import xarray as xr

path = './data/data_*.nc'  # files are: data_1.nc data_2.nc data_3.nc data_4.nc data_5.nc
data = xr.open_mfdataset(path)
data = data.load()  # uses 5 GiB - tested with a memory profiler
data.to_netcdf('./output/combined.nc')
```

### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
- [ ] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
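For comparison, here is a lazier variant of the same merge. This is only a sketch I have not benchmarked, assuming the dask-backed arrays behave as documented: open_mfdataset() already returns dask-backed variables, so skipping the eager load() should let to_netcdf() write chunk by chunk instead of materialising everything first. The output name combined_lazy.nc is just a placeholder.

```Python
import xarray as xr

# Sketch only (not benchmarked): keep the dask-backed arrays returned by
# open_mfdataset() and let to_netcdf() stream the write chunk by chunk.
path = './data/data_*.nc'
data = xr.open_mfdataset(path)               # lazy, dask-backed variables
data.to_netcdf('./output/combined_lazy.nc')  # placeholder output path
```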
### Relevant log output

```Python
Traceback (most recent call last):
  File "/users/me/code/par2.py", line 78, in <module>
    preprocess_data(year, month)
  File "/users/me/code/par2.py", line 69, in preprocess_data
    data.to_netcdf(path=outpath)
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/core/dataset.py", line 1882, in to_netcdf
    return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/backends/api.py", line 1210, in to_netcdf
    dump_to_store(
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/backends/api.py", line 1257, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/backends/common.py", line 263, in store
    variables, attributes = self.encode(variables, attributes)
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/backends/common.py", line 352, in encode
    variables, attributes = cf_encoder(variables, attributes)
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/conventions.py", line 864, in cf_encoder
    new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/conventions.py", line 864, in <dictcomp>
    new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/conventions.py", line 273, in encode_cf_variable
    var = coder.encode(var, name=name)
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/coding/variables.py", line 170, in encode
    data = duck_array_ops.fillna(data, fill_value)
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/core/duck_array_ops.py", line 283, in fillna
    return where(notnull(data), data, other)
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/core/duck_array_ops.py", line 270, in where
    return _where(condition, *as_shared_dtype([x, y]))
  File "<__array_function__ internals>", line 180, in where
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 43.6 GiB for an array with shape (280, 200, 277, 754) and data type float32
```

### Anything else we need to know?

I allocated 185 GiB for this job; from my understanding, this means that merging 5 datasets of 1 GiB each requires more than 185 GiB of memory. It sounds like a memory leak to me. I am not the only one with this issue, cf. https://github.com/pydata/xarray/discussions/4890
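For context on the number in the traceback: the failing allocation corresponds to a single full-size float32 array of the reported shape. The CF-encoding step (duck_array_ops.fillna → where) operates on the whole variable at once, so on numpy-backed data it needs at least one temporary of that size, on top of the notnull mask and the output of where(). A quick check, using only the shape and dtype from the error message:

```Python
import numpy as np

# Shape and dtype copied from the MemoryError in the traceback above.
shape = (280, 200, 277, 754)
nbytes = np.prod(shape) * np.dtype('float32').itemsize
print(f'{nbytes / 2**30:.1f} GiB')  # -> 43.6 GiB for one full-size temporary
```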
### Environment

```
/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-372.26.1.el8_6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1

xarray: 2022.6.0
pandas: 1.4.4
numpy: 1.23.2
scipy: 1.9.1
netCDF4: 1.6.0
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: None
cftime: 1.6.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.9.0
distributed: 2022.9.0
matplotlib: 3.5.3
cartopy: None
seaborn: 0.12.0
numbagg: None
fsspec: 2022.8.2
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.3.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: 7.33.0
sphinx: 5.1.1
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7397/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue