
Issue #7397: Memory issue merging NetCDF files using xarray.open_mfdataset and to_netcdf

  • id: 1506437087
  • node_id: I_kwDOAMm_X85Zymff
  • user: 720460
  • state: open
  • locked: 0
  • comments: 9
  • created_at: 2022-12-21T15:00:05Z
  • updated_at: 2023-09-16T12:42:51Z
  • author_association: NONE
  • repo: 13221727
  • type: issue

What happened?

I have 5 NetCDF files (1 GiB each). They have 4 dimensions: time, depth, lat, lon. All the files share exactly the same depth, lat, and lon coordinates. The time axis has a uniform interval with no gaps in any of the 5 files, and the axis is continuous from one file to the next.

All I am doing is merging the files along the time axis and saving the result to a new NetCDF file.

To run the script, I allocated 185 GiB of memory (the maximum on my cluster).

The program runs until to_netcdf() is called; at that point I get an error stating there is not enough memory.

What did you expect to happen?

As the 5 files are 1 GiB each and I allocated 185 GiB (far more than the 5 × 1 GiB they occupy on disk), I expected the program to run without exceeding the allocated memory; after all, I gave it 37 times the combined size of the files.

Minimal Complete Verifiable Example

```Python
import xarray as xr

path = './data/data_*.nc'  # files are: data_1.nc data_2.nc data_3.nc data_4.nc data_5.nc
data = xr.open_mfdataset(path)

data = data.load()  # uses 5 GiB - tested with a memory profiler

data.to_netcdf('./output/combined.nc')
```
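
For contrast, here is a minimal sketch of a lazy, chunked variant of the same pipeline (untested; the chunks value is an arbitrary assumption). Keeping the arrays dask-backed instead of calling .load() should let to_netcdf() encode and write one chunk at a time, so peak memory tracks the chunk size rather than the full merged array:

```Python
import xarray as xr

path = './data/data_*.nc'
# chunks={'time': 10} is a placeholder value; any chunking keeps the arrays lazy
data = xr.open_mfdataset(path, chunks={'time': 10})

# No .load(): dask streams one chunk at a time through encoding and writing
data.to_netcdf('./output/combined.nc')
```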

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
Traceback (most recent call last):
  File "/users/me/code/par2.py", line 78, in <module>
    preprocess_data(year, month)
  File "/users/me/code/par2.py", line 69, in preprocess_data
    data.to_netcdf(path=outpath)
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/core/dataset.py", line 1882, in to_netcdf
    return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/backends/api.py", line 1210, in to_netcdf
    dump_to_store(
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/backends/api.py", line 1257, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/backends/common.py", line 263, in store
    variables, attributes = self.encode(variables, attributes)
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/backends/common.py", line 352, in encode
    variables, attributes = cf_encoder(variables, attributes)
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/conventions.py", line 864, in cf_encoder
    new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/conventions.py", line 864, in <dictcomp>
    new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/conventions.py", line 273, in encode_cf_variable
    var = coder.encode(var, name=name)
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/coding/variables.py", line 170, in encode
    data = duck_array_ops.fillna(data, fill_value)
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/core/duck_array_ops.py", line 283, in fillna
    return where(notnull(data), data, other)
  File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/xarray/core/duck_array_ops.py", line 270, in where
    return _where(condition, *as_shared_dtype([x, y]))
  File "<__array_function__ internals>", line 180, in where
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 43.6 GiB for an array with shape (280, 200, 277, 754) and data type float32
```
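
For scale: the 43.6 GiB in the last line matches one dense float32 array of the reported shape, so a single temporary produced by the fillna()/where() step of CF encoding is already far larger than the source files. A quick arithmetic check:

```Python
import numpy as np

# Size of one dense float32 array with the shape from the traceback
shape = (280, 200, 277, 754)
nbytes = np.prod(shape, dtype=np.int64) * np.dtype('float32').itemsize
print(f"{nbytes / 2**30:.1f} GiB")  # -> 43.6 GiB
```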

Anything else we need to know?

I allocated 185 GiB for this job; from my understanding, this means that merging 5 datasets of 1 GiB each requires more than 185 GiB of memory. It sounds like a memory leak to me.
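
One possible check (an assumption worth ruling out, not a claim about this data): NetCDF files are usually compressed on disk, so the decoded in-memory size can be far larger than the file sizes suggest. Dataset.nbytes reports the uncompressed size:

```Python
import xarray as xr

# Hypothetical diagnostic: compare on-disk size with decoded in-memory size
data = xr.open_mfdataset('./data/data_*.nc')
print(f"decoded in-memory size: {data.nbytes / 2**30:.1f} GiB")
```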

I am not the only one with this issue; cf. https://github.com/pydata/xarray/discussions/4890

Environment

```
/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-372.26.1.el8_6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 2022.6.0
pandas: 1.4.4
numpy: 1.23.2
scipy: 1.9.1
netCDF4: 1.6.0
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: None
cftime: 1.6.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.9.0
distributed: 2022.9.0
matplotlib: 3.5.3
cartopy: None
seaborn: 0.12.0
numbagg: None
fsspec: 2022.8.2
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.3.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: 7.33.0
sphinx: 5.1.1
```
