issues: 1506437087
id: 1506437087
node_id: I_kwDOAMm_X85Zymff
number: 7397
title: Memory issue merging NetCDF files using xarray.open_mfdataset and to_netcdf
user: 720460
state: open
locked: 0
comments: 9
created_at: 2022-12-21T15:00:05Z
updated_at: 2023-09-16T12:42:51Z
author_association: NONE

What happened?

I have 5 NetCDF files (1 GiB each) with 4 dimensions: time, depth, lat, lon. All files share exactly the same depth, lat, and lon coordinates. The time axis has a constant interval with no gaps in any file, and the axis is continuous from one file to the next. All I am doing is merging the files along the time axis and saving the result to a new NetCDF file. I allocated 185 GiB of memory (the maximum in my cluster) and ran the script. The program runs until to_netcdf() is called, then fails with an error stating there is not enough memory.

What did you expect to happen?

As the 5 files are 1 GiB each and I allocated 185 GiB (37 times the combined size of the files), I expected the program to run without exceeding the allocated memory.

Minimal Complete Verifiable Example

```Python
import xarray as xr

path = './data/data_*.nc'  # files are: data_1.nc data_2.nc data_3.nc data_4.nc data_5.nc
data = xr.open_mfdataset(path)
data = data.load()  # uses 5 GiB - tested with a memory profiler
data.to_netcdf('./output/combined.nc')  # fails here: not enough memory
```

MVCE confirmation
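A commonly suggested lower-memory variant of the MVCE above is to skip the `.load()` call so the dataset stays dask-backed, letting `to_netcdf()` stream chunks to disk instead of materializing the whole merged array first. The sketch below demonstrates this with tiny synthetic stand-in files (names, sizes, and the `time` chunk size are illustrative, not the reporter's actual data):

```python
# Sketch: merge along time while keeping the dataset lazy (dask-backed),
# so the write computes and flushes one chunk at a time.
# Synthetic stand-ins; the real files are ~1 GiB each and also carry a
# depth dimension.
import os
import tempfile

import numpy as np
import xarray as xr

tmp = tempfile.mkdtemp()

# Create 3 small files that share lat/lon and have a continuous time axis.
for i in range(3):
    ds = xr.Dataset(
        {"temp": (("time", "lat", "lon"), np.random.rand(4, 5, 5))},
        coords={
            "time": np.arange(i * 4, (i + 1) * 4),
            "lat": np.arange(5),
            "lon": np.arange(5),
        },
    )
    ds.to_netcdf(os.path.join(tmp, f"data_{i}.nc"))

# Open lazily with explicit chunks and do NOT call .load(); to_netcdf()
# then streams the dask chunks to disk rather than holding the merged
# array fully in memory.
data = xr.open_mfdataset(os.path.join(tmp, "data_*.nc"), chunks={"time": 4})
data.to_netcdf(os.path.join(tmp, "combined.nc"))
```

With real 1 GiB inputs, peak memory is then governed by the chunk size rather than the total merged size, which is why dropping `.load()` is the usual first suggestion in threads like this one.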
Relevant log output
Anything else we need to know?

I allocated 185 GiB for this job; from my understanding, this means that merging 5 datasets of 1 GiB each requires more than 185 GiB of memory. It sounds like a memory leak to me. I am not the only one with this issue, cf: https://github.com/pydata/xarray/discussions/4890

Environment
/CSC_CONTAINER/miniconda/envs/env1/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-372.26.1.el8_6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 2022.6.0
pandas: 1.4.4
numpy: 1.23.2
scipy: 1.9.1
netCDF4: 1.6.0
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: None
cftime: 1.6.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.9.0
distributed: 2022.9.0
matplotlib: 3.5.3
cartopy: None
seaborn: 0.12.0
numbagg: None
fsspec: 2022.8.2
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.3.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: 7.33.0
sphinx: 5.1.1
repo: 13221727
type: issue
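As a cross-check on the suspected leak discussed above, one sketch (again with small stand-in files, not the reporter's data) opens each file separately and concatenates them with `xr.concat`, bypassing `open_mfdataset`'s combine logic; if this version writes within expected memory, the overhead is in the combine/load path rather than in the write itself:

```python
# Sketch: manual per-file open + lazy concat along time, as an alternative
# to open_mfdataset, to help isolate where the memory blow-up occurs.
# File names and sizes are stand-ins.
import os
import tempfile

import numpy as np
import xarray as xr

tmp2 = tempfile.mkdtemp()
paths = []
for i in range(3):
    ds = xr.Dataset(
        {"temp": (("time", "lat", "lon"), np.random.rand(4, 5, 5))},
        coords={
            "time": np.arange(i * 4, (i + 1) * 4),
            "lat": np.arange(5),
            "lon": np.arange(5),
        },
    )
    path = os.path.join(tmp2, f"part_{i}.nc")
    ds.to_netcdf(path)
    paths.append(path)

# chunks={} keeps each file dask-backed, so the concat stays lazy and the
# final write streams chunk by chunk.
parts = [xr.open_dataset(p, chunks={}) for p in paths]
combined = xr.concat(parts, dim="time")
combined.to_netcdf(os.path.join(tmp2, "combined_concat.nc"))
```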