html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/7397#issuecomment-1362507511,https://api.github.com/repos/pydata/xarray/issues/7397,1362507511,IC_kwDOAMm_X85RNjb3,5821660,2022-12-22T07:33:39Z,2022-12-22T07:33:39Z,MEMBER,"IIUC the amount of memory requested matches what the dimensions suggest (assuming a 4-byte dtype):
(280 * 200 * 277 * 754 * 4 bytes) / 1024³ ≈ 43.57 GiB
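The same estimate as a quick Python check. The 4-byte itemsize is the assumption stated above (e.g. float32), not something read from the files:

```python
from math import prod

# Dimension sizes from the report: (time, depth, lat, lon)
shape = (280, 200, 277, 754)
itemsize = 4  # bytes per element, assuming a 4-byte dtype such as float32

nbytes = prod(shape) * itemsize
print(f'{nbytes} bytes = {nbytes / 1024**3:.2f} GiB')
# prints: 46784192000 bytes = 43.57 GiB
```

With several data variables of that shape, multiply by their number.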
I'm not that familiar with the data flow in `to_netcdf`, but it's clear that the whole dataset is being read into memory for some reason. The error is raised at the backend level, so I assume you are using `engine=""netcdf4""`. You might try `engine=""h5netcdf""`, or follow @TomNicholas's suggestion of using `to_zarr` to take the netCDF backends out of the equation.
Some questions, @benoitespinola:
- Can you show the reprs of the single-file Datasets and the repr of the combined one?
- Are your final data variables of that size (time: 280, depth: 200, lat: 277, lon: 754)?
- Did you do any processing of the data, e.g. changing attributes/encoding?
- Is it possible to create your source data files from scratch with random data? An MCVE demonstrating the problem would help.
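A minimal sketch of such an MCVE, assuming numpy, dask and a netCDF backend (netCDF4 or scipy) are installed. The variable name `temp`, the file names and the small sizes are placeholders, not taken from the report; scale the sizes up towards the real ones to check whether the error appears:

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Placeholder sizes, far smaller than the reported (time: 280, depth: 200, lat: 277, lon: 754).
# Increase them (and n_files) towards the real sizes to try to reproduce the error.
n_files, nt, nz, ny, nx = 4, 5, 10, 12, 15

tmpdir = tempfile.mkdtemp()
rng = np.random.default_rng(0)
paths = []
for i in range(n_files):
    # Each file holds a consecutive, non-overlapping block of time steps.
    time = np.arange(i * nt, (i + 1) * nt)
    ds = xr.Dataset(
        {'temp': (('time', 'depth', 'lat', 'lon'),
                  rng.random((nt, nz, ny, nx), dtype=np.float32))},
        coords={
            'time': time,
            'depth': np.arange(nz),
            'lat': np.linspace(-90, 90, ny),
            'lon': np.linspace(-180, 180, nx),
        },
    )
    path = os.path.join(tmpdir, f'part_{i}.nc')
    ds.to_netcdf(path)
    paths.append(path)

# Open all pieces lazily with explicit chunks (requires dask), then write one combined file.
combined = xr.open_mfdataset(paths, combine='by_coords', chunks={'time': 1})
out_path = os.path.join(tmpdir, 'combined.nc')
combined.to_netcdf(out_path)
combined.close()
```

To compare backends as suggested above, the final write can be swapped for `combined.to_netcdf(out_path, engine='h5netcdf')` or `combined.to_zarr(os.path.join(tmpdir, 'combined.zarr'))`.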
Further suggestions:
- If you have multiple data variables, drop all but one before saving. Is the behaviour consistent for each of your variables?
- Be explicit in the call to `open_mfdataset` (e.g. pass the `chunks` keyword and other options).
- Try opening the files individually and combining them with `xr.merge`/`xr.concat`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1506437087
https://github.com/pydata/xarray/issues/7397#issuecomment-1362271360,https://api.github.com/repos/pydata/xarray/issues/7397,1362271360,IC_kwDOAMm_X85RMpyA,35968931,2022-12-22T01:04:39Z,2022-12-22T01:04:39Z,MEMBER,"Thanks for this bug report. FWIW I have also seen this bug recently when helping out a student.
The question here is whether this is an xarray, numpy, or netCDF bug (or some combination). Can you reproduce the problem using `to_zarr()`? If so, that would rule out netCDF as the culprit.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1506437087