



https://github.com/pydata/xarray/issues/7397#issuecomment-1362507511 · user 5821660 (MEMBER) · 2022-12-22T07:33:39Z

IIUC, the amount of memory is roughly what the dimensions suggest (assuming a 4-byte dtype):

(280 * 200 * 277 * 754 * 4 bytes) / 1024³ ≈ 43.57 GiB
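The same estimate in a few lines of Python, assuming (as above) a 4-byte dtype such as float32. `estimate_nbytes` is a hypothetical helper, not an xarray function; for a Dataset that is already open, xarray's `ds.nbytes` reports the in-memory size directly.

```python
import math


def estimate_nbytes(shape, itemsize=4):
    """Uncompressed size in bytes of an array with this shape (itemsize assumed 4)."""
    return math.prod(shape) * itemsize


# Dimensions from the issue: (time, depth, lat, lon)
shape = (280, 200, 277, 754)
nbytes = estimate_nbytes(shape)
print(f"{nbytes / 1024**3:.2f} GiB")  # → 43.57 GiB
```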

I'm not that familiar with the data flow in to_netcdf, but it's clear that the whole dataset is being read into memory for some reason. The error happens at the backend level, so I assume engine="netcdf4". You might try engine="h5netcdf", or consider @TomNicholas's suggestion of using to_zarr, which may take the netCDF backends out of the equation.
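A minimal sketch of trying both alternatives. It assumes the combined dataset is called `ds` (here a tiny random stand-in), and the helper `write_with_alternative_backends` and the output file names are hypothetical. Libraries that are not installed are simply skipped.

```python
import importlib.util
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr


def write_with_alternative_backends(ds: xr.Dataset, outdir: Path) -> dict:
    """Write ds once through h5netcdf and once as Zarr (hypothetical helper).

    Any library that is not installed is skipped. Returns {backend name: output path}.
    """
    written = {}
    if importlib.util.find_spec("h5netcdf") is not None:
        nc_path = outdir / "combined_h5netcdf.nc"
        # Same netCDF file format, but a different writer library than netCDF4
        ds.to_netcdf(nc_path, engine="h5netcdf")
        written["h5netcdf"] = nc_path
    if importlib.util.find_spec("zarr") is not None:
        zarr_path = outdir / "combined.zarr"
        # Zarr store: no netCDF backend involved at all
        ds.to_zarr(zarr_path, mode="w")
        written["zarr"] = zarr_path
    return written


# Hypothetical stand-in for the dataset returned by open_mfdataset
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"var": (("time", "depth", "lat", "lon"), rng.random((4, 3, 5, 6), dtype=np.float32))}
)
written = write_with_alternative_backends(ds, Path(tempfile.mkdtemp()))
print(sorted(written))
```

If the memory blow-up disappears with one of these writers, that points at the default netCDF4 path rather than at open_mfdataset.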

Some questions, @benoitespinola:

Can you show the reprs of the single-file Datasets and the repr of the combined one? Are your final data variables of that size (time: 280, depth: 200, lat: 277, lon: 754)? Did you do any processing on the data, e.g. changing attributes or encoding? Is it possible to create your source data files from scratch with random data? An MCVE showing that would help.
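One way such random source files could be generated, as a sketch only: the variable name `var`, the layout (one file per block of time steps) and the tiny default sizes are all assumptions, chosen just to mirror the (time, depth, lat, lon) layout of the combined dataset. `make_fake_source_files` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
import xarray as xr


def make_fake_source_files(outdir, n_files=3, steps_per_file=2, depth=4, lat=5, lon=6, seed=0):
    """Write n_files small netCDF files of random float32 data (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    outdir = Path(outdir)
    # One continuous daily time axis, split evenly across the files
    all_times = pd.date_range("2000-01-01", periods=n_files * steps_per_file, freq="D")
    paths = []
    for i in range(n_files):
        time = all_times[i * steps_per_file:(i + 1) * steps_per_file]
        data = rng.random((steps_per_file, depth, lat, lon), dtype=np.float32)
        part = xr.Dataset(
            {"var": (("time", "depth", "lat", "lon"), data)},
            coords={
                "time": time,
                "depth": np.arange(depth, dtype=np.float32),
                "lat": np.linspace(-10, 10, lat),
                "lon": np.linspace(0, 20, lon),
            },
        )
        path = outdir / f"source_{i:03d}.nc"
        part.to_netcdf(path)  # default engine, as in the reported setup
        paths.append(path)
    return paths


paths = make_fake_source_files(tempfile.mkdtemp())
```

Pointing `xr.open_mfdataset` at the returned paths and calling `to_netcdf` on the result, with the sizes scaled up, would show whether memory use grows the same way as with the real files.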

Further suggestions:

If you have multiple data variables, drop all but one before saving. Is the behaviour the same for each of your variables? Be explicit in the call to open_mfdataset (e.g. pass the chunks keyword). Try opening the files individually and combining them with xr.merge/xr.concat.
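The suggestions above might look roughly like this. `keep_only`, `open_explicit` and `combine_individually` are hypothetical helpers, and `chunks={"time": 1}` is an illustrative value, not a tuned one. `open_explicit` needs dask and is not called in the demo; `combine_individually` uses xr.concat, while xr.merge would be the analogue for files that hold different variables.

```python
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr


def keep_only(ds: xr.Dataset, name: str) -> xr.Dataset:
    """Drop every data variable except `name`; coordinates are left untouched."""
    others = [v for v in ds.data_vars if v != name]
    return ds.drop_vars(others)


def open_explicit(paths, chunks=None):
    """open_mfdataset with combine/chunks spelled out (requires dask).

    The default chunk size is a hypothetical starting point, not a recommendation.
    """
    return xr.open_mfdataset(
        paths,
        combine="by_coords",
        chunks=chunks if chunks is not None else {"time": 1},
    )


def combine_individually(paths, dim="time"):
    """Open each file on its own and concatenate along dim, bypassing open_mfdataset."""
    return xr.concat([xr.open_dataset(p) for p in paths], dim=dim)


# Tiny hypothetical demo: two variables, split across two files along time
demo = xr.Dataset(
    {
        "temp": (("time", "lat"), np.arange(8, dtype=np.float32).reshape(4, 2)),
        "salt": (("time", "lat"), np.ones((4, 2), dtype=np.float32)),
    },
    coords={"time": np.arange(4, dtype=np.int32), "lat": [10.0, 20.0]},
)
single = keep_only(demo, "temp")

tmp = Path(tempfile.mkdtemp())
parts = [tmp / "part0.nc", tmp / "part1.nc"]
demo.isel(time=slice(0, 2)).to_netcdf(parts[0])
demo.isel(time=slice(2, 4)).to_netcdf(parts[1])
combined = combine_individually(parts)
```

Saving `keep_only(ds, name)` for each name in `ds.data_vars` in turn would show whether a single variable is responsible for the memory growth.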
