issue_comments

2 rows where author_association = "MEMBER" and issue = 1506437087, sorted by updated_at descending

Comment by kmuehlbauer (MEMBER) on issue 1506437087, "Memory issue merging NetCDF files using xarray.open_mfdataset and to_netcdf"
2022-12-22T07:33:39Z · https://github.com/pydata/xarray/issues/7397#issuecomment-1362507511

IIUC the amount of memory is about what the dimensions suggest (assuming a 4-byte dtype):

(280 * 200 * 277 * 754 * 4 bytes) / 1024³ ≈ 43.57 GiB
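
A quick way to sanity-check that arithmetic (a minimal sketch; float32 as the assumed 4-byte dtype):

import numpy as np

# Combined dimensions from the issue: (time, depth, lat, lon)
shape = (280, 200, 277, 754)
itemsize = 4  # assumed 4-byte dtype, e.g. float32

nbytes = np.prod(shape, dtype=np.int64) * itemsize
print(f"{nbytes / 1024**3:.2f} GiB")  # -> 43.57 GiB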

I'm not that familiar with the data flow in to_netcdf, but it's clear that all of the data is read into memory for some reason. The error happens at the backend level, so I'm assuming engine="netcdf4". You might try engine="h5netcdf", or consider @TomNicholas's suggestion of using to_zarr to possibly take the netCDF backends out of the equation.
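
A minimal sketch of those two workarounds (the input file pattern and output paths are hypothetical):

import xarray as xr

ds = xr.open_mfdataset("data/*.nc")  # hypothetical input files

# Workaround 1: write through h5netcdf instead of netcdf4
ds.to_netcdf("merged.nc", engine="h5netcdf")

# Workaround 2: write zarr instead, bypassing the netCDF backends entirely
ds.to_zarr("merged.zarr")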

Some questions, @benoitespinola:

  • Can you show the reprs of the single-file Datasets and the repr of the combined one?
  • Are your final data variables of that size (time: 280, depth: 200, lat: 277, lon: 754)?
  • Did you do some processing with the data, changing attributes/encoding etc.?
  • Is it possible to create your source data files from scratch with random data? An MCVE showing that would help; see the sketch below.
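
For that last point, a minimal sketch of what such an MCVE could look like (the variable name, per-file sizes, and file count are hypothetical stand-ins; a real reproducer would mirror the actual data):

import numpy as np
import xarray as xr

# Hypothetical: a handful of small files, split along the time dimension
for i in range(4):
    ds = xr.Dataset(
        {"var": (("time", "depth", "lat", "lon"),
                 np.random.rand(7, 20, 27, 75).astype("float32"))},
        coords={"time": np.arange(i * 7, (i + 1) * 7)},
    )
    ds.to_netcdf(f"source_{i:02d}.nc")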

Further suggestions:

  • If you have multiple data variables, drop all but one prior to saving. Is the behaviour consistent for each of your variables?
  • Try to be explicit in the call to open_mfdataset (e.g. adding the chunks keyword etc.); see the sketch after this list.
  • Try to open the individual files and combine them with xr.merge/xr.concat.
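
A minimal sketch of those last two routes (the file pattern, chunk sizes, and concat dimension are hypothetical):

import glob
import xarray as xr

# Route 1: be explicit about chunking and combination in open_mfdataset
ds = xr.open_mfdataset(
    "data/*.nc",
    chunks={"time": 10},  # keep individual dask chunks small
    combine="by_coords",
)

# Route 2: open the files individually and combine by hand
parts = [xr.open_dataset(p, chunks={}) for p in sorted(glob.glob("data/*.nc"))]
ds = xr.concat(parts, dim="time")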

Comment by TomNicholas (MEMBER) on the same issue
2022-12-22T01:04:39Z · https://github.com/pydata/xarray/issues/7397#issuecomment-1362271360

Thanks for this bug report. FWIW I have also seen this bug recently when helping out a student.

The question here is whether this is an xarray, numpy, or netCDF bug (or some combination). Can you reproduce the problem using to_zarr()? If so, that would rule out netCDF as the culprit.
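
One way to run that check (file pattern and output paths are hypothetical): write the same merged dataset through both paths and see whether the blow-up follows the netCDF writer or happens either way.

import xarray as xr

ds = xr.open_mfdataset("data/*.nc")  # hypothetical input files

# If memory blows up here too, netCDF is ruled out as the culprit...
ds.to_zarr("repro.zarr")

# ...whereas if only this step blows up, the netCDF backend is implicated.
ds.to_netcdf("repro.nc")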

Table schema:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
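
The filtered listing above corresponds to a straightforward query against this table; a minimal sketch of running it directly with Python's sqlite3 (the database filename is an assumption):

import sqlite3

con = sqlite3.connect("github.db")  # hypothetical database file
rows = con.execute(
    """
    SELECT id, user, created_at, body
    FROM issue_comments
    WHERE author_association = 'MEMBER' AND issue = 1506437087
    ORDER BY updated_at DESC
    """
).fetchall()
print(len(rows))  # -> 2, matching the listing above

The idx_issue_comments_issue index in the schema is what keeps the issue filter cheap.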