issue_comments


4 rows where issue = 180080354 ("Memory error when converting dataset to dataframe"), sorted by updated_at descending

Comment 832136328 · shoyer (MEMBER) · 2021-05-04T18:03:32Z · https://github.com/pydata/xarray/issues/1020#issuecomment-832136328

@meteoDaniel could you please open a thread for discussing your issue? This could be a good use for the GitHub "Discussions" tab :)

Including a copy of the "repr" from printing your dataset would help us give more specific guidance.

Comment 832111396 · meteoDaniel (NONE) · 2021-05-04T17:24:15Z · https://github.com/pydata/xarray/issues/1020#issuecomment-832111396

@shoyer I am having a similar problem. I am reading 80 files totalling 8.3 GB, so each file is around 100 MB. If I understand you right, using open_mfdataset on such data is not recommended? So best practice would be to loop over the files?

PS: I still tried to use some dask-related operations, but each time I try to access .values or use to_dataframe the memory usage explodes. Thanks a lot for answering ;)
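For illustration, a minimal sketch of the two options being weighed here, assuming a directory of NetCDF files (the file pattern, the chunk size, and the variable name t2m are all hypothetical):

```python
import glob
import xarray as xr

# Option 1: open all files lazily with dask chunks. Nothing is loaded yet,
# but calling .values or .to_dataframe() on the whole dataset would still
# materialize everything at once -- so subset before converting.
ds = xr.open_mfdataset("data/*.nc", chunks={"time": 100}, combine="by_coords")
df = ds[["t2m"]].isel(time=slice(0, 100)).to_dataframe()

# Option 2: loop over the files, so only one ~100 MB file is in memory at a time.
for path in sorted(glob.glob("data/*.nc")):
    with xr.open_dataset(path) as single:
        part = single["t2m"].to_dataframe()
        # ... process or append `part` here ...
```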

Comment 250777441 · ktyle (NONE) · 2016-09-30T15:38:01Z · https://github.com/pydata/xarray/issues/1020#issuecomment-250777441

Good to know, and since the system I'm running on has 96 GB of RAM, I think your statement about pandas is correct too, as I also get the memory error when running on a smaller (18 GB) dataset.

Comment 250615133 · shoyer (MEMBER) · 2016-09-29T22:53:39Z · https://github.com/pydata/xarray/issues/1020#issuecomment-250615133

Looking at your dataset:

```
>>> url = 'http://nomads.ncep.noaa.gov:9090/dods/hrrr/hrrr20160801/hrrr_sfc_00z'
>>> ds = xarray.open_dataset(url)
/Users/shoyer/dev/xarray/xarray/conventions.py:386: RuntimeWarning: Unable to
decode time axis into full numpy.datetime64 objects, continuing using dummy
netCDF4.datetime objects instead, reason: dates out of range
  result = decode_cf_datetime(example_value, units, calendar)
>>> ds
<xarray.Dataset>
Dimensions:           (lat: 1155, lev: 5, lon: 2503, time: 19)
Coordinates:
  * time              (time) object 2016-09-28T12:00:00 2016-09-28T13:00:00 ...
  * lev               (lev) float64 1e+03 925.0 850.0 700.0 500.0
  * lat               (lat) float64 21.14 21.17 21.2 21.22 21.25 21.28 21.3 ...
  * lon               (lon) float64 -134.1 -134.1 -134.0 -134.0 -134.0 ...
Data variables:
    dptprs            (time, lev, lat, lon) float64 ...
    no4lftx180_0mb    (time, lat, lon) float64 ...
    apcpsfc           (time, lat, lon) float64 ...
    asnowsfc          (time, lat, lon) float64 ...
    bgrunsfc          (time, lat, lon) float64 ...
    capesfc           (time, lat, lon) float64 ...
    cape180_0mb       (time, lat, lon) float64 ...
    cape90_0mb        (time, lat, lon) float64 ...
    cape255_0mb       (time, lat, lon) float64 ...
    cfrzrsfc          (time, lat, lon) float64 ...
    cicepsfc          (time, lat, lon) float64 ...
    cinsfc            (time, lat, lon) float64 ...
    cin180_0mb        (time, lat, lon) float64 ...
    cin90_0mb         (time, lat, lon) float64 ...
    cin255_0mb        (time, lat, lon) float64 ...
    cnwatsfc          (time, lat, lon) float64 ...
    cpofpsfc          (time, lat, lon) float64 ...
    crainsfc          (time, lat, lon) float64 ...
    csnowsfc          (time, lat, lon) float64 ...
    dlwrfsfc          (time, lat, lon) float64 ...
    dpt2m             (time, lat, lon) float64 ...
    dswrfsfc          (time, lat, lon) float64 ...
    dzdtsg500_800     (time, lat, lon) float64 ...
    fricvsfc          (time, lat, lon) float64 ...
    frozrsfc          (time, lat, lon) float64 ...
    gfluxsfc          (time, lat, lon) float64 ...
    gustsfc           (time, lat, lon) float64 ...
    hcdchcll          (time, lat, lon) float64 ...
    hgtsfc            (time, lat, lon) float64 ...
    hgt500mb          (time, lat, lon) float64 ...
    hgt700mb          (time, lat, lon) float64 ...
    hgt850mb          (time, lat, lon) float64 ...
    hgt1000mb         (time, lat, lon) float64 ...
    hgtclb            (time, lat, lon) float64 ...
    hgt263_k          (time, lat, lon) float64 ...
    hgt253_k          (time, lat, lon) float64 ...
    hgttop0c          (time, lat, lon) float64 ...
    hgtceil           (time, lat, lon) float64 ...
    hgteql            (time, lat, lon) float64 ...
    hgtclt            (time, lat, lon) float64 ...
    hgt0c             (time, lat, lon) float64 ...
    hgtl5             (time, lat, lon) float64 ...
    hlcy3000_0m       (time, lat, lon) float64 ...
    hlcy1000_0m       (time, lat, lon) float64 ...
    hpblsfc           (time, lat, lon) float64 ...
    icecsfc           (time, lat, lon) float64 ...
    landsfc           (time, lat, lon) float64 ...
    lcdclcll          (time, lat, lon) float64 ...
    lftxl100_100      (time, lat, lon) float64 ...
    lhtflsfc          (time, lat, lon) float64 ...
    ltngclm           (time, lat, lon) float64 ...
    maxdvv400_1000mb  (time, lat, lon) float64 ...
    maxref1000m       (time, lat, lon) float64 ...
    maxuvv400_1000mb  (time, lat, lon) float64 ...
    mcdcmcll          (time, lat, lon) float64 ...
    mslmamsl          (time, lat, lon) float64 ...
    mstav0cm          (time, lat, lon) float64 ...
    mxuphl5000_2000m  (time, lat, lon) float64 ...
    plpl255_0mb       (time, lat, lon) float64 ...
    pot2m             (time, lat, lon) float64 ...
    pratesfc          (time, lat, lon) float64 ...
    pressfc           (time, lat, lon) float64 ...
    presclb           (time, lat, lon) float64 ...
    prestop0c         (time, lat, lon) float64 ...
    presclt           (time, lat, lon) float64 ...
    pres0c            (time, lat, lon) float64 ...
    pwatclm           (time, lat, lon) float64 ...
    refcclm           (time, lat, lon) float64 ...
    refd1000m         (time, lat, lon) float64 ...
    refd4000m         (time, lat, lon) float64 ...
    refd263_k         (time, lat, lon) float64 ...
    retopclt          (time, lat, lon) float64 ...
    rh2m              (time, lat, lon) float64 ...
    rhtop0c           (time, lat, lon) float64 ...
    rh0c              (time, lat, lon) float64 ...
    rhpwclm           (time, lat, lon) float64 ...
    sbt113toa         (time, lat, lon) float64 ...
    sbt114toa         (time, lat, lon) float64 ...
    sbt123toa         (time, lat, lon) float64 ...
    sbt124toa         (time, lat, lon) float64 ...
    sfcrsfc           (time, lat, lon) float64 ...
    shtflsfc          (time, lat, lon) float64 ...
    snodsfc           (time, lat, lon) float64 ...
    snowcsfc          (time, lat, lon) float64 ...
    spfh2m            (time, lat, lon) float64 ...
    ssrunsfc          (time, lat, lon) float64 ...
    tcdcclm           (time, lat, lon) float64 ...
    tcolgclm          (time, lat, lon) float64 ...
    tmpsfc            (time, lat, lon) float64 ...
    tmpprs            (time, lev, lat, lon) float64 ...
    tmp2m             (time, lat, lon) float64 ...
    ugrdprs           (time, lev, lat, lon) float64 ...
    ugrd80m           (time, lat, lon) float64 ...
    ugrd10m           (time, lat, lon) float64 ...
    ulwrfsfc          (time, lat, lon) float64 ...
    ulwrftoa          (time, lat, lon) float64 ...
    ustm0_6000m       (time, lat, lon) float64 ...
    uswrfsfc          (time, lat, lon) float64 ...
    vbdsfsfc          (time, lat, lon) float64 ...
    vddsfsfc          (time, lat, lon) float64 ...
    vgrdprs           (time, lev, lat, lon) float64 ...
    vgrd80m           (time, lat, lon) float64 ...
    vgrd10m           (time, lat, lon) float64 ...
    vgtypsfc          (time, lat, lon) float64 ...
    vilclm            (time, lat, lon) float64 ...
    vissfc            (time, lat, lon) float64 ...
    vstm0_6000m       (time, lat, lon) float64 ...
    vucsh0_1000m      (time, lat, lon) float64 ...
    vucsh0_6000m      (time, lat, lon) float64 ...
    vvcsh0_1000m      (time, lat, lon) float64 ...
    vvcsh0_6000m      (time, lat, lon) float64 ...
    weasdaccsfc       (time, lat, lon) float64 ...
    weasdsfc          (time, lat, lon) float64 ...
    wind10m           (time, lat, lon) float64 ...
Attributes:
    title:        High Resolution Rapid Refresh 3km 2D Surface forecast from 12Z28sep2016, downloaded Sep 28 13:19 UTC
    Conventions:  COARDS GrADS
    dataType:     Grid
    history:      Thu Sep 29 17:52:17 UTC 2016 : imported by GrADS Data Server 2.0

>>> ds.nbytes / 1e9
57.125497856
```

So it's at least 57 GB when decoded as float64. This is probably more RAM than you have on your machine.

But also, when xarray writes a dataframe, every variable first gets expanded to use all dimensions. Most of the variables above are (time, lat, lon), so broadcasting them against the 5-element lev dimension inflates them roughly 5x: something like 5 * 57 GB in memory. And pandas probably needs a memory copy to create the DataFrame, so this probably needs at least 500 GB.

You'll have better luck subsetting the dataset first.
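For instance, subsetting down to one variable and a small region before converting keeps the dataframe manageable (a minimal sketch; tmp2m comes from the repr above, but the particular slice bounds are illustrative):

```python
import xarray as xr

url = 'http://nomads.ncep.noaa.gov:9090/dods/hrrr/hrrr20160801/hrrr_sfc_00z'
ds = xr.open_dataset(url)  # lazy: values are only fetched on access

# Pick one variable and a small window *before* converting. The full tmp2m
# array alone is 19 * 1155 * 2503 * 8 bytes (~0.44 GB); this subset is far
# smaller still, versus the hundreds of GB needed for the whole dataset.
subset = ds['tmp2m'].isel(time=0).sel(lat=slice(30, 40), lon=slice(-110, -90))
df = subset.to_dataframe()
```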



CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
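Given this schema, the page's view can be reproduced against a local SQLite copy of the database (a sketch; the filename github.db is an assumption):

```python
import sqlite3

conn = sqlite3.connect("github.db")  # hypothetical local copy of this database

# Same query as this page: comments on issue 180080354, newest update first.
# The WHERE clause is served by idx_issue_comments_issue.
rows = conn.execute(
    """
    SELECT id, user, created_at, author_association, body
    FROM issue_comments
    WHERE issue = ?
    ORDER BY updated_at DESC
    """,
    (180080354,),
).fetchall()

for comment_id, user_id, created_at, association, body in rows:
    print(comment_id, user_id, created_at, association, body[:60].replace("\n", " "))
```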