issue_comments

6 rows where issue = 372848074 ("open_mfdataset usage and limitations.") and user = 1197350 (rabernat), sorted by updated_at descending

510169853 · rabernat (MEMBER) · 2019-07-10T18:10:37Z · https://github.com/pydata/xarray/issues/2501#issuecomment-510169853

I believe that the memory issue is basically the same as https://github.com/dask/distributed/issues/2602.

The task graphs look like: read --> rechunk --> write.

Reading and rechunking increase memory consumption. Writing relieves it. In Rich's case, the workers just load too much data before they write it. Eventually they run out of memory.
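For concreteness, a minimal sketch of such a read --> rechunk --> write pipeline (the file pattern, chunk sizes, and output path are assumptions, not the actual workflow from this issue):

```python
import xarray as xr

# read: lazily open many files as a single dask-backed dataset
ds = xr.open_mfdataset("input_*.nc", combine="by_coords")

# rechunk: each new chunk depends on many read tasks
ds = ds.chunk({"time": 100})

# write: workers only release memory as these tasks complete
ds.to_zarr("output.zarr")
```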

506482057 · rabernat (MEMBER) · 2019-06-27T19:36:51Z · https://github.com/pydata/xarray/issues/2501#issuecomment-506482057

@rsignell-usgs

Can you post the xarray repr of two sample files after the pre-processing function has been applied?
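For reference, one hypothetical way to produce that repr, reusing the drop_coords preprocessor suggested in the 2019-06-19 comment below (the file names are placeholders):

```python
import xarray as xr

def drop_coords(ds):
    return ds.reset_coords(drop=True)

# Substitute two of the actual input files for these placeholders.
for path in ["sample1.nc", "sample2.nc"]:
    print(drop_coords(xr.open_dataset(path)))
```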

506481845 · rabernat (MEMBER) · 2019-06-27T19:36:11Z · https://github.com/pydata/xarray/issues/2501#issuecomment-506481845

Are there any datasets on https://pangeo-data.github.io/pangeo-datastore/ that would exhibit this poor behavior?

The datasets in our cloud datastore are designed explicitly to avoid this problem!

503641038 · rabernat (MEMBER) · 2019-06-19T16:48:29Z · https://github.com/pydata/xarray/issues/2501#issuecomment-503641038

Try writing a preprocessor function that drops all coordinates:

```python
def drop_coords(ds):
    return ds.reset_coords(drop=True)
```
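A hypothetical usage, passing the preprocessor to open_mfdataset so it is applied to each file before concatenation (the file pattern is a placeholder):

```python
import xarray as xr

ds = xr.open_mfdataset("*.nc", preprocess=drop_coords, combine="by_coords")
```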

432342306 · rabernat (MEMBER) · 2018-10-23T17:27:50Z · https://github.com/pydata/xarray/issues/2501#issuecomment-432342306

^ I'm assuming you're in a notebook. If not, call `print` instead of `display`.

432342180 · rabernat (MEMBER) · 2018-10-23T17:27:30Z · https://github.com/pydata/xarray/issues/2501#issuecomment-432342180

In `open_mfdataset`, all of the dimensions and coordinates of the individual files have to be checked for compatibility. That is often the source of slow performance with `open_mfdataset`.

To help us help you debug, please provide more information about the files you are opening. Specifically, please call `open_dataset()` directly on the first two files and copy and paste the output here. For example:

```python
from glob import glob

import xarray as xr

all_files = glob('*1002*.nc')
display(xr.open_dataset(all_files[0]))
display(xr.open_dataset(all_files[1]))
```



CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
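For reference, a minimal sketch of the query behind this page, run with Python's sqlite3 against a local copy of the database (the github.db file name is an assumption):

```python
import sqlite3

conn = sqlite3.connect("github.db")  # assumed local copy of this database
rows = conn.execute(
    """
    SELECT id, created_at, body
    FROM issue_comments
    WHERE issue = 372848074 AND [user] = 1197350
    ORDER BY updated_at DESC
    """
).fetchall()
print(len(rows))  # expect 6, matching the page header
```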