issue_comments

6 rows where issue = 372848074 (open_mfdataset usage and limitations.) and user = 1312546 (TomAugspurger), sorted by updated_at descending. All 6 rows have author_association MEMBER.

Columns: id, html_url, issue_url, node_id, user, created_at, updated_at, author_association, body, reactions, performed_via_github_app, issue. Each row is shown below as a record: id · user · author_association · timestamp (created_at equals updated_at for every row), followed by the comment URL, body, reactions, and issue.

510217080 · TomAugspurger (1312546) · MEMBER · 2019-07-10T20:30:41Z
https://github.com/pydata/xarray/issues/2501#issuecomment-510217080

Yep, that’s my suspicion as well. I’m still plugging away at it; at the moment the pausing logic isn’t working quite right.

On Jul 10, 2019, at 12:10, Ryan Abernathey notifications@github.com wrote:

> I believe that the memory issue is basically the same as dask/distributed#2602.
>
> The graphs look like: read --> rechunk --> write.
>
> Reading and rechunking increase memory consumption. Writing relieves it. In Rich's case, the workers just load too much data before they write it. Eventually they run out of memory.


Reactions: none · Issue: open_mfdataset usage and limitations. (372848074)
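
For context, the read --> rechunk --> write graph described in the quoted comment can be reproduced with a small, self-contained dask.array sketch; the array shape, chunk sizes, and store path below are made-up illustrations, not values from the thread.

    import dask.array as da

    # "read": pretend each input chunk corresponds to one file's worth of data
    x = da.random.random((10_000, 10_000), chunks=(10_000, 250))

    # "rechunk": the write-friendly chunking differs from the read chunking, so many
    # input chunks have to be held in memory to assemble each output chunk
    y = x.rechunk((250, 10_000))

    # "write": storing a chunk releases the memory of the pieces it consumed; if the
    # workers read and rechunk much faster than they write, memory keeps growing
    da.to_zarr(y, "example.zarr", overwrite=True)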

510167911 · TomAugspurger (1312546) · MEMBER · 2019-07-10T18:05:07Z
https://github.com/pydata/xarray/issues/2501#issuecomment-510167911

Great, thanks. I’ll look into the memory issue when writing. We may already have an issue for it.

On Jul 10, 2019, at 10:59, Rich Signell notifications@github.com wrote:

> @TomAugspurger, I sat down here at SciPy with @rabernat and he instantly realized that we needed to drop the feature_id coordinate to prevent open_mfdataset from trying to harmonize that coordinate from all the chunks.
>
> So if I use this code, the open_mfdataset command finishes:
>
>     def drop_coords(ds):
>         ds = ds.drop(['reference_time', 'feature_id'])
>         return ds.reset_coords(drop=True)
>
> and I can then add back in the dropped coordinate values at the end:
>
>     dsets = [xr.open_dataset(f) for f in files[:3]]
>     ds.coords['feature_id'] = dsets[0].coords['feature_id']
>
> I'm now running into memory issues when I write the zarr data -- but I should raise that as a new issue, right?


Reactions: none · Issue: open_mfdataset usage and limitations. (372848074)
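
The quoted comment doesn't show how drop_coords was wired in; a hedged sketch, assuming it was passed through open_mfdataset's preprocess argument and that the files concatenate along time (the nwm/*.nc path is a placeholder), looks like this:

    import glob
    import xarray as xr

    def drop_coords(ds):
        # Drop the coordinates that open_mfdataset would otherwise try to
        # compare and align across every input file.
        ds = ds.drop(['reference_time', 'feature_id'])
        return ds.reset_coords(drop=True)

    files = sorted(glob.glob("nwm/*.nc"))  # placeholder input paths

    # preprocess runs drop_coords on each per-file dataset before combining.
    ds = xr.open_mfdataset(files, preprocess=drop_coords, concat_dim='time')

    # Re-attach the dropped feature_id values from one representative file.
    ds.coords['feature_id'] = xr.open_dataset(files[0]).coords['feature_id']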

509346055 · TomAugspurger (1312546) · MEMBER · 2019-07-08T18:46:58Z
https://github.com/pydata/xarray/issues/2501#issuecomment-509346055

@rsignell-usgs very helpful, thanks. I'd noticed that there was a pause after the open_dataset tasks finish, indicating that either the scheduler or (more likely) the client was doing work rather than the cluster. Most likely @rabernat's guess

> In open_mfdataset, all of the dimensions and coordinates of the individual files have to be checked and verified to be compatible. That is often the source of slow performance with open_mfdataset.

is correct. Verifying all that now, and looking into whether / how that can be done on the workers.

Reactions: none · Issue: open_mfdataset usage and limitations. (372848074)
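
The thread doesn't say how the per-file work ended up being moved off the client; one hedged possibility with the open_mfdataset API is the parallel=True flag, which wraps each per-file open (and any preprocess call) in dask.delayed so it executes on the workers, while the final compatibility check and combine still run on the client:

    import glob
    import xarray as xr

    files = sorted(glob.glob("nwm/*.nc"))  # placeholder input paths

    # With parallel=True each open_dataset call becomes a dask.delayed task, so the
    # files are opened on the cluster rather than in the client process.
    ds = xr.open_mfdataset(files, concat_dim='time', parallel=True)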

509307081 · TomAugspurger (1312546) · MEMBER · 2019-07-08T16:57:15Z
https://github.com/pydata/xarray/issues/2501#issuecomment-509307081

I'm looking into it today. Can you clarify

> The memory use kept growing until the process died.

by "process" do you mean a dask worker process, or just the main python process executing the ds = xr.open_mfdataset(...) code?

Reactions: none · Issue: open_mfdataset usage and limitations. (372848074)

506497180 · TomAugspurger (1312546) · MEMBER · 2019-06-27T20:24:26Z
https://github.com/pydata/xarray/issues/2501#issuecomment-506497180

> The datasets in our cloud datastore are designed explicitly to avoid this problem!

Good to know!

FYI, https://github.com/pydata/xarray/issues/2501#issuecomment-506478508 was user error (I can access it, but need to specify the us-east-1 region). Taking a look now.

Reactions: none · Issue: open_mfdataset usage and limitations. (372848074)
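
The thread doesn't spell out how the us-east-1 region was specified; purely as a hedged illustration, and swapping s3fs in for the rclone used in the comment below, this is how a region is typically passed when reading such a bucket from Python (the bucket path and anonymous access are assumptions):

    import s3fs

    # client_kwargs are forwarded to the underlying botocore client;
    # region_name is where the AWS region is specified.
    fs = s3fs.S3FileSystem(anon=True, client_kwargs={"region_name": "us-east-1"})
    print(fs.ls("nwm-archive/2009"))  # hypothetical bucket listing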

506478508 · TomAugspurger (1312546) · MEMBER · 2019-06-27T19:25:05Z
https://github.com/pydata/xarray/issues/2501#issuecomment-506478508

Thanks, will take a look this afternoon. Are there any datasets on https://pangeo-data.github.io/pangeo-datastore/ that would exhibit this poor behavior? I may not have access to the bucket (or I'm misusing rclone)

    2019/06/27 14:23:50 NOTICE: Config file "/Users/taugspurger/.config/rclone/rclone.conf" not found - using defaults
    2019/06/27 14:23:50 Failed to create file system for "aws-east:nwm-archive/2009": didn't find section in config file

Reactions: none · Issue: open_mfdataset usage and limitations. (372848074)
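
The two rclone messages above stem from the missing config file, which leaves the aws-east remote undefined; a hedged sketch of the kind of entry in ~/.config/rclone/rclone.conf that would define it (the provider and region values are assumptions, though the region matches the us-east-1 note in the comment above):

    [aws-east]
    type = s3
    provider = AWS
    region = us-east-1

With that section present, a command such as rclone ls aws-east:nwm-archive/2009 would at least resolve the remote named in the error.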


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);