issue_comments

6 rows where issue = 372848074 (open_mfdataset usage and limitations.) and user = 1312546 (TomAugspurger), sorted by updated_at descending. All 6 rows have author_association MEMBER.

Columns: id, html_url, issue_url, node_id, user, created_at, updated_at, author_association, body, reactions, performed_via_github_app, issue. Each row is shown below as a record: id · user · author_association · timestamp (created_at equals updated_at for every row), followed by the comment URL, body, reactions, and issue.

510217080 · TomAugspurger (1312546) · MEMBER · 2019-07-10T20:30:41Z
https://github.com/pydata/xarray/issues/2501#issuecomment-510217080

Yep, that’s my suspicion as well. I’m still plugging away at it; at the moment the pausing logic isn’t working quite right.

On Jul 10, 2019, at 12:10, Ryan Abernathey notifications@github.com wrote:

> I believe that the memory issue is basically the same as dask/distributed#2602.
>
> The graphs look like: read --> rechunk --> write.
>
> Reading and rechunking increase memory consumption. Writing relieves it. In Rich's case, the workers just load too much data before they write it. Eventually they run out of memory.


Reactions: none · Issue: open_mfdataset usage and limitations. (372848074)
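
For context, the read --> rechunk --> write graph described in the quoted comment can be reproduced with a small, self-contained dask.array sketch; the array shape, chunk sizes, and store path below are made-up illustrations, not values from the thread.

    import dask.array as da

    # "read": pretend each input chunk corresponds to one file's worth of data
    x = da.random.random((10_000, 10_000), chunks=(10_000, 250))

    # "rechunk": the write-friendly chunking differs from the read chunking, so many
    # input chunks have to be held in memory to assemble each output chunk
    y = x.rechunk((250, 10_000))

    # "write": storing a chunk releases the memory of the pieces it consumed; if the
    # workers read and rechunk much faster than they write, memory keeps growing
    da.to_zarr(y, "example.zarr", overwrite=True)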

510167911 · TomAugspurger (1312546) · MEMBER · 2019-07-10T18:05:07Z
https://github.com/pydata/xarray/issues/2501#issuecomment-510167911

Great, thanks. I’ll look into the memory issue when writing. We may already have an issue for it.

On Jul 10, 2019, at 10:59, Rich Signell notifications@github.com wrote:

> @TomAugspurger, I sat down here at SciPy with @rabernat and he instantly realized that we needed to drop the feature_id coordinate to prevent open_mfdataset from trying to harmonize that coordinate from all the chunks.
>
> So if I use this code, the open_mfdataset command finishes:
>
>     def drop_coords(ds):
>         ds = ds.drop(['reference_time', 'feature_id'])
>         return ds.reset_coords(drop=True)
>
> and I can then add back in the dropped coordinate values at the end:
>
>     dsets = [xr.open_dataset(f) for f in files[:3]]
>     ds.coords['feature_id'] = dsets[0].coords['feature_id']
>
> I'm now running into memory issues when I write the zarr data -- but I should raise that as a new issue, right?


Reactions: none · Issue: open_mfdataset usage and limitations. (372848074)
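
The quoted comment doesn't show how drop_coords was wired in; a hedged sketch, assuming it was passed through open_mfdataset's preprocess argument and that the files concatenate along time (the nwm/*.nc path is a placeholder), looks like this:

    import glob
    import xarray as xr

    def drop_coords(ds):
        # Drop the coordinates that open_mfdataset would otherwise try to
        # compare and align across every input file.
        ds = ds.drop(['reference_time', 'feature_id'])
        return ds.reset_coords(drop=True)

    files = sorted(glob.glob("nwm/*.nc"))  # placeholder input paths

    # preprocess runs drop_coords on each per-file dataset before combining.
    ds = xr.open_mfdataset(files, preprocess=drop_coords, concat_dim='time')

    # Re-attach the dropped feature_id values from one representative file.
    ds.coords['feature_id'] = xr.open_dataset(files[0]).coords['feature_id']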

509346055 · TomAugspurger (1312546) · MEMBER · 2019-07-08T18:46:58Z
https://github.com/pydata/xarray/issues/2501#issuecomment-509346055

@rsignell-usgs very helpful, thanks. I'd noticed that there was a pause after the open_dataset tasks finish, indicating that either the scheduler or (more likely) the client was doing work rather than the cluster. Most likely @rabernat's guess

> In open_mfdataset, all of the dimensions and coordinates of the individual files have to be checked and verified to be compatible. That is often the source of slow performance with open_mfdataset.

is correct. Verifying all that now, and looking into whether / how that can be done on the workers.

Reactions: none · Issue: open_mfdataset usage and limitations. (372848074)
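
The thread doesn't say how the per-file work ended up being moved off the client; one hedged possibility with the open_mfdataset API is the parallel=True flag, which wraps each per-file open (and any preprocess call) in dask.delayed so it executes on the workers, while the final compatibility check and combine still run on the client:

    import glob
    import xarray as xr

    files = sorted(glob.glob("nwm/*.nc"))  # placeholder input paths

    # With parallel=True each open_dataset call becomes a dask.delayed task, so the
    # files are opened on the cluster rather than in the client process.
    ds = xr.open_mfdataset(files, concat_dim='time', parallel=True)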

509307081 · TomAugspurger (1312546) · MEMBER · 2019-07-08T16:57:15Z
https://github.com/pydata/xarray/issues/2501#issuecomment-509307081

I'm looking into it today. Can you clarify

> The memory use kept growing until the process died.

by "process" do you mean a dask worker process, or just the main python process executing the ds = xr.open_mfdataset(...) code?

Reactions: none · Issue: open_mfdataset usage and limitations. (372848074)

506497180 · TomAugspurger (1312546) · MEMBER · 2019-06-27T20:24:26Z
https://github.com/pydata/xarray/issues/2501#issuecomment-506497180

> The datasets in our cloud datastore are designed explicitly to avoid this problem!

Good to know!

FYI, https://github.com/pydata/xarray/issues/2501#issuecomment-506478508 was user error (I can access it, but need to specify the us-east-1 region). Taking a look now.

Reactions: none · Issue: open_mfdataset usage and limitations. (372848074)
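
The thread doesn't spell out how the us-east-1 region was specified; purely as a hedged illustration, and swapping s3fs in for the rclone used in the comment below, this is how a region is typically passed when reading such a bucket from Python (the bucket path and anonymous access are assumptions):

    import s3fs

    # client_kwargs are forwarded to the underlying botocore client;
    # region_name is where the AWS region is specified.
    fs = s3fs.S3FileSystem(anon=True, client_kwargs={"region_name": "us-east-1"})
    print(fs.ls("nwm-archive/2009"))  # hypothetical bucket listing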

506478508 · TomAugspurger (1312546) · MEMBER · 2019-06-27T19:25:05Z
https://github.com/pydata/xarray/issues/2501#issuecomment-506478508

Thanks, will take a look this afternoon. Are there any datasets on https://pangeo-data.github.io/pangeo-datastore/ that would exhibit this poor behavior? I may not have access to the bucket (or I'm misusing rclone)

    2019/06/27 14:23:50 NOTICE: Config file "/Users/taugspurger/.config/rclone/rclone.conf" not found - using defaults
    2019/06/27 14:23:50 Failed to create file system for "aws-east:nwm-archive/2009": didn't find section in config file

Reactions: none · Issue: open_mfdataset usage and limitations. (372848074)
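
The two rclone messages above stem from the missing config file, which leaves the aws-east remote undefined; a hedged sketch of the kind of entry in ~/.config/rclone/rclone.conf that would define it (the provider and region values are assumptions, though the region matches the us-east-1 note in the comment above):

    [aws-east]
    type = s3
    provider = AWS
    region = us-east-1

With that section present, a command such as rclone ls aws-east:nwm-archive/2009 would at least resolve the remote named in the error.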


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);