
issue_comments


5 rows where issue = 334633212 sorted by updated_at descending




Facets

  • user: shoyer (2), jhamman (2), neishm (1)
  • author_association: MEMBER (4), CONTRIBUTOR (1)
  • issue: to_netcdf(compute=False) can be slow (5)
id: 453866106 · user: jhamman (2443309) · created_at: 2019-01-13T21:13:28Z · updated_at: 2019-01-13T21:13:28Z · author_association: MEMBER
html_url: https://github.com/pydata/xarray/issues/2242#issuecomment-453866106
issue_url: https://api.github.com/repos/pydata/xarray/issues/2242 · node_id: MDEyOklzc3VlQ29tbWVudDQ1Mzg2NjEwNg==

I just reran the example above and things seem to be resolved now. The write step for the two datasets is basically identical.
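
For reference, a minimal sketch of the kind of comparison being rerun here: an eager to_netcdf call versus a delayed write via compute=False. The dataset construction and sizes below are assumptions for illustration (they are not the original example from the issue), and it needs dask plus a netCDF backend such as netCDF4 installed.

import time

import numpy as np
import xarray as xr

# synthetic chunked dataset; shape and chunk size are illustrative only
ds = xr.Dataset({"var": (("time", "x"), np.random.rand(1000, 1000))}).chunk({"time": 100})

t0 = time.time()
ds.to_netcdf("eager.nc")  # compute happens inside to_netcdf
print("eager write:  ", time.time() - t0)

t0 = time.time()
delayed = ds.to_netcdf("delayed.nc", compute=False)  # returns a dask.delayed object
delayed.compute()  # the actual write happens here
print("delayed write:", time.time() - t0)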

reactions: none (all counts 0) · issue: to_netcdf(compute=False) can be slow (334633212)
id: 399503156 · user: shoyer (1217238) · created_at: 2018-06-22T16:33:11Z · updated_at: 2018-06-22T16:33:11Z · author_association: MEMBER
html_url: https://github.com/pydata/xarray/issues/2242#issuecomment-399503156
issue_url: https://api.github.com/repos/pydata/xarray/issues/2242 · node_id: MDEyOklzc3VlQ29tbWVudDM5OTUwMzE1Ng==

This autoclose business is really hard to reason about in its current version, as part of the backend class. I'm hoping that refactoring it out into a separate object that we can use with composition instead of inheritance will help (e.g., alongside PickleByReconstructionWrapper).
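
A rough sketch of the composition idea being described, using hypothetical class names (FileManager, NetCDF4Store) rather than xarray's real ones: the open/close (autoclose) lifecycle lives in one small standalone object that a backend holds as an attribute, instead of being behaviour inherited through the backend class hierarchy.

import netCDF4

class FileManager:
    """Hypothetical standalone owner of a file's open/close lifecycle."""

    def __init__(self, path, mode="a", autoclose=False):
        self._path = path
        self._mode = mode
        self._autoclose = autoclose
        self._handle = None

    def acquire(self):
        if self._handle is None:
            self._handle = netCDF4.Dataset(self._path, mode=self._mode)
        return self._handle

    def release(self):
        # only actually closes when autoclose behaviour was requested
        if self._autoclose and self._handle is not None:
            self._handle.close()
            self._handle = None

class NetCDF4Store:
    """Hypothetical backend that reuses the manager by composition."""

    def __init__(self, path, autoclose=False):
        self._manager = FileManager(path, mode="a", autoclose=autoclose)

    def write_chunk(self, varname, key, data):
        # assumes the file and the target variable already exist
        nc = self._manager.acquire()
        nc.variables[varname][key] = data
        self._manager.release()  # no-op unless autoclose=True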

reactions: none (all counts 0) · issue: to_netcdf(compute=False) can be slow (334633212)
id: 399495668 · user: neishm (1554921) · created_at: 2018-06-22T16:10:45Z · updated_at: 2018-06-22T16:10:45Z · author_association: CONTRIBUTOR
html_url: https://github.com/pydata/xarray/issues/2242#issuecomment-399495668
issue_url: https://api.github.com/repos/pydata/xarray/issues/2242 · node_id: MDEyOklzc3VlQ29tbWVudDM5OTQ5NTY2OA==

True, I would expect some performance hit from writing chunk-by-chunk; however, that same performance hit is present in both of the test cases.

In addition to the snippet @shoyer mentioned, I found that xarray also intentionally uses autoclose=True when writing chunks to netCDF: https://github.com/pydata/xarray/blob/73b476e4db6631b2203954dd5b138cb650e4fb8c/xarray/backends/netCDF4_.py#L45-L48

However, ensure_open only uses autoclose if the file isn't already open:

https://github.com/pydata/xarray/blob/73b476e4db6631b2203954dd5b138cb650e4fb8c/xarray/backends/common.py#L496-L503

So if the file is already open before getting to BaseNetCDF4Array.__setitem__, it will remain open. If the file isn't yet opened, it will be opened, but then immediately closed after writing the chunk. I suspect this is what's happening in the delayed version: the starting state of NetCDF4DataStore._isopen is False for some reason, so it is doomed to re-close itself for each chunk processed.

If I remove the autoclose=True from BaseNetCDF4Array.__setitem__, the file remains open and performance is comparable between the two tests.
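
To make that interaction concrete, here is a small self-contained paraphrase (not the actual xarray source) of the control flow described above: ensure_open only honours autoclose when the file is not already open, and __setitem__ hard-codes autoclose=True, so a store whose _isopen flag starts out False reopens and closes the file once per chunk.

from contextlib import contextmanager

class FakeStore:
    """Toy stand-in for NetCDF4DataStore, tracking how often the file is (re)opened."""

    def __init__(self):
        self._isopen = False
        self.open_count = 0

    @contextmanager
    def ensure_open(self, autoclose):
        was_open = self._isopen
        if not was_open:
            self._isopen = True       # "open" the file
            self.open_count += 1
        try:
            yield
        finally:
            # autoclose is only honoured when we were the ones who opened it
            if not was_open and autoclose:
                self._isopen = False

    def setitem_chunk(self):
        # mirrors the hard-coded autoclose=True in BaseNetCDF4Array.__setitem__
        with self.ensure_open(autoclose=True):
            pass  # write one chunk here

store = FakeStore()
for _ in range(10):           # ten chunks
    store.setitem_chunk()
print(store.open_count)       # 10: the file is reopened and closed for every chunk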

reactions: none (all counts 0) · issue: to_netcdf(compute=False) can be slow (334633212)
id: 399320127 · user: jhamman (2443309) · created_at: 2018-06-22T04:51:54Z · updated_at: 2018-06-22T04:51:54Z · author_association: MEMBER
html_url: https://github.com/pydata/xarray/issues/2242#issuecomment-399320127
issue_url: https://api.github.com/repos/pydata/xarray/issues/2242 · node_id: MDEyOklzc3VlQ29tbWVudDM5OTMyMDEyNw==

I think, at least to some extent, the performance hit is to be expected. I don't think we should be opening the file more than once when using the serial or threaded schedulers, so that may be a place where you can find some improvement. There will always be a performance hit when writing dask arrays to netCDF files chunk-by-chunk. For one, there is a threading lock that limits parallel throughput. More importantly, chunked writes are always going to be slower than larger writes coming directly from in-memory numpy arrays.

In your example above, the snippet @shoyer mentions should evaluate to autoclose=False. However, the profiling you mention seems to indicate the opposite. Perhaps we should start by digging deeper into that point.
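
As an aside on the threading-lock point, a minimal sketch of the pattern (the lock name is illustrative, not xarray's actual lock object): every chunk write acquires the same lock before touching the file, so the threaded scheduler can overlap chunk computation but not the writes themselves.

import threading

# a single process-wide lock guarding the netCDF file (illustrative name)
NETCDF_WRITE_LOCK = threading.Lock()

def write_chunk(nc_variable, key, data):
    # dask tasks computing different chunks may run concurrently, but they
    # all queue up here, so the writes are effectively serialized
    with NETCDF_WRITE_LOCK:
        nc_variable[key] = data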

reactions: none (all counts 0) · issue: to_netcdf(compute=False) can be slow (334633212)
id: 399275847 · user: shoyer (1217238) · created_at: 2018-06-21T23:37:10Z · updated_at: 2018-06-21T23:37:10Z · author_association: MEMBER
html_url: https://github.com/pydata/xarray/issues/2242#issuecomment-399275847
issue_url: https://api.github.com/repos/pydata/xarray/issues/2242 · node_id: MDEyOklzc3VlQ29tbWVudDM5OTI3NTg0Nw==

I suspect this can be improved. Looking at the code, it appears that we only intentionally use autoclose=True for writes when using multiprocessing or the distributed dask scheduler. https://github.com/pydata/xarray/blob/73b476e4db6631b2203954dd5b138cb650e4fb8c/xarray/backends/api.py#L709-L710
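
A hedged paraphrase of the check being linked to (not a verbatim copy of api.py at that commit): autoclose is only meant to be turned on for schedulers whose workers cannot share an open file handle.

def should_autoclose(scheduler, have_chunks):
    # hypothetical helper: separate processes cannot share an open netCDF
    # handle, so under these schedulers each write must reopen the file
    return have_chunks and scheduler in ("multiprocessing", "distributed")

print(should_autoclose("threading", have_chunks=True))        # False
print(should_autoclose("multiprocessing", have_chunks=True))  # True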

reactions: none (all counts 0) · issue: to_netcdf(compute=False) can be slow (334633212)

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
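
A sketch of reproducing this page's query directly against the underlying SQLite database with Python's sqlite3 module; the filename github.db is an assumption, so substitute whatever database file backs this Datasette instance.

import sqlite3

conn = sqlite3.connect("github.db")  # assumed filename
rows = conn.execute(
    """
    SELECT id, user, created_at, author_association, body
    FROM issue_comments
    WHERE issue = ?
    ORDER BY updated_at DESC
    """,
    (334633212,),
).fetchall()

for comment_id, user_id, created_at, association, body in rows:
    print(comment_id, user_id, created_at, association, body[:60])

conn.close()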