github: issue_comments: 6 rows where author_association = "MEMBER" and issue = 435535284 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at	author_association	body	reactions	issue
534869060	https://github.com/pydata/xarray/issues/2912#issuecomment-534869060	https://api.github.com/repos/pydata/xarray/issues/2912	MDEyOklzc3VlQ29tbWVudDUzNDg2OTA2MA==	shoyer 1217238	2019-09-25T06:08:43Z	2019-09-25T06:08:43Z	MEMBER	I suspect it could work pretty well to explicitly rechunk your dataset into larger chunks (e.g., with the `Dataset.chunk()` method). This way you could continue to use dask for lazy writes, but reduce the overhead of writing individual chunks.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Writing a netCDF file is unexpectedly slow 435535284
534855337	https://github.com/pydata/xarray/issues/2912#issuecomment-534855337	https://api.github.com/repos/pydata/xarray/issues/2912	MDEyOklzc3VlQ29tbWVudDUzNDg1NTMzNw==	jhamman 2443309	2019-09-25T05:12:32Z	2019-09-25T05:12:32Z	MEMBER	@fsteinmetz - in my experience, the main thing to consider here is how and when xarray's backends lock/block for certain operations. The hdf5 library is not thread-safe, so we implement a global lock around all hdf5 read/write operations. In most cases, this means we can only do one read or one write at a time per process. We have found that using Dask's distributed (or multiprocessing) scheduler allows us to bypass the thread locks required by hdf5 by using multiple processes. We also need a per-file lock when writing, so using multiple output datasets theoretically allows for concurrent writes (provided your filesystem and OS support this). Finally, it's best not to jump to the complicated explanations first. If you have many small dask chunks in your dataset, both reading and writing will be quite inefficient. This is simply because there is some non-trivial overhead when accessing partial datasets. This is even worse when the dataset is chunked/compressed. Hope that helps.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Writing a netCDF file is unexpectedly slow 435535284
485497398	https://github.com/pydata/xarray/issues/2912#issuecomment-485497398	https://api.github.com/repos/pydata/xarray/issues/2912	MDEyOklzc3VlQ29tbWVudDQ4NTQ5NzM5OA==	jhamman 2443309	2019-04-22T18:06:56Z	2019-04-22T18:06:56Z	MEMBER	Since the final dataset size is quite manageable, I would start by forcing computation before the write step: `ncdat.load().to_netcdf(...)`. While writing xarray datasets backed by dask is possible, it's a poorly optimized operation. Most of this comes from constraints in netCDF4/HDF5. There are ways to sidestep some of these challenges (`save_mfdataset` and the distributed dask scheduler), but they are probably overkill for this use case.	{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Writing a netCDF file is unexpectedly slow 435535284
485465687	https://github.com/pydata/xarray/issues/2912#issuecomment-485465687	https://api.github.com/repos/pydata/xarray/issues/2912	MDEyOklzc3VlQ29tbWVudDQ4NTQ2NTY4Nw==	shoyer 1217238	2019-04-22T16:23:44Z	2019-04-22T16:23:44Z	MEMBER	It really depends on the underlying cause. In most cases, writing a file to disk is not the slow part, only the place where the slow-down is manifested.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Writing a netCDF file is unexpectedly slow 435535284
485464872	https://github.com/pydata/xarray/issues/2912#issuecomment-485464872	https://api.github.com/repos/pydata/xarray/issues/2912	MDEyOklzc3VlQ29tbWVudDQ4NTQ2NDg3Mg==	dcherian 2448579	2019-04-22T16:21:00Z	2019-04-22T16:21:20Z	MEMBER	Are there "best practices" for a situation like this? Parallel writes? `save_mfdataset`? ping @jhamman @rabernat	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Writing a netCDF file is unexpectedly slow 435535284
485460901	https://github.com/pydata/xarray/issues/2912#issuecomment-485460901	https://api.github.com/repos/pydata/xarray/issues/2912	MDEyOklzc3VlQ29tbWVudDQ4NTQ2MDkwMQ==	shoyer 1217238	2019-04-22T16:06:50Z	2019-04-22T16:06:50Z	MEMBER	You're using dask, so the Dataset is being lazily computed. If one part of your pipeline is very expensive (perhaps reading the original data from disk?) then the process of saving can be very slow. I would suggest doing some profiling, e.g., as shown in this example: http://docs.dask.org/en/latest/diagnostics-local.html#example Once we know what the slow part is, that will hopefully make opportunities for improvement more obvious.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Writing a netCDF file is unexpectedly slow 435535284
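
The 2019-04-22T16:06:50Z comment suggests profiling the lazy pipeline before blaming the write itself. A minimal sketch with dask's local `Profiler`, assuming a synthetic dask array (shape and chunk sizes are arbitrary) stands in for the real pipeline; it totals the time spent per task name so that the most expensive step stands out:

```python
import dask.array as da
from dask.diagnostics import Profiler
from dask.utils import key_split

# Synthetic stand-in for an expensive lazy pipeline (illustrative only).
x = da.random.random((2000, 2000), chunks=(500, 500))
y = (x + x.T).mean(axis=0)

# Profiler records start/end times for every task run by the local scheduler.
with Profiler() as prof:
    result = y.compute(scheduler="threads")

# Group tasks by name (key_split drops dask's hash suffix) and sum their time.
totals = {}
for task in prof.results:
    name = key_split(task.key)
    totals[name] = totals.get(name, 0.0) + (task.end_time - task.start_time)

# Show the five most expensive kinds of task.
for name, seconds in sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(f"{name}: {seconds:.3f} s")
```

In a real case, the same `with Profiler()` block would wrap the `to_netcdf(...)` call, so reading, computing, and writing tasks all show up in `prof.results`. The linked dask page also covers resource profiling and plotting these results.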
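
The 2019-04-22T18:06:56Z comment recommends computing the data before writing when the result fits in memory. A minimal sketch of that pattern, assuming a small synthetic dataset stands in for `ncdat` and that a netCDF backend (netCDF4 or scipy) is installed for the actual write; `write_eagerly` is a hypothetical helper name:

```python
import numpy as np
import xarray as xr

# Hypothetical small dataset standing in for `ncdat`, backed by dask chunks.
ncdat = xr.Dataset({"v": (("time", "x"), np.random.rand(200, 20))}).chunk({"time": 10})

# Dataset.load() computes all dask arrays in place and returns the dataset
# itself, so a subsequent write sees NumPy arrays rather than dask graphs.
loaded = ncdat.load()
print(type(loaded["v"].data).__name__)  # ndarray


def write_eagerly(ds: xr.Dataset, path: str) -> None:
    """Materialize the whole dataset in memory, then write it to `path`."""
    ds.load().to_netcdf(path)
```

This trades memory for simplicity: the whole dataset must fit in RAM, which the comment assumes ("the final dataset size is quite manageable").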
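
The 2019-09-25T06:08:43Z comment suggests keeping dask's lazy writes but using fewer, larger chunks. A minimal sketch with `Dataset.chunk()`, assuming a synthetic dataset with many small chunks along a `time` dimension; the chunk size of 500 is illustrative, and `write_rechunked` is a hypothetical helper:

```python
import numpy as np
import xarray as xr

# Hypothetical dataset: 1000 time steps split into 100 small dask chunks.
ds = xr.Dataset(
    {"v": (("time", "x"), np.random.rand(1000, 50))},
    coords={"time": np.arange(1000), "x": np.arange(50)},
).chunk({"time": 10})

# Rechunk into fewer, larger chunks; this is still lazy (nothing is computed).
ds_big = ds.chunk({"time": 500})

print(len(ds["v"].data.chunks[0]))      # 100
print(len(ds_big["v"].data.chunks[0]))  # 2


def write_rechunked(ds: xr.Dataset, path: str, chunks: dict) -> None:
    """Rechunk into larger blocks, then let dask write them lazily to `path`."""
    ds.chunk(chunks).to_netcdf(path)
```

Fewer chunks means fewer per-chunk write calls (and lock acquisitions), at the cost of more memory per task; the right size depends on the data and the machine.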
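
`save_mfdataset` comes up in the 2019-04-22T16:21:00Z and 2019-04-22T18:06:56Z comments, and the 2019-09-25T05:12:32Z comment notes that writing multiple output datasets may allow concurrent writes. A minimal sketch, assuming the data can be split into contiguous pieces along one dimension; `split_along`, `write_split`, and the `prefix_NNN.nc` file naming are hypothetical. Whether the per-file writes actually run concurrently still depends on the HDF5 lock, the dask scheduler, and the filesystem, as that comment explains:

```python
import numpy as np
import xarray as xr


def split_along(ds: xr.Dataset, dim: str, n_parts: int, prefix: str):
    """Split `ds` into `n_parts` contiguous pieces along `dim` and pair each
    piece with an output path (the naming scheme is hypothetical)."""
    edges = np.linspace(0, ds.sizes[dim], n_parts + 1).astype(int)
    datasets = [ds.isel({dim: slice(a, b)}) for a, b in zip(edges[:-1], edges[1:])]
    paths = [f"{prefix}_{i:03d}.nc" for i in range(n_parts)]
    return datasets, paths


def write_split(ds: xr.Dataset, dim: str, n_parts: int, prefix: str) -> None:
    """Write each piece to its own netCDF file with xarray.save_mfdataset."""
    datasets, paths = split_along(ds, dim, n_parts, prefix)
    xr.save_mfdataset(datasets, paths)


# Example: 1000 time steps split into four pieces of 250 steps each.
ds = xr.Dataset(
    {"v": (("time", "x"), np.random.rand(1000, 50))},
    coords={"time": np.arange(1000)},
).chunk({"time": 250})
datasets, paths = split_along(ds, "time", 4, "out")
print([d.sizes["time"] for d in datasets])  # [250, 250, 250, 250]
print(paths[0])  # out_000.nc
```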

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
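
The page title describes a filter and sort on the `issue_comments` table defined by the schema above. A minimal sketch of a roughly equivalent query using Python's built-in `sqlite3`, assuming an in-memory database holding only three of the rows above (and only some of their columns); it approximates, rather than reproduces, the SQL the page was generated from:

```python
import sqlite3

# Schema copied from above. The [users] and [issues] tables it references are
# not created; SQLite only enforces foreign keys when explicitly enabled.
SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
"""

# (id, user, created_at, updated_at, author_association, issue) values taken
# from the table above, inserted out of order on purpose.
ROWS = [
    (485460901, 1217238, "2019-04-22T16:06:50Z", "2019-04-22T16:06:50Z", "MEMBER", 435535284),
    (534869060, 1217238, "2019-09-25T06:08:43Z", "2019-09-25T06:08:43Z", "MEMBER", 435535284),
    (485464872, 2448579, "2019-04-22T16:21:00Z", "2019-04-22T16:21:20Z", "MEMBER", 435535284),
]

# The filter and sort named in the page title.
QUERY = """
SELECT [id], [user], [updated_at] FROM [issue_comments]
WHERE [author_association] = ? AND [issue] = ?
ORDER BY [updated_at] DESC
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO [issue_comments] ([id], [user], [created_at], [updated_at], "
    "[author_association], [issue]) VALUES (?, ?, ?, ?, ?, ?)",
    ROWS,
)
results = conn.execute(QUERY, ("MEMBER", 435535284)).fetchall()
for row in results:
    print(row)
```

Because every `updated_at` value uses the same ISO 8601 format, sorting them as TEXT gives chronological order, and the index on `[issue]` can serve the `issue = ?` filter.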