issue_comments

9 rows where issue = 712189206 sorted by updated_at descending

user 3

  • dcherian 4
  • heerad 4
  • shoyer 1

author_association 2

  • MEMBER 5
  • NONE 4

issue 1

  • Preprocess function for save_mfdataset · 9
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
702307334 https://github.com/pydata/xarray/issues/4475#issuecomment-702307334 https://api.github.com/repos/pydata/xarray/issues/4475 MDEyOklzc3VlQ29tbWVudDcwMjMwNzMzNA== heerad 2560426 2020-10-01T18:07:55Z 2020-10-01T18:07:55Z NONE

Sounds good, I'll do this in the meantime. Still quite interested in save_mfdataset dealing with these lower-level details, if possible. The ideal case would be loading with open_mfdataset, defining some ops lazily, then piping that directly to save_mfdataset.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preprocess function for save_mfdataset 712189206
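
A minimal sketch of the workflow described in the comment above, assuming hypothetical input files, a `time.year` grouping key, and a placeholder operation; the groupby/save_mfdataset pairing follows the pattern in xarray's documentation:

```
import xarray as xr

# Open lazily (dask-backed), define ops without computing, then write one
# file per group. The glob, the grouping key, and the op are placeholders.
ds = xr.open_mfdataset("input/*.nc", combine="by_coords")
ds = ds * 2  # stand-in for the lazily defined ops

years, datasets = zip(*ds.groupby("time.year"))
paths = [f"output/{year}.nc" for year in years]
xr.save_mfdataset(datasets, paths)
```
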
702276824 https://github.com/pydata/xarray/issues/4475#issuecomment-702276824 https://api.github.com/repos/pydata/xarray/issues/4475 MDEyOklzc3VlQ29tbWVudDcwMjI3NjgyNA== dcherian 2448579 2020-10-01T17:13:16Z 2020-10-01T17:13:16Z MEMBER

> doesn't that then require me to load everything into memory before writing?

I think so.

I would try multiple processes and see if that is fast enough for what you want to do. Or else, write to zarr. This will be parallelized and is a lot easier than dealing with HDF5.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preprocess function for save_mfdataset 712189206
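
A sketch of the zarr route suggested above, with a hypothetical input glob and store name; to_zarr writes a dask-backed dataset chunk by chunk, so the data never has to be loaded into memory all at once:

```
import xarray as xr

# Dask-backed dataset written chunk by chunk to a zarr store.
ds = xr.open_mfdataset("input/*.nc", combine="by_coords")
ds.to_zarr("output.zarr", mode="w")
```
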
702265883 https://github.com/pydata/xarray/issues/4475#issuecomment-702265883 https://api.github.com/repos/pydata/xarray/issues/4475 MDEyOklzc3VlQ29tbWVudDcwMjI2NTg4Mw== heerad 2560426 2020-10-01T16:52:59Z 2020-10-01T16:52:59Z NONE

Multiple threads (the default), because it's recommended "for numeric code that releases the GIL (like NumPy, Pandas, Scikit-Learn, Numba, …)" according to the dask docs.

I guess I could do multi-threaded for the compute part (everything up to the definition of ds), then multi-process for the write part, but doesn't that then require me to load everything into memory before writing?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preprocess function for save_mfdataset 712189206
702226256 https://github.com/pydata/xarray/issues/4475#issuecomment-702226256 https://api.github.com/repos/pydata/xarray/issues/4475 MDEyOklzc3VlQ29tbWVudDcwMjIyNjI1Ng== dcherian 2448579 2020-10-01T15:46:45Z 2020-10-01T15:46:45Z MEMBER

Are you using multiple threads or multiple processes? IIUC you should be using multiple processes for max writing efficiency.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preprocess function for save_mfdataset 712189206
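
One way to act on the threads-versus-processes question above is to pick the dask scheduler explicitly at compute time; the delayed task below is a stand-in for the per-file netCDF writes discussed in this thread:

```
import dask

@dask.delayed
def write_one(path):
    # stand-in for writing a single netCDF file
    return path

if __name__ == "__main__":
    tasks = [write_one(f"out_{i}.nc") for i in range(4)]

    # dask's default scheduler for array workloads is threaded; "processes"
    # runs the tasks in a multiprocessing pool for the duration of the block.
    with dask.config.set(scheduler="processes"):
        dask.compute(tasks)
```
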
702178407 https://github.com/pydata/xarray/issues/4475#issuecomment-702178407 https://api.github.com/repos/pydata/xarray/issues/4475 MDEyOklzc3VlQ29tbWVudDcwMjE3ODQwNw== heerad 2560426 2020-10-01T14:34:28Z 2020-10-01T14:34:28Z NONE

Thank you, this works for me. However, it's quite slow and seems to scale worse than linearly as the length of datasets (the number of groups in the groupby) increases.

Could it be connected to https://github.com/pydata/xarray/issues/2912#issuecomment-485497398, where they suggest using save_mfdataset instead of to_netcdf? If so, there's a stronger case for supporting delayed objects in save_mfdataset, as you said.

Appreciate the help!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preprocess function for save_mfdataset 712189206
701694586 https://github.com/pydata/xarray/issues/4475#issuecomment-701694586 https://api.github.com/repos/pydata/xarray/issues/4475 MDEyOklzc3VlQ29tbWVudDcwMTY5NDU4Ng== shoyer 1217238 2020-09-30T23:13:33Z 2020-09-30T23:13:33Z MEMBER

I think we could support delayed objects in save_mfdataset, at least in principle. But if you're OK using delayed objects, you might as well write each netCDF file separately using dask.delayed, e.g.,

```
def write_dataset(ds, path):
    your_function(ds).to_netcdf(path)

result = [dask.delayed(write_dataset)(ds, path) for ds, path in zip(datasets, paths)]
dask.compute(result)
```

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preprocess function for save_mfdataset 712189206
701688956 https://github.com/pydata/xarray/issues/4475#issuecomment-701688956 https://api.github.com/repos/pydata/xarray/issues/4475 MDEyOklzc3VlQ29tbWVudDcwMTY4ODk1Ng== dcherian 2448579 2020-09-30T22:55:28Z 2020-09-30T22:55:28Z MEMBER

You could write to netCDF in your_function and avoid save_mfdataset altogether...

I guess this is a good argument for adding a preprocess kwarg.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preprocess function for save_mfdataset 712189206
701676076 https://github.com/pydata/xarray/issues/4475#issuecomment-701676076 https://api.github.com/repos/pydata/xarray/issues/4475 MDEyOklzc3VlQ29tbWVudDcwMTY3NjA3Ng== heerad 2560426 2020-09-30T22:17:24Z 2020-09-30T22:17:24Z NONE

Unfortunately that doesn't work:

`TypeError: save_mfdataset only supports writing Dataset objects, received type <class 'dask.delayed.Delayed'>`

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preprocess function for save_mfdataset 712189206
701577652 https://github.com/pydata/xarray/issues/4475#issuecomment-701577652 https://api.github.com/repos/pydata/xarray/issues/4475 MDEyOklzc3VlQ29tbWVudDcwMTU3NzY1Mg== dcherian 2448579 2020-09-30T18:51:25Z 2020-09-30T18:51:25Z MEMBER

you could use dask.delayed here

new_datasets = [dask.delayed(your_function)(dset) for dset in datasets]
xr.save_mfdataset(new_datasets, paths)

I think this will work, but I've never used save_mfdataset. This is how preprocess is implemented with open_mfdataset btw.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preprocess function for save_mfdataset 712189206
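
For comparison, the open_mfdataset behaviour referenced above already exists as a keyword argument: preprocess is applied to each file's dataset before the datasets are combined. The glob and the preprocessing function here are placeholders:

```
import xarray as xr

def your_function(ds):
    # hypothetical per-file preprocessing, mirroring the thread's example
    return ds

ds = xr.open_mfdataset("input/*.nc", preprocess=your_function, combine="by_coords")
```
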

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);