Comments on https://github.com/pydata/xarray/issues/4475

---

**https://github.com/pydata/xarray/issues/4475#issuecomment-702276824** — user 2448579 (MEMBER), 2020-10-01T17:13:16Z

> doesn't that then require me to load everything into memory before writing?

I think so. I would try multiple processes and see if that is fast enough for what you want to do.

Or else, write to zarr. Zarr writes are parallelized, and it is a lot easier than dealing with HDF5.

---

**https://github.com/pydata/xarray/issues/4475#issuecomment-702226256** — user 2448579 (MEMBER), 2020-10-01T15:46:45Z

Are you using multiple threads or multiple processes? IIUC, you should be using multiple processes for maximum writing efficiency.

---

**https://github.com/pydata/xarray/issues/4475#issuecomment-701694586** — user 1217238 (MEMBER), 2020-09-30T23:13:33Z (👍 3)

I think we could support delayed objects in `save_mfdataset`, at least in principle. But if you're OK using delayed objects, you might as well write each netCDF file separately using `dask.delayed`, e.g.,

```python
import dask

def write_dataset(ds, path):
    your_function(ds).to_netcdf(path)

result = [dask.delayed(write_dataset)(ds, path)
          for ds, path in zip(datasets, paths)]
dask.compute(result)
```

---

**https://github.com/pydata/xarray/issues/4475#issuecomment-701688956** — user 2448579 (MEMBER), 2020-09-30T22:55:28Z

You could write to netCDF in `your_function` and avoid `save_mfdataset` altogether... I guess this is a good argument for adding a `preprocess` kwarg.

---

**https://github.com/pydata/xarray/issues/4475#issuecomment-701577652** — user 2448579 (MEMBER), 2020-09-30T18:51:25Z (👍 1)

You could use `dask.delayed` here:

```python
new_datasets = [dask.delayed(your_function)(dset) for dset in datasets]
xr.save_mfdataset(new_datasets, paths)
```

I *think* this will work, but I've never used `save_mfdataset`. This is how `preprocess` is implemented with `open_mfdataset` btw.
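---

Pulling the thread's suggestions together, here is a minimal end-to-end sketch of the per-file `dask.delayed` approach, using the process-based scheduler recommended above. `preprocess`, `datasets`, and `paths` are hypothetical stand-ins for the user's own function and data, not xarray API:

```python
import dask
import xarray as xr

# Hypothetical inputs: a few small datasets and one output path each.
datasets = [xr.Dataset({"a": (("x",), [i, i + 1])}) for i in range(4)]
paths = [f"out_{i}.nc" for i in range(4)]

def preprocess(ds):
    # Stand-in for `your_function` from the comments above.
    return ds * 2

def write_dataset(ds, path):
    # Each task preprocesses one dataset and writes one netCDF file.
    preprocess(ds).to_netcdf(path)

if __name__ == "__main__":
    tasks = [dask.delayed(write_dataset)(ds, path)
             for ds, path in zip(datasets, paths)]
    # The "processes" scheduler sidesteps the GIL, which the comments
    # above suggest matters for write throughput.
    dask.compute(*tasks, scheduler="processes")
```

The `__main__` guard matters here: on platforms that spawn rather than fork, the multiprocessing scheduler re-imports the script in each worker, and an unguarded `dask.compute` would recurse.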
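And, per the top comment, a dask-chunked dataset can go straight to zarr, where chunk writes are parallelized without any explicit `dask.delayed` bookkeeping. A minimal sketch, assuming the `zarr` package is installed and again using a made-up dataset:

```python
import xarray as xr

# Hypothetical chunked dataset; each dask chunk becomes a zarr chunk.
ds = xr.Dataset({"a": (("x",), list(range(8)))}).chunk({"x": 2})

# Chunks are written in parallel; mode="w" overwrites any existing store.
ds.to_zarr("out.zarr", mode="w")
```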