html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2912#issuecomment-773820054,https://api.github.com/repos/pydata/xarray/issues/2912,773820054,MDEyOklzc3VlQ29tbWVudDc3MzgyMDA1NA==,60338532,2021-02-05T06:20:40Z,2021-02-05T06:56:05Z,NONE,"I am trying to perform a fairly simple operation on a dataset: editing variable and global attributes on individual netCDF files of about 3.5 GB each. The files open instantly with `xr.open_dataset`, but `dataset.to_netcdf()` is very slow to write the modified files back out.
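For context, a minimal sketch of the kind of operation I mean (file paths and attribute names below are placeholders, not the actual dataset):

```python
import xarray as xr

# Opening is lazy, so even a ~3.5 GB file returns almost instantly.
ds = xr.open_dataset('input_file.nc')

# Edit global and per-variable attributes (names here are illustrative).
ds.attrs['history'] = 'attributes updated'
ds['some_variable'].attrs['units'] = 'kg m-2 s-1'

# Writing back out is where all the time goes.
ds.to_netcdf('output_file.nc')
```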
I have tried:
1. Writing without rechunking or any dask invocation.
2. Varying chunk sizes, followed by:
3. Using `load()` before `to_netcdf()`
4. Using `persist()` or `compute()` before `to_netcdf()` (a sketch of these variants follows this list)
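A sketch of the chunked variants from items 2-4 (chunk sizes and paths are placeholders):

```python
import xarray as xr

# Open with dask chunks; the chunk sizes here are illustrative only.
ds = xr.open_dataset('input_file.nc', chunks={'time': 100})

ds.attrs['history'] = 'attributes updated'

# Force the data into memory before writing (items 3 and 4 above);
# persist() is the distributed-friendly analogue of load()/compute().
ds = ds.compute()

ds.to_netcdf('output_file.nc')
```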
I am working on an HPC with 10 distributed workers. In all cases, the time taken is more than 15 minutes per file. Is this expected? What else can I try to speed up this process, apart from further parallelizing the single-file operations using `dask.delayed`?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,435535284