Comments on [pydata/xarray#2912](https://github.com/pydata/xarray/issues/2912):

---

**[Comment](https://github.com/pydata/xarray/issues/2912#issuecomment-832864415) by user 34693887 on 2021-05-05**

I had a similar issue. I am trying to save a big xarray dataset (~2 GB) using `to_netcdf()`.

Dataset:

![image](https://user-images.githubusercontent.com/34693887/117181133-c3152600-ad89-11eb-81be-0d5c2e80a368.png)

I tried the following three approaches:

1. Save directly using `dset.to_netcdf()`
2. Load before saving using `dset.load().to_netcdf()`
3. Chunk the data and save using `dset.chunk({'time': 19968}).to_netcdf()`

All three approaches fail to write the file; the Python kernel hangs indefinitely or dies. Any suggestions?

---

**[Comment](https://github.com/pydata/xarray/issues/2912#issuecomment-773820054) by user 60338532 on 2021-02-05**

I am trying to perform a fairly simple operation on a dataset: editing variable and global attributes on individual netCDF files of 3.5 GB each. The files load instantly using `xr.open_dataset`, but `dataset.to_netcdf()` is too slow to export after the modifications. I have tried:

1. No rechunking and no dask invocation.
2. Varying chunk sizes, followed by:
3. `load()` before `to_netcdf`
4. `persist()` or `compute()` before `to_netcdf`

I am working on an HPC system with 10 distributed workers. In all cases, the time taken is more than 15 minutes per file. Is that expected? What else can I try to speed up this process, apart from further parallelizing the single-file operations using dask delayed?

---

**[Comment](https://github.com/pydata/xarray/issues/2912#issuecomment-542369777) by user 668201 on 2019-10-15**

Thanks for the explanations @jhamman and @shoyer :) It turns out that I was not using particularly small chunks, but the filesystem for /tmp was faulty. After trying on a reliable filesystem, the results are much more reasonable.
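The first two comments describe the same pattern: a dask-backed dataset opens quickly, but the `to_netcdf` write stalls or takes far longer than the read. The workaround echoed later in the thread is to pull the data into memory with `.load()` before writing. Below is a minimal sketch of that idea with a size guard; the file names, chunk size, and memory threshold are placeholders, not values from the thread.

```python
import xarray as xr

# Open lazily with dask; the chunk size here is a placeholder, tune it to the data.
ds = xr.open_dataset("input.nc", chunks={"time": 1000})

# ... edit variable/global attributes, etc. ...

# Writing a dask-backed dataset streams many small chunks through the netCDF
# backend, which can be very slow on some filesystems. Loading first turns
# the write into a single in-memory dump.
if ds.nbytes < 8 * 2**30:  # only load when the data fits comfortably in RAM
    ds.load().to_netcdf("output.nc")
else:
    ds.to_netcdf("output.nc")
```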
---

**[Comment](https://github.com/pydata/xarray/issues/2912#issuecomment-533801682) by user 668201 on 2019-09-21**

> There are ways to side step some of these challenges (`save_mfdataset` and the distributed dask scheduler)

@jhamman Could you elaborate on these ways? I am having severe slow-downs when writing Datasets by blocks (backed by dask). I have also noticed that the slowdowns do not occur when writing to a ramdisk.

Here are the timings of `to_netcdf`, which uses the default engine and encoding (the nc file is 4.3 GB):

- Writing to ramdisk (`/dev/shm/`): 2min 1s
- Writing to `/tmp/`: 27min 28s
- Writing to `/tmp/` after `.load()`, as suggested here: 34s (`.load` takes 1min 43s)

The workaround suggested here works, but the datasets may not always fit in memory, and it defeats the essential purpose of dask...

Note: I am using dask 2.3.0 and xarray 0.12.3

---

**[Comment](https://github.com/pydata/xarray/issues/2912#issuecomment-485505651) by user 2014301 on 2019-04-22**

## Diagnosis

Thank you very much! I found this. For now, I will use the `load()` option.

### Loading netCDFs

```
In [8]: time ncdat=reformat_LIS_outputs(outlist)
CPU times: user 7.78 s, sys: 220 ms, total: 8 s
Wall time: 8.02 s
```

### Slower export

```
In [6]: time ncdat.to_netcdf('test_slow')
CPU times: user 12min, sys: 8.19 s, total: 12min 9s
Wall time: 12min 14s
```

### Faster export

```
In [9]: time ncdat.load().to_netcdf('test_faster.nc')
CPU times: user 42.6 s, sys: 2.82 s, total: 45.4 s
Wall time: 54.6 s
```
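For a dataset that does not fit in memory, the two side steps quoted earlier in the thread, `save_mfdataset` and the distributed dask scheduler, avoid one monolithic `to_netcdf` call. Below is a minimal sketch of the `save_mfdataset` route, assuming a datetime `time` coordinate to split on; the input path, chunk size, and output file names are assumptions for illustration, not taken from the thread.

```python
import xarray as xr

# Open lazily so each piece below stays a dask-backed selection.
ds = xr.open_dataset("big_input.nc", chunks={"time": 1000})

# Split along time and write each group to its own file; every individual
# write is much smaller than a single monolithic to_netcdf call.
years, datasets = zip(*ds.groupby("time.year"))
paths = [f"output_{year}.nc" for year in years]
xr.save_mfdataset(datasets, paths)
```

The pieces can later be reopened as one dataset with `xr.open_mfdataset("output_*.nc")`. Running the write under `dask.distributed.Client()` is the other side step mentioned; whether it helps depends on the scheduler, the backend engine, and the filesystem.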