# Speed up operations with xarray dataset (#912)

*Issue opened 2016-07-20, closed 2016-12-29 (completed), 12 comments.*

Hi all,

I've recently been having a hard time manipulating an `xarray` `Dataset`. I'm not sure whether I'm making some awkward mistake, but it is taking an unacceptable amount of time to perform simple operations. Here is a piece of my code:

```
from glob import glob
import xarray as xray  # xarray 0.7.1, imported under its old alias

ncfile = glob('*conc_size_12m.nc')
ds = xray.open_dataset(ncfile[0])
ds
```

```
Dimensions:          (burst: 2485, duration: 2400, z: 160)
Coordinates:
    zdist            (z) float64 0.01014 0.02027 0.03041 0.04054 0.05068 ...
    burst_nr         (burst) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 ...
    time             (duration, burst) datetime64[ns] 2014-09-16T07:00:00 ...
  * burst            (burst) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ...
  * duration         (duration) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ...
  * z                (z) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
Data variables:
    conc_profs       (duration, z, burst) float32 3.99138e-05 4.23636e-05 ...
    burst_duration   (duration) float64 0.0 0.1246 0.2493 0.3739 0.4985 ...
    grainSize_profs  (duration, z, burst) float32 200.0 200.0 200.0 200.0 ...
```

`ds.nbytes * (2 ** -30)` gives `7.15415246784687`, so the dataset is a bit over 7 GiB.

`%time conc_avg = ds.conc_profs.chunk(2400).mean(('z', 'duration'))`

```
CPU times: user 12 ms, sys: 0 ns, total: 12 ms
Wall time: 9.84 ms
```

`%time conc_avg.load()`

`%time conc_avg = ds.conc_profs.isel(burst=0).mean(('z', 'duration'))`

```
CPU times: user 708 ms, sys: 2.87 s, total: 3.58 s
Wall time: 1min 56s
```

If I work with chunks, it is impossible to load the array back in a reasonable amount of time (I waited for more than 30 minutes; the `conc_avg.load()` above never returned). Looping over the `burst` dimension takes about 2 minutes per iteration, which is also quite unreasonable.

I was wondering whether the problem could stem from the way I created the `Dataset`, which I saved into this 7+ GB netCDF file. Could that be the case? I am working on a Linux machine with an Intel Core i5, which should handle these manipulations without hiccups. I run `xarray` (version 0.7.1) from the IOOS environment.

Can someone give me some advice on how to optimize my script? I am happy to supply more details if needed.

Cheers,
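For reference, a minimal sketch (not the poster's actual script) of the chunked variant with the chunking moved into `open_dataset` and aligned with the per-`burst` access pattern; the chunk size of 50 is an assumption, not a value taken from the original file:

```
# Minimal sketch, assuming the goal is per-burst reductions: chunk along
# `burst` at read time so each dask task reduces a small (duration, z)
# slab instead of pulling the whole ~7 GiB variable through one chunk.
from glob import glob
import xarray as xray  # xarray 0.7.1-era alias; today: import xarray as xr

ncfile = glob('*conc_size_12m.nc')
ds = xray.open_dataset(ncfile[0], chunks={'burst': 50})  # 50 is a guess; tune to the file's layout

conc_avg = ds.conc_profs.mean(('z', 'duration'))  # stays lazy on dask chunks
conc_avg.load()                                   # triggers the actual computation
```

Note that a scalar passed to `.chunk()` is applied to every dimension, so `.chunk(2400)` in the original splits `conc_profs` into only a couple of multi-gigabyte chunks; a dict keyed by dimension name gives finer control over the chunk shape.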