# Speed up operations with xarray dataset (#912)

*Issue opened 2016-07-20, closed 2016-12-29 (completed), 12 comments.*

Hi all,

I've recently been having a hard time manipulating an `xarray` `Dataset`. I'm not sure whether I'm making some awkward mistake, but it is taking an unacceptable amount of time to perform simple operations. Here is a piece of my code:

```
from glob import glob
import xarray as xray  # xarray 0.7.1, imported under its old alias

ncfile = glob('*conc_size_12m.nc')
ds = xray.open_dataset(ncfile[0])
ds
```

```
Dimensions:          (burst: 2485, duration: 2400, z: 160)
Coordinates:
    zdist            (z) float64 0.01014 0.02027 0.03041 0.04054 0.05068 ...
    burst_nr         (burst) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 ...
    time             (duration, burst) datetime64[ns] 2014-09-16T07:00:00 ...
  * burst            (burst) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ...
  * duration         (duration) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ...
  * z                (z) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
Data variables:
    conc_profs       (duration, z, burst) float32 3.99138e-05 4.23636e-05 ...
    burst_duration   (duration) float64 0.0 0.1246 0.2493 0.3739 0.4985 ...
    grainSize_profs  (duration, z, burst) float32 200.0 200.0 200.0 200.0 ...
```

`ds.nbytes * (2 ** -30)` gives `7.15415246784687`, so the dataset is a bit over 7 GiB.

`%time conc_avg = ds.conc_profs.chunk(2400).mean(('z', 'duration'))`

```
CPU times: user 12 ms, sys: 0 ns, total: 12 ms
Wall time: 9.84 ms
```

`%time conc_avg.load()`

`%time conc_avg = ds.conc_profs.isel(burst=0).mean(('z', 'duration'))`

```
CPU times: user 708 ms, sys: 2.87 s, total: 3.58 s
Wall time: 1min 56s
```

If I work with chunks, it is impossible to load the array back in a reasonable amount of time (I waited for more than 30 minutes; the `conc_avg.load()` above never returned). Looping over the `burst` dimension takes about 2 minutes per iteration, which is also quite unreasonable.

I was wondering whether the problem could stem from the way I created the `Dataset`, which I saved into this 7+ GB netCDF file. Could that be the case? I am working on a Linux machine with an Intel Core i5, which should handle these manipulations without hiccups. I run `xarray` (version 0.7.1) from the IOOS environment.

Can someone give me some advice on how to optimize my script? I am happy to supply more details if needed.

Cheers,
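For reference, a minimal sketch (not the poster's actual script) of the chunked variant with the chunking moved into `open_dataset` and aligned with the per-`burst` access pattern; the chunk size of 50 is an assumption, not a value taken from the original file:

```
# Minimal sketch, assuming the goal is per-burst reductions: chunk along
# `burst` at read time so each dask task reduces a small (duration, z)
# slab instead of pulling the whole ~7 GiB variable through one chunk.
from glob import glob
import xarray as xray  # xarray 0.7.1-era alias; today: import xarray as xr

ncfile = glob('*conc_size_12m.nc')
ds = xray.open_dataset(ncfile[0], chunks={'burst': 50})  # 50 is a guess; tune to the file's layout

conc_avg = ds.conc_profs.mean(('z', 'duration'))  # stays lazy on dask chunks
conc_avg.load()                                   # triggers the actual computation
```

Note that a scalar passed to `.chunk()` is applied to every dimension, so `.chunk(2400)` in the original splits `conc_profs` into only a couple of multi-gigabyte chunks; a dict keyed by dimension name gives finer control over the chunk shape.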