html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/912#issuecomment-269566487,https://api.github.com/repos/pydata/xarray/issues/912,269566487,MDEyOklzc3VlQ29tbWVudDI2OTU2NjQ4Nw==,2443309,2016-12-29T01:07:52Z,2016-12-29T01:07:52Z,MEMBER,"@saulomeirelles - Hopefully, you were able to work through this issue. If not, feel free to reopen.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166593563
https://github.com/pydata/xarray/issues/912#issuecomment-234056046,https://api.github.com/repos/pydata/xarray/issues/912,234056046,MDEyOklzc3VlQ29tbWVudDIzNDA1NjA0Ng==,1217238,2016-07-20T19:29:55Z,2016-07-20T19:29:55Z,MEMBER,"Just looking at a task manager while a task executes can give you a sense
of what's going on. Dask also has some diagnostics that may be helpful:
http://dask.pydata.org/en/latest/diagnostics.html
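
As a minimal sketch of those diagnostics (assuming `dask` is installed; the small array of ones is only a stand-in for the real data), a progress bar and a task profiler can be wrapped around the computation:

```python
import dask.array as da
from dask.diagnostics import Profiler, ProgressBar

# Stand-in data, chunked along the last axis (not the real file)
x = da.ones((240, 160, 50), chunks=(240, 160, 10))

# ProgressBar prints progress while tasks run; Profiler records each task
with Profiler() as prof, ProgressBar():
    result = x.mean(axis=(0, 1)).compute()

print(result.shape)
print(len(prof.results), 'tasks were executed')
```

`ResourceProfiler` from the same module records memory and CPU use over time, but it needs `psutil` installed.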
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166593563
https://github.com/pydata/xarray/issues/912#issuecomment-234043292,https://api.github.com/repos/pydata/xarray/issues/912,234043292,MDEyOklzc3VlQ29tbWVudDIzNDA0MzI5Mg==,7504461,2016-07-20T18:44:53Z,2016-07-20T18:44:53Z,NONE,"No, not really. I got no error message whatsoever. Is there any test I can
do to tackle this?

","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166593563
https://github.com/pydata/xarray/issues/912#issuecomment-234042142,https://api.github.com/repos/pydata/xarray/issues/912,234042142,MDEyOklzc3VlQ29tbWVudDIzNDA0MjE0Mg==,1217238,2016-07-20T18:41:17Z,2016-07-20T18:41:17Z,MEMBER,"> I decided to wait for .load() to do the job but the kernel dies after a while.

Are you running out of memory? Can you tell what's going on? This is a little surprising to me.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166593563
https://github.com/pydata/xarray/issues/912#issuecomment-234035910,https://api.github.com/repos/pydata/xarray/issues/912,234035910,MDEyOklzc3VlQ29tbWVudDIzNDAzNTkxMA==,7504461,2016-07-20T18:20:24Z,2016-07-20T18:20:24Z,NONE,"True.

I decided to wait for `.load()` to do the job but the kernel dies after a while. 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166593563
https://github.com/pydata/xarray/issues/912#issuecomment-234026185,https://api.github.com/repos/pydata/xarray/issues/912,234026185,MDEyOklzc3VlQ29tbWVudDIzNDAyNjE4NQ==,1217238,2016-07-20T17:47:45Z,2016-07-20T17:47:45Z,MEMBER,"It's worth noting that `conc_avg = ds.conc_profs.chunk({'burst': 10}).mean(('z','duration'))` doesn't actually do any computation -- that's why it's so fast. It just sets up the computation graph. No computation happens until you write `.load()`.
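
A small sketch of that behaviour, assuming `xarray` and `dask` are installed; the array of ones is a made-up stand-in for `ds.conc_profs`, with the same dimension names:

```python
import numpy as np
import xarray as xr

# Made-up stand-in for ds.conc_profs (same dimension names, much smaller)
conc_profs = xr.DataArray(np.ones((24, 16, 50)), dims=('duration', 'z', 'burst'))

# Only builds the task graph: the result is still backed by a dask array
conc_avg = conc_profs.chunk({'burst': 10}).mean(('z', 'duration'))
chunks_before = conc_avg.chunks   # not None while the data is lazy

# The computation actually runs here
result = conc_avg.load()
chunks_after = result.chunks      # None once the data is a numpy array

print(chunks_before, chunks_after)
```

So a fast first step says nothing about how long the later `.load()` will take.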
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166593563
https://github.com/pydata/xarray/issues/912#issuecomment-234022793,https://api.github.com/repos/pydata/xarray/issues/912,234022793,MDEyOklzc3VlQ29tbWVudDIzNDAyMjc5Mw==,7504461,2016-07-20T17:36:02Z,2016-07-20T17:36:17Z,NONE,"Thanks, @shoyer !

Setting smaller chunks helps; however, my issue is getting the result back into memory.

This is fine:

`%time conc_avg = ds.conc_profs.chunk({'burst': 10}).mean(('z','duration'))`

```
CPU times: user 24 ms, sys: 0 ns, total: 24 ms
Wall time: 23.8 ms
```

But this:

`%time result = conc_avg.load()`

takes an insane amount of time, which puzzles me because the result is just a vector with 2845 points.

Is there another way to tackle this without `dask`, such as a for-loop?

If `dask` is the way to go, what would be the quickest way to convert to numpy array?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166593563
https://github.com/pydata/xarray/issues/912#issuecomment-233998757,https://api.github.com/repos/pydata/xarray/issues/912,233998757,MDEyOklzc3VlQ29tbWVudDIzMzk5ODc1Nw==,1217238,2016-07-20T16:11:27Z,2016-07-20T16:11:27Z,MEMBER,"When you write `ds.conc_profs.chunk(2400)`, it sets up the data to be loaded in a giant chunk, almost the entire file at once. Even if you use `.isel()` afterwards, dask does not always manage to subset the data from the initial chunk. (Sometimes it does succeed, which makes this a little confusing.)

You will probably be more successful if you try something like `ds.conc_profs.chunk({'burst': 10})` instead, which keeps the intermediate chunks to a reasonable size.
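
For a rough sense of scale, here is a back-of-the-envelope sketch using the array sizes given elsewhere in this thread (2400 x 160 x 2485, float64). It assumes an integer chunk size is applied to every dimension and that dimensions left out of the dict stay in one piece; `chunk_bytes` is just a helper made up for this illustration:

```python
from math import prod

# Dimension sizes as described in this thread; values are float64
shape = {'duration': 2400, 'z': 160, 'burst': 2485}
ITEMSIZE = 8  # bytes per float64 value

def chunk_bytes(chunks):
    # Size in bytes of one full chunk; a chunk cannot exceed the dimension,
    # and dimensions missing from `chunks` are kept whole.
    return ITEMSIZE * prod(min(chunks.get(d, n), n) for d, n in shape.items())

big = chunk_bytes({d: 2400 for d in shape})   # like .chunk(2400)
small = chunk_bytes({'burst': 10})            # like .chunk({'burst': 10})

print(f'{big / 1e9:.2f} GB')    # 7.37 GB
print(f'{small / 1e6:.1f} MB')  # 30.7 MB
```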
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166593563
https://github.com/pydata/xarray/issues/912#issuecomment-233998071,https://api.github.com/repos/pydata/xarray/issues/912,233998071,MDEyOklzc3VlQ29tbWVudDIzMzk5ODA3MQ==,7504461,2016-07-20T16:08:57Z,2016-07-20T16:08:57Z,NONE,"I've tried to create individual nc-files and then read them all using `open_mfdataset` but I got an error for opening too many files which was reported here before.

The `glob` is just a (bad) habit because I normally read multiple files. O_0

Cheers,
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166593563
https://github.com/pydata/xarray/issues/912#issuecomment-233996527,https://api.github.com/repos/pydata/xarray/issues/912,233996527,MDEyOklzc3VlQ29tbWVudDIzMzk5NjUyNw==,1217238,2016-07-20T16:03:30Z,2016-07-20T16:03:30Z,MEMBER,"Thanks for describing that -- I misread your initial description and thought you were using `open_mfdataset` rather than `open_dataset` (the glob threw me off!). The source of these files shouldn't matter once you have it in a netCDF file.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166593563
https://github.com/pydata/xarray/issues/912#issuecomment-233995495,https://api.github.com/repos/pydata/xarray/issues/912,233995495,MDEyOklzc3VlQ29tbWVudDIzMzk5NTQ5NQ==,7504461,2016-07-20T16:00:02Z,2016-07-20T16:00:02Z,NONE,"The input files are 2485 nested mat-files that come out from a measurement device. I read them in Python ( `loadmat(matfile)` ) and turn them into numpy arrays like this:

```
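from glob import glob

import numpy as np

# extract_number: sorting helper defined elsewhere (not shown here)
matfiles = sorted(glob('*sed.mat'), key=extract_number)

if matfiles:
    ts = 2400  # length of the 'duration' dimension
    zs = 160   # length of the 'z' dimension

    # Preallocate the arrays, with one slot per mat-file on the last axis
    Burst     = np.empty(len(matfiles))
    Time      = np.empty((ts, len(matfiles)), dtype='datetime64[s]')
    ConcProf  = np.empty((ts, zs, len(matfiles)), dtype='float64')
    GsizeProf = np.empty((ts, zs, len(matfiles)), dtype='float64')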
```

Afterwards, I populate the matrices in a loop:

```
def f(i):   
    Dist, Burst[i], Time[:,i], ConcProf[:,:,i], GsizeProf[:,:,i] = getABSpars(matfiles[i])

```

where

```
def getABSpars(matfile):    

    ndata = loadmat(matfile) 

    Dist  = ndata['r']

    t_dic = ndata['BurstInfo']['StartTime']

    try:
        t_dt  = dt.datetime.strptime(t_dic, '%d-%b-%Y %H:%M:%S')
    except ValueError:
        # Fall back for start times that lack a time-of-day part
        t_dic = t_dic + ' 00:00:00'
        t_dt  = dt.datetime.strptime(t_dic, '%d-%b-%Y %H:%M:%S')

    t_range   = date_range( t_dt,
                periods = ndata['MassProfiles'].shape[1],
                freq    = ndata['BurstInfo']['MassProfileInterval']+'L')     

    Burst         = int(ndata['BurstInfo']['BurstNumber'])
    Time          = t_range
    ConcProf      = np.asarray(ndata['MassProfiles'] ).T
    GsizeProf     = np.asarray(ndata['SizeProfiles']*1e6).T

    return Dist, Burst, Time, ConcProf, GsizeProf

```

Using the `multiprocessing` package:

```
from multiprocessing.pool import ThreadPool

pool = ThreadPool(4)
pool.map(f, range(len(matfiles)))
pool.close()
pool.join()
```

Finally I create the xarray dataset and then save into a nc-file:

```
ds = xray.Dataset( { 'conc_profs'      : ( ['duration', 'z', 'burst'], ConcProf  ),
                     'grainSize_profs' : ( ['duration', 'z', 'burst'], GsizeProf ),
                     'burst_duration'  : ( ['duration'], np.linspace(0,299, Time.shape[0]) ), },
                  coords = {'time'      : (['duration', 'burst'], Time) ,
                            'zdist'     : (['z'], Dist),
                            'burst_nr'  : (['burst'], Burst) } )

ds.to_netcdf('ABS_conc_size_12m.nc' , mode='w')

```

It takes me around 1 h to generate the nc-file.

Could this be the reason for my headaches?

Thanks!
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166593563
https://github.com/pydata/xarray/issues/912#issuecomment-233991357,https://api.github.com/repos/pydata/xarray/issues/912,233991357,MDEyOklzc3VlQ29tbWVudDIzMzk5MTM1Nw==,1217238,2016-07-20T15:46:50Z,2016-07-20T15:46:50Z,MEMBER,"What do the original input files look like, before you join them together? This may be a case where the dask.array task scheduler does very poorly.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166593563