issue_comments: 233995495


Comment on issue pydata/xarray#912: https://github.com/pydata/xarray/issues/912#issuecomment-233995495
user: 7504461 · created: 2016-07-20T16:00:02Z · author_association: NONE

The input files are 2485 nested mat-files that come from a measurement device. I read them in Python (`loadmat(matfile)`) and turn them into numpy arrays like this:

```
from glob import glob

import numpy as np

matfiles = glob('*sed.mat')
matfiles = sorted(matfiles, key=lambda x: extract_number(x))  # extract_number: my own helper

if matfiles:

    ts = 2400   # samples per burst
    zs = 160    # vertical bins

    # Preallocate one slot per input file
    Burst     = np.empty(len(matfiles))
    Time      = np.empty((ts, len(matfiles)), dtype='datetime64[s]')
    ConcProf  = np.empty((ts, zs, len(matfiles)), dtype='float64')
    GsizeProf = np.empty((ts, zs, len(matfiles)), dtype='float64')
```
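As a toy sketch of the preallocation pattern above (sizes here are made up): `np.empty` only reserves memory and does not initialise it, so every slot has to be filled by the later per-file loop.

```python
import numpy as np

# Hypothetical small sizes, standing in for ts=2400, zs=160, 2485 files
n_files = 3
ts, zs = 4, 2

# Reserve the output arrays up front; contents are garbage until written
time_arr = np.empty((ts, n_files), dtype='datetime64[s]')
conc     = np.empty((ts, zs, n_files), dtype='float64')

print(time_arr.shape)  # (4, 3)
print(conc.shape)      # (4, 2, 3)
```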

Afterwards, I populate the matrices in a loop:

```
def f(i):
    global Dist  # the same distance vector comes back for every file
    Dist, Burst[i], Time[:, i], ConcProf[:, :, i], GsizeProf[:, :, i] = getABSpars(matfiles[i])
```

where

```
import datetime as dt

from pandas import date_range
from scipy.io import loadmat


def getABSpars(matfile):

    ndata = loadmat(matfile)

    Dist  = ndata['r']
    t_dic = ndata['BurstInfo']['StartTime']

    try:
        t_dt = dt.datetime.strptime(t_dic, '%d-%b-%Y %H:%M:%S')
    except ValueError:
        # StartTime has no time-of-day part; assume midnight
        t_dic = t_dic + ' 00:00:00'
        t_dt  = dt.datetime.strptime(t_dic, '%d-%b-%Y %H:%M:%S')

    t_range = date_range(t_dt,
                         periods=ndata['MassProfiles'].shape[1],
                         freq=ndata['BurstInfo']['MassProfileInterval'] + 'L')

    Burst     = int(ndata['BurstInfo']['BurstNumber'])
    Time      = t_range
    ConcProf  = np.asarray(ndata['MassProfiles']).T
    GsizeProf = np.asarray(ndata['SizeProfiles'] * 1e6).T

    return Dist, Burst, Time, ConcProf, GsizeProf
```
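The StartTime fallback can be sketched on its own (the function name `parse_start_time` and the sample strings are just for illustration): if the string has no time-of-day part, append midnight and parse again.

```python
import datetime as dt

def parse_start_time(t_dic):
    try:
        return dt.datetime.strptime(t_dic, '%d-%b-%Y %H:%M:%S')
    except ValueError:
        # No time-of-day part; assume midnight
        return dt.datetime.strptime(t_dic + ' 00:00:00', '%d-%b-%Y %H:%M:%S')

print(parse_start_time('20-Jul-2016 16:00:02'))  # 2016-07-20 16:00:02
print(parse_start_time('20-Jul-2016'))           # 2016-07-20 00:00:00
```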

Using the multiprocessing package:

```
from multiprocessing.pool import ThreadPool

pool = ThreadPool(4)
pool.map(f, range(len(matfiles)))
pool.close()
```

Finally, I create the xarray Dataset and save it to a netCDF file:

```
ds = xray.Dataset(
    {'conc_profs'      : (['duration', 'z', 'burst'], ConcProf),
     'grainSize_profs' : (['duration', 'z', 'burst'], GsizeProf),
     'burst_duration'  : (['duration'], np.linspace(0, 299, Time.shape[0]))},
    coords={'time'     : (['duration', 'burst'], Time),
            'zdist'    : (['z'], Dist),
            'burst_nr' : (['burst'], Burst)})

ds.to_netcdf('ABS_conc_size_12m.nc', mode='w')
```

It takes around 1 hour to generate the netCDF file.
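To narrow down where the hour goes, I could time each phase separately (a rough sketch; the commented stage markers are placeholders for the real read/convert and write steps above):

```python
import time

t0 = time.perf_counter()
# ... loadmat + array conversion for all files ...
t1 = time.perf_counter()
# ... Dataset construction + to_netcdf ...
t2 = time.perf_counter()

print('read/convert: %.1f s, write: %.1f s' % (t1 - t0, t2 - t1))
```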

Could this be the reason for my headaches?

Thanks!
