issue_comments: 233995495


Comment on issue pydata/xarray#912: https://github.com/pydata/xarray/issues/912#issuecomment-233995495
user: 7504461 · created: 2016-07-20T16:00:02Z · author_association: NONE

The input files are 2485 nested mat-files that come from a measurement device. I read them in Python (`loadmat(matfile)`) and turn them into numpy arrays like this:

```
from glob import glob

import numpy as np

matfiles = glob('*sed.mat')
matfiles = sorted(matfiles, key=lambda x: extract_number(x))  # extract_number: my own helper

if matfiles:

    ts = 2400   # samples per burst
    zs = 160    # vertical bins

    # Preallocate one slot per input file
    Burst     = np.empty(len(matfiles))
    Time      = np.empty((ts, len(matfiles)), dtype='datetime64[s]')
    ConcProf  = np.empty((ts, zs, len(matfiles)), dtype='float64')
    GsizeProf = np.empty((ts, zs, len(matfiles)), dtype='float64')
```
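As a toy sketch of the preallocation pattern above (sizes here are made up): `np.empty` only reserves memory and does not initialise it, so every slot has to be filled by the later per-file loop.

```python
import numpy as np

# Hypothetical small sizes, standing in for ts=2400, zs=160, 2485 files
n_files = 3
ts, zs = 4, 2

# Reserve the output arrays up front; contents are garbage until written
time_arr = np.empty((ts, n_files), dtype='datetime64[s]')
conc     = np.empty((ts, zs, n_files), dtype='float64')

print(time_arr.shape)  # (4, 3)
print(conc.shape)      # (4, 2, 3)
```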

Afterwards, I populate the matrices in a loop:

```
def f(i):
    global Dist  # the same distance vector comes back for every file
    Dist, Burst[i], Time[:, i], ConcProf[:, :, i], GsizeProf[:, :, i] = getABSpars(matfiles[i])
```

where

```
import datetime as dt

from pandas import date_range
from scipy.io import loadmat


def getABSpars(matfile):

    ndata = loadmat(matfile)

    Dist  = ndata['r']
    t_dic = ndata['BurstInfo']['StartTime']

    try:
        t_dt = dt.datetime.strptime(t_dic, '%d-%b-%Y %H:%M:%S')
    except ValueError:
        # StartTime has no time-of-day part; assume midnight
        t_dic = t_dic + ' 00:00:00'
        t_dt  = dt.datetime.strptime(t_dic, '%d-%b-%Y %H:%M:%S')

    t_range = date_range(t_dt,
                         periods=ndata['MassProfiles'].shape[1],
                         freq=ndata['BurstInfo']['MassProfileInterval'] + 'L')

    Burst     = int(ndata['BurstInfo']['BurstNumber'])
    Time      = t_range
    ConcProf  = np.asarray(ndata['MassProfiles']).T
    GsizeProf = np.asarray(ndata['SizeProfiles'] * 1e6).T

    return Dist, Burst, Time, ConcProf, GsizeProf
```
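The StartTime fallback can be sketched on its own (the function name `parse_start_time` and the sample strings are just for illustration): if the string has no time-of-day part, append midnight and parse again.

```python
import datetime as dt

def parse_start_time(t_dic):
    try:
        return dt.datetime.strptime(t_dic, '%d-%b-%Y %H:%M:%S')
    except ValueError:
        # No time-of-day part; assume midnight
        return dt.datetime.strptime(t_dic + ' 00:00:00', '%d-%b-%Y %H:%M:%S')

print(parse_start_time('20-Jul-2016 16:00:02'))  # 2016-07-20 16:00:02
print(parse_start_time('20-Jul-2016'))           # 2016-07-20 00:00:00
```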

Using the multiprocessing package:

```
from multiprocessing.pool import ThreadPool

pool = ThreadPool(4)
pool.map(f, range(len(matfiles)))
pool.close()
```

Finally, I create the xarray Dataset and save it to a netCDF file:

```
ds = xray.Dataset(
    {'conc_profs'      : (['duration', 'z', 'burst'], ConcProf),
     'grainSize_profs' : (['duration', 'z', 'burst'], GsizeProf),
     'burst_duration'  : (['duration'], np.linspace(0, 299, Time.shape[0]))},
    coords={'time'     : (['duration', 'burst'], Time),
            'zdist'    : (['z'], Dist),
            'burst_nr' : (['burst'], Burst)})

ds.to_netcdf('ABS_conc_size_12m.nc', mode='w')
```

It takes around 1 hour to generate the netCDF file.
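To narrow down where the hour goes, I could time each phase separately (a rough sketch; the commented stage markers are placeholders for the real read/convert and write steps above):

```python
import time

t0 = time.perf_counter()
# ... loadmat + array conversion for all files ...
t1 = time.perf_counter()
# ... Dataset construction + to_netcdf ...
t2 = time.perf_counter()

print('read/convert: %.1f s, write: %.1f s' % (t1 - t0, t2 - t1))
```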

Could this be the reason for my headaches?

Thanks!
