html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/912#issuecomment-234043292,https://api.github.com/repos/pydata/xarray/issues/912,234043292,MDEyOklzc3VlQ29tbWVudDIzNDA0MzI5Mg==,7504461,2016-07-20T18:44:53Z,2016-07-20T18:44:53Z,NONE,"No, not really. I got no error message whatsoever. Is there any test I can do to tackle this?
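A quick back-of-the-envelope check, using the sizes from the preallocation code posted in this thread (2400 x 160 values per burst, 2485 files, float64). It only multiplies out the array shapes, so it is an estimate of raw data size and ignores any temporary copies. If the full variables end up in memory at any point, roughly 15 GB for the two profile variables could explain a kernel dying without an error message. For a dataset opened with xarray, `ds.nbytes` should give the same kind of estimate without loading the data.

```python
def array_nbytes(shape, itemsize=8):
    # number of elements times bytes per element (8 for float64 / datetime64)
    n = 1
    for size in shape:
        n *= size
    return n * itemsize


ts, zs, n_files = 2400, 160, 2485  # sizes taken from the preallocation code in this thread
per_var = array_nbytes((ts, zs, n_files))
total = 2 * per_var  # conc_profs and grainSize_profs
print(f'{per_var / 1e9:.2f} GB per variable, {total / 1e9:.2f} GB for both')
```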
On Jul 20, 2016 8:41 PM, ""Stephan Hoyer"" notifications@github.com wrote:
> I decided to wait for .load() to do the job but the kernel dies after a while.
>
> Are you running out of memory? Can you tell what's going on? This is a little surprising to me.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166593563
https://github.com/pydata/xarray/issues/912#issuecomment-234035910,https://api.github.com/repos/pydata/xarray/issues/912,234035910,MDEyOklzc3VlQ29tbWVudDIzNDAzNTkxMA==,7504461,2016-07-20T18:20:24Z,2016-07-20T18:20:24Z,NONE,"True.
I decided to wait for `.load()` to finish, but the kernel dies after a while.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166593563
https://github.com/pydata/xarray/issues/912#issuecomment-234022793,https://api.github.com/repos/pydata/xarray/issues/912,234022793,MDEyOklzc3VlQ29tbWVudDIzNDAyMjc5Mw==,7504461,2016-07-20T17:36:02Z,2016-07-20T17:36:17Z,NONE,"Thanks, @shoyer !
Setting smaller chunks helps; however, my problem is getting the result back into memory.
This is fine:
`%time conc_avg = ds.conc_profs.chunk({'burst': 10}).mean(('z','duration'))`
```
CPU times: user 24 ms, sys: 0 ns, total: 24 ms
Wall time: 23.8 ms
```
But this:
`%time result = conc_avg.load()`
takes an extremely long time, which puzzles me because the result is just a vector with 2845 points.
Is there another way to tackle this without `dask`, e.g. using a for-loop?
If `dask` is the way to go, what would be the quickest way to convert the result to a numpy array?
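A minimal sketch of the for-loop route without `dask`, assuming the variable keeps the `('duration', 'z', 'burst')` order used when the dataset was built. It reads a block of bursts at a time, converts just that block to a numpy array and averages over the first two axes. `blockwise_burst_mean` is a hypothetical helper name, not part of xarray.

```python
import numpy as np


def blockwise_burst_mean(arr, block=10):
    # Mean over the first two axes (duration, z) for every burst on the last axis.
    # Only arr[:, :, start:stop] is converted to a numpy array per iteration, so if
    # arr is lazily indexed (e.g. a variable of an opened netCDF file), memory use
    # is bounded by one block of bursts.
    n_burst = arr.shape[-1]
    out = np.empty(n_burst, dtype='float64')
    for start in range(0, n_burst, block):
        stop = min(start + block, n_burst)
        chunk = np.asarray(arr[:, :, start:stop], dtype='float64')
        # nanmean skips NaNs, like xarray's default skipna=True for float data
        out[start:stop] = np.nanmean(chunk, axis=(0, 1))
    return out
```

With a dataset opened without `chunks`, something like `blockwise_burst_mean(ds.conc_profs, block=10)` should only pull each slice from disk when it is converted. For the second question, `np.asarray(conc_avg)` or `conc_avg.values` are the usual ways to get a numpy array out of an xarray object.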
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166593563
https://github.com/pydata/xarray/issues/912#issuecomment-233998071,https://api.github.com/repos/pydata/xarray/issues/912,233998071,MDEyOklzc3VlQ29tbWVudDIzMzk5ODA3MQ==,7504461,2016-07-20T16:08:57Z,2016-07-20T16:08:57Z,NONE,"I tried creating individual nc-files and reading them all with `open_mfdataset`, but I got an error about opening too many files, which has been reported here before.
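One possible workaround for the too-many-open-files error, assuming a Unix-like system: the error comes from the per-process limit on open file descriptors, and a process may raise its soft limit up to its hard limit with the standard-library `resource` module. A minimal sketch; the target of 4096 is an arbitrary example, and the call has to happen in the same process (e.g. the notebook kernel) before `open_mfdataset` runs.

```python
import resource


def raise_open_file_limit(wanted=4096):
    # current soft (enforced) and hard (ceiling) limits on open file descriptors
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)  # an unprivileged process cannot exceed the hard limit
    if soft == resource.RLIM_INFINITY or soft >= wanted:
        return soft  # already high enough, nothing to change
    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


print(raise_open_file_limit())
```

The shell equivalent before starting Python is `ulimit -n 4096`, again capped by the hard limit.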
The `glob` is just a (bad) habit because I normally read multiple files. O_0
Cheers,
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166593563
https://github.com/pydata/xarray/issues/912#issuecomment-233995495,https://api.github.com/repos/pydata/xarray/issues/912,233995495,MDEyOklzc3VlQ29tbWVudDIzMzk5NTQ5NQ==,7504461,2016-07-20T16:00:02Z,2016-07-20T16:00:02Z,NONE,"The input files are 2485 nested mat-files produced by a measurement device. I read them in Python (`loadmat(matfile)`) and turn them into numpy arrays like this:
```
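from glob import glob

import numpy as np

matfiles = glob('*sed.mat')
# extract_number: helper defined elsewhere (not shown), used to sort the files numerically
matfiles = sorted(matfiles, key=lambda x: extract_number(x))
if matfiles:
    ts = 2400  # length of the 'duration' axis
    zs = 160   # length of the 'z' axis
    Burst = np.empty(len(matfiles))
    Time = np.empty((ts, len(matfiles)), dtype='datetime64[s]')
    ConcProf = np.empty((ts, zs, len(matfiles)), dtype='float64')
    GsizeProf = np.empty((ts, zs, len(matfiles)), dtype='float64')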
```
Afterwards, I populate the matrices in a loop:
```
def f(i):
    global Dist  # without this, Dist is bound only locally and lost when f returns
    Dist, Burst[i], Time[:, i], ConcProf[:, :, i], GsizeProf[:, :, i] = getABSpars(matfiles[i])
```
where
```
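import datetime as dt

import numpy as np
from pandas import date_range

# loadmat: assumed helper returning MATLAB structs as nested dicts
# (plain scipy.io.loadmat returns nested record arrays instead)

def getABSpars(matfile):
    ndata = loadmat(matfile)
    Dist = ndata['r']
    t_dic = ndata['BurstInfo']['StartTime']
    try:
        t_dt = dt.datetime.strptime(t_dic, '%d-%b-%Y %H:%M:%S')
    except ValueError:
        # StartTime sometimes lacks the time part; assume midnight
        t_dic = t_dic + ' 00:00:00'
        t_dt = dt.datetime.strptime(t_dic, '%d-%b-%Y %H:%M:%S')
    # one time stamp per mass profile; 'L' is the (older) pandas alias for milliseconds
    t_range = date_range(t_dt,
                         periods=ndata['MassProfiles'].shape[1],
                         freq=ndata['BurstInfo']['MassProfileInterval'] + 'L')
    Burst = int(ndata['BurstInfo']['BurstNumber'])
    Time = t_range
    ConcProf = np.asarray(ndata['MassProfiles']).T
    GsizeProf = np.asarray(ndata['SizeProfiles'] * 1e6).T
    return Dist, Burst, Time, ConcProf, GsizeProf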
```
I run `f` over all files with a thread pool from the `multiprocessing` package:
```
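from multiprocessing.pool import ThreadPool

pool = ThreadPool(4)  # threads, not processes, so f can fill the shared arrays
pool.map(f, range(len(matfiles)))
pool.close()
pool.join()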
```
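A self-contained sketch of the same fill-in-place pattern, with a hypothetical `fake_load` standing in for `getABSpars` and small made-up sizes. It illustrates why a thread pool works here: worker threads share the preallocated numpy arrays, whereas a process pool would fill copies inside the child processes. Since much of `getABSpars` is Python-level parsing, the GIL may limit how much the threads actually speed things up.

```python
from multiprocessing.pool import ThreadPool

import numpy as np

n_files, ts, zs = 8, 5, 3  # small stand-ins for len(matfiles), 2400 and 160

burst = np.empty(n_files)
conc = np.empty((ts, zs, n_files))


def fake_load(i):
    # hypothetical stand-in for getABSpars(matfiles[i])
    return i + 1, np.full((ts, zs), float(i))


def fill(i):
    # each call writes only column i, so threads never write the same slice
    burst[i], conc[:, :, i] = fake_load(i)


with ThreadPool(4) as pool:
    pool.map(fill, range(n_files))  # blocks until every file has been processed

print(burst)
```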
Finally, I create the xarray dataset and save it to a nc-file:
```
ds = xray.Dataset( { 'conc_profs' : ( ['duration', 'z', 'burst'], ConcProf ),
'grainSize_profs' : ( ['duration', 'z', 'burst'], GsizeProf ),
'burst_duration' : ( ['duration'], np.linspace(0,299, Time.shape[0]) ), },
coords = {'time' : (['duration', 'burst'], Time) ,
'zdist' : (['z'], Dist),
'burst_nr' : (['burst'], Burst) } )
ds.to_netcdf('ABS_conc_size_12m.nc' , mode='w')
```
Generating the nc-file takes around 1 h.
Could this be the cause of my headaches?
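Another possible factor, assuming the netCDF variables are stored contiguously in the `('duration', 'z', 'burst')` order used above: with `burst` as the last (fastest-varying) axis, the values for a block of bursts are scattered across the entire variable, so reading even `{'burst': 10}` chunks touches the whole file. The NumPy sketch below shows the same layout effect in memory, with small made-up sizes.

```python
import numpy as np

ts, zs, nb = 24, 16, 50  # small stand-ins for 2400, 160 and the number of bursts

# layout used when the dataset was built: (duration, z, burst)
burst_last = np.arange(ts * zs * nb, dtype='float64').reshape(ts, zs, nb)
# same data with burst as the first (slowest-varying) axis: (burst, duration, z)
burst_first = np.ascontiguousarray(np.moveaxis(burst_last, -1, 0))

block_last = burst_last[:, :, :10]  # 10 of every 50 values, spread over the whole buffer
block_first = burst_first[:10]      # the same 10 bursts as one contiguous piece

print(block_last.flags['C_CONTIGUOUS'], block_first.flags['C_CONTIGUOUS'])
```

If this turns out to be the bottleneck, transposing before writing, e.g. `ds.transpose('burst', 'duration', 'z').to_netcdf('ABS_conc_size_12m.nc', mode='w')`, would keep each burst's profiles next to each other on disk.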
Thanks!
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166593563
https://github.com/pydata/xarray/issues/364#issuecomment-231021167,https://api.github.com/repos/pydata/xarray/issues/364,231021167,MDEyOklzc3VlQ29tbWVudDIzMTAyMTE2Nw==,7504461,2016-07-07T08:54:46Z,2016-07-07T08:59:15Z,NONE,"Thanks, @shoyer !
Here is an example of how I circumvented the problem:
`import calendar`
`import numpy as np`
`import pandas as pd`
`import xarray as xray`
`data = np.random.rand(24*5)`
`times = pd.date_range('2000-01-01', periods=24*5, freq='H')`
`foo = xray.DataArray(data, coords=[times], dims=['time'])`
`foo = foo.to_dataset(name='foo')`
`T = 12.42 * 3600  # 12.42 hours in seconds`
`Tint = [int(calendar.timegm(t.timetuple()) / T) for t in foo.time.values.astype('datetime64[s]').tolist()]`
`foo2 = xray.DataArray(Tint, coords=foo.time.coords, dims=foo.time.dims)`
`foo = foo.merge(foo2.to_dataset(name='Tint'))`
`foo_grp = foo.groupby('Tint')`
`foo_grp.group.plot.line()`
In my case, the `dataset` is quite large, so merging the new variable `Tint` took a lot of computation time.
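A NumPy-only sketch of the same 12.42-hour binning that avoids the per-timestamp loop and the merge: the window label is computed in one vectorized step from the `datetime64` values (seconds since 1970-01-01 UTC, so independent of the machine's timezone), and the group means come from `np.bincount`. The data are random and the names mirror the example.

```python
import numpy as np

T = 12.42 * 3600  # tidal period in seconds

# hourly time stamps over five days and some data, as in the example
times = np.arange('2000-01-01T00', '2000-01-06T00', dtype='datetime64[h]')
data = np.random.default_rng(0).random(times.size)

# label each time stamp with the index of its 12.42-hour window since the epoch
seconds = times.astype('datetime64[s]').astype('int64')
Tint = (seconds // T).astype('int64')

# group means: map the labels to 0..k-1, then sum and count per group
labels, inverse = np.unique(Tint, return_inverse=True)
means = np.bincount(inverse, weights=data) / np.bincount(inverse)
```

With xarray, assigning the labels as a coordinate, e.g. `foo.coords['Tint'] = ('time', Tint)`, before calling `foo.groupby('Tint')` may avoid some of the cost of `merge` (an assumption, not benchmarked).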
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,60303760
https://github.com/pydata/xarray/issues/364#issuecomment-228723336,https://api.github.com/repos/pydata/xarray/issues/364,228723336,MDEyOklzc3VlQ29tbWVudDIyODcyMzMzNg==,7504461,2016-06-27T11:45:09Z,2016-06-27T11:45:09Z,NONE,"This is very useful functionality. I am wondering whether I can specify the time window, for example with something like `ds.groupby(time=pd.TimeGrouper('12.42H'))`. Is there a way to do that in `xarray`?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,60303760
https://github.com/pydata/xarray/issues/191#issuecomment-150618114,https://api.github.com/repos/pydata/xarray/issues/191,150618114,MDEyOklzc3VlQ29tbWVudDE1MDYxODExNA==,7504461,2015-10-23T16:00:26Z,2015-10-23T16:00:59Z,NONE,"Hi All,
This is indeed an excellent project with great potential!
I am wondering whether there has been any progress on the interpolation issue. I am working with an irregular time series that I would very much like to upsample using xray.
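In the meantime, a minimal NumPy sketch of linear upsampling of an irregular time series onto a regular grid: convert the `datetime64` stamps to float seconds and use `np.interp`. The observation times, values and the 10-minute spacing are made up for illustration.

```python
import numpy as np

# irregularly spaced observation times and values (made-up example data)
t_obs = np.array(['2015-10-01T00:00', '2015-10-01T00:07', '2015-10-01T00:31',
                  '2015-10-01T00:50', '2015-10-01T01:00'], dtype='datetime64[s]')
y_obs = np.array([1.0, 2.0, 0.5, 3.0, 4.0])

# regular 10-minute grid covering the observation period (stop is exclusive, hence +1 s)
t_new = np.arange(t_obs[0], t_obs[-1] + np.timedelta64(1, 's'), np.timedelta64(600, 's'))

# np.interp needs floats, so express both time axes as seconds since the first observation
x_obs = (t_obs - t_obs[0]) / np.timedelta64(1, 's')
x_new = (t_new - t_obs[0]) / np.timedelta64(1, 's')
y_new = np.interp(x_new, x_obs, y_obs)  # piecewise-linear interpolation
```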
Thanks for all the effort!
Saulo
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,38849807