issue_comments

12 rows where issue = 166593563 sorted by updated_at descending

user (3 distinct values)

  • shoyer 6
  • saulomeirelles 5
  • jhamman 1

author_association (2 distinct values)

  • MEMBER 7
  • NONE 5

issue (1 distinct value)

  • Speed up operations with xarray dataset · 12
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
269566487 https://github.com/pydata/xarray/issues/912#issuecomment-269566487 https://api.github.com/repos/pydata/xarray/issues/912 MDEyOklzc3VlQ29tbWVudDI2OTU2NjQ4Nw== jhamman 2443309 2016-12-29T01:07:52Z 2016-12-29T01:07:52Z MEMBER

@saulomeirelles - Hopefully, you were able to work through this issue. If not, feel free to reopen.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Speed up operations with xarray dataset 166593563
234056046 https://github.com/pydata/xarray/issues/912#issuecomment-234056046 https://api.github.com/repos/pydata/xarray/issues/912 MDEyOklzc3VlQ29tbWVudDIzNDA1NjA0Ng== shoyer 1217238 2016-07-20T19:29:55Z 2016-07-20T19:29:55Z MEMBER

Just looking at a task manager while a task executes can give you a sense of what's going on. Dask also has some diagnostics that may be helpful: http://dask.pydata.org/en/latest/diagnostics.html

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Speed up operations with xarray dataset 166593563
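
(Editorial note, not part of the thread: the dask diagnostics shoyer links to above can be attached as context managers around the step that actually runs the computation. A minimal sketch, assuming conc_avg is the lazy mean that appears further down this thread:)

```
# Sketch only: watch the deferred computation while it runs.
from dask.diagnostics import ProgressBar

with ProgressBar():            # prints a live progress bar as dask tasks execute
    result = conc_avg.load()   # this is where the work (and memory use) happens
```
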
234043292 https://github.com/pydata/xarray/issues/912#issuecomment-234043292 https://api.github.com/repos/pydata/xarray/issues/912 MDEyOklzc3VlQ29tbWVudDIzNDA0MzI5Mg== saulomeirelles 7504461 2016-07-20T18:44:53Z 2016-07-20T18:44:53Z NONE

No, not really. I got no error message whatsoever. Is there any test I can do to tackle this?


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Speed up operations with xarray dataset 166593563
234042142 https://github.com/pydata/xarray/issues/912#issuecomment-234042142 https://api.github.com/repos/pydata/xarray/issues/912 MDEyOklzc3VlQ29tbWVudDIzNDA0MjE0Mg== shoyer 1217238 2016-07-20T18:41:17Z 2016-07-20T18:41:17Z MEMBER

> I decided to wait for .load() to do the job but the kernel dies after a while.

Are you running out of memory? Can you tell what's going on? This is a little surprising to me.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Speed up operations with xarray dataset 166593563
234035910 https://github.com/pydata/xarray/issues/912#issuecomment-234035910 https://api.github.com/repos/pydata/xarray/issues/912 MDEyOklzc3VlQ29tbWVudDIzNDAzNTkxMA== saulomeirelles 7504461 2016-07-20T18:20:24Z 2016-07-20T18:20:24Z NONE

True.

I decided to wait for .load() to do the job but the kernel dies after a while.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Speed up operations with xarray dataset 166593563
234026185 https://github.com/pydata/xarray/issues/912#issuecomment-234026185 https://api.github.com/repos/pydata/xarray/issues/912 MDEyOklzc3VlQ29tbWVudDIzNDAyNjE4NQ== shoyer 1217238 2016-07-20T17:47:45Z 2016-07-20T17:47:45Z MEMBER

It's worth noting that conc_avg = ds.conc_profs.chunk({'burst': 10}).mean(('z','duration')) doesn't actually do any computation -- that's why it's so fast. It just sets up the computation graph. No computation happens until you write .load().

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Speed up operations with xarray dataset 166593563
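
(Editorial note, not part of the thread: a small sketch of the lazy evaluation shoyer describes above. The variable names, dimension names and file name are taken from this thread; none of this code appears in the original comments.)

```
import xarray as xr

ds = xr.open_dataset('ABS_conc_size_12m.nc')

# This line only builds a dask task graph, which is why it returns in milliseconds.
conc_avg = ds.conc_profs.chunk({'burst': 10}).mean(('z', 'duration'))
print(type(conc_avg.data))     # a dask array; nothing has been computed yet

# The reading, averaging and memory use all happen here.
result = conc_avg.load()
```
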
234022793 https://github.com/pydata/xarray/issues/912#issuecomment-234022793 https://api.github.com/repos/pydata/xarray/issues/912 MDEyOklzc3VlQ29tbWVudDIzNDAyMjc5Mw== saulomeirelles 7504461 2016-07-20T17:36:02Z 2016-07-20T17:36:17Z NONE

Thanks, @shoyer !

Setting smaller chunks helps; however, my issue is the way back, i.e. actually computing the result.

This is fine:

%time conc_avg = ds.conc_profs.chunk({'burst': 10}).mean(('z','duration'))

CPU times: user 24 ms, sys: 0 ns, total: 24 ms Wall time: 23.8 ms

But this:

%time result = conc_avg.load()

takes an insane amount of time, which intrigues me because the result is just a vector with 2845 points.

Is there another way to tackle this without dask, like using a for-loop?

If dask is the way to go, what would be the quickest way to convert to a numpy array?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Speed up operations with xarray dataset 166593563
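
(Editorial note, not part of the thread: on the "convert to a numpy array" question above, for a dask-backed DataArray the conversion itself is cheap; it is the computation it triggers that takes the time. A sketch using the names from the thread:)

```
arr = conc_avg.values          # forces the computation and returns a numpy.ndarray
# equivalently:
arr = conc_avg.load().values   # compute first, then pull out the numpy array
```
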
233998757 https://github.com/pydata/xarray/issues/912#issuecomment-233998757 https://api.github.com/repos/pydata/xarray/issues/912 MDEyOklzc3VlQ29tbWVudDIzMzk5ODc1Nw== shoyer 1217238 2016-07-20T16:11:27Z 2016-07-20T16:11:27Z MEMBER

When you write ds.conc_profs.chunk(2400), it sets up the data to be loaded in a giant chunk, almost the entire file at once. Even if you use .isel() afterwards, dask does not always manage to subset the data from the initial chunk. (Sometimes it does succeed, which makes this a little confusing.)

You will probably be more successful if you try something like ds.conc_profs.chunk({'burst': 10}) instead, which keeps the intermediate chunks to a reasonable size.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Speed up operations with xarray dataset 166593563
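
(Editorial note, not part of the thread: the two chunking patterns contrasted above, side by side. Dimension names and chunk sizes come from the thread itself.)

```
# One near-whole-array chunk: dask may materialise almost the entire variable
# even if only a slice is selected afterwards.
big = ds.conc_profs.chunk(2400)

# Chunking along 'burst' keeps every intermediate piece small (10 bursts at a time).
small = ds.conc_profs.chunk({'burst': 10})
```
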
233998071 https://github.com/pydata/xarray/issues/912#issuecomment-233998071 https://api.github.com/repos/pydata/xarray/issues/912 MDEyOklzc3VlQ29tbWVudDIzMzk5ODA3MQ== saulomeirelles 7504461 2016-07-20T16:08:57Z 2016-07-20T16:08:57Z NONE

I've tried to create individual nc-files and then read them all using open_mfdataset, but I got an error for opening too many files, which was reported here before.

The glob is just a (bad) habit because I normally read multiple files. O_0

Cheers,

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Speed up operations with xarray dataset 166593563
233996527 https://github.com/pydata/xarray/issues/912#issuecomment-233996527 https://api.github.com/repos/pydata/xarray/issues/912 MDEyOklzc3VlQ29tbWVudDIzMzk5NjUyNw== shoyer 1217238 2016-07-20T16:03:30Z 2016-07-20T16:03:30Z MEMBER

Thanks for describing that -- I misread your initial description and thought you were using open_mfdataset rather than open_dataset (the glob threw me off!). The source of these files shouldn't matter once you have it in a netCDF file.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Speed up operations with xarray dataset 166593563
233995495 https://github.com/pydata/xarray/issues/912#issuecomment-233995495 https://api.github.com/repos/pydata/xarray/issues/912 MDEyOklzc3VlQ29tbWVudDIzMzk5NTQ5NQ== saulomeirelles 7504461 2016-07-20T16:00:02Z 2016-07-20T16:00:02Z NONE

The input files are 2485 nested mat-files that come out of a measurement device. I read them in Python (loadmat(matfile)) and turn them into numpy arrays like this:

```
from glob import glob

import numpy as np

matfiles = glob('*sed.mat')

# extract_number is a user-defined helper that extracts a numeric sort key from the filename
matfiles = sorted(matfiles, key=lambda x: extract_number(x))

if matfiles:

    ts = 2400   # time samples per burst
    zs = 160    # vertical (z) bins

    Burst     = np.empty(len(matfiles))
    Time      = np.empty((ts, len(matfiles)), dtype='datetime64[s]')
    ConcProf  = np.empty((ts, zs, len(matfiles)), dtype='float64')
    GsizeProf = np.empty((ts, zs, len(matfiles)), dtype='float64')
```

Afterwards, I populate the matrices in a loop:

```
def f(i):
    # fill slice i of the preallocated arrays from the i-th mat-file
    Dist, Burst[i], Time[:, i], ConcProf[:, :, i], GsizeProf[:, :, i] = getABSpars(matfiles[i])
```

where

```
import datetime as dt

from pandas import date_range
# loadmat is presumably scipy.io.loadmat (or a thin wrapper around it)


def getABSpars(matfile):

    ndata = loadmat(matfile)

    Dist = ndata['r']

    t_dic = ndata['BurstInfo']['StartTime']

    try:
        t_dt = dt.datetime.strptime(t_dic, '%d-%b-%Y %H:%M:%S')
    except ValueError:
        # some files store the start time without a time-of-day part
        t_dic = t_dic + ' 00:00:00'
        t_dt = dt.datetime.strptime(t_dic, '%d-%b-%Y %H:%M:%S')

    t_range = date_range(t_dt,
                         periods=ndata['MassProfiles'].shape[1],
                         freq=ndata['BurstInfo']['MassProfileInterval'] + 'L')

    Burst     = int(ndata['BurstInfo']['BurstNumber'])
    Time      = t_range
    ConcProf  = np.asarray(ndata['MassProfiles']).T
    GsizeProf = np.asarray(ndata['SizeProfiles'] * 1e6).T

    return Dist, Burst, Time, ConcProf, GsizeProf
```

Using the multiprocessing package:

```
from multiprocessing.pool import ThreadPool   # or: from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool(4)
pool.map(f, range(len(matfiles)))
pool.close()
```

Finally, I create the xarray dataset and save it to a nc-file:

```
ds = xray.Dataset(
    {'conc_profs': (['duration', 'z', 'burst'], ConcProf),
     'grainSize_profs': (['duration', 'z', 'burst'], GsizeProf),
     'burst_duration': (['duration'], np.linspace(0, 299, Time.shape[0]))},
    coords={'time': (['duration', 'burst'], Time), 'zdist': (['z'], Dist),
            'burst_nr': (['burst'], Burst)})

ds.to_netcdf('ABS_conc_size_12m.nc', mode='w')
```

It takes around 1 h to generate the nc-file.

Could this be the reason for my headaches?

Thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Speed up operations with xarray dataset 166593563
233991357 https://github.com/pydata/xarray/issues/912#issuecomment-233991357 https://api.github.com/repos/pydata/xarray/issues/912 MDEyOklzc3VlQ29tbWVudDIzMzk5MTM1Nw== shoyer 1217238 2016-07-20T15:46:50Z 2016-07-20T15:46:50Z MEMBER

What do the original input files look like, before you join them together? This may be a case where the dask.array task scheduler does very poorly.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Speed up operations with xarray dataset 166593563


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);