
issue_comments


8 rows where user = 7504461 sorted by updated_at descending


Facets:

  • issue: Speed up operations with xarray dataset (5), pd.Grouper support? (2), interpolate/sample array at point (1)
  • user: saulomeirelles (8)
  • author_association: NONE (8)
Columns: id, html_url, issue_url, node_id, user, created_at, updated_at, author_association, body, reactions, performed_via_github_app, issue
234043292 https://github.com/pydata/xarray/issues/912#issuecomment-234043292 https://api.github.com/repos/pydata/xarray/issues/912 MDEyOklzc3VlQ29tbWVudDIzNDA0MzI5Mg== saulomeirelles 7504461 2016-07-20T18:44:53Z 2016-07-20T18:44:53Z NONE

No, not really. I got no error message whatsoever. Is there any test I can do to tackle this?

Sent from Smartphone. Please forgive typos.

On Jul 20, 2016 8:41 PM, "Stephan Hoyer" notifications@github.com wrote:

> > I decided to wait for .load() to do the job but the kernel dies after a while.
>
> Are you running out of memory? Can you tell what's going on? This is a little surprising to me.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Speed up operations with xarray dataset 166593563
234035910 https://github.com/pydata/xarray/issues/912#issuecomment-234035910 https://api.github.com/repos/pydata/xarray/issues/912 MDEyOklzc3VlQ29tbWVudDIzNDAzNTkxMA== saulomeirelles 7504461 2016-07-20T18:20:24Z 2016-07-20T18:20:24Z NONE

True.

I decided to wait for .load() to do the job but the kernel dies after a while.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Speed up operations with xarray dataset 166593563
234022793 https://github.com/pydata/xarray/issues/912#issuecomment-234022793 https://api.github.com/repos/pydata/xarray/issues/912 MDEyOklzc3VlQ29tbWVudDIzNDAyMjc5Mw== saulomeirelles 7504461 2016-07-20T17:36:02Z 2016-07-20T17:36:17Z NONE

Thanks, @shoyer !

Setting smaller chunks helps; however, my issue is on the way back.

This is fine:

```
%time conc_avg = ds.conc_profs.chunk({'burst': 10}).mean(('z','duration'))

CPU times: user 24 ms, sys: 0 ns, total: 24 ms
Wall time: 23.8 ms
```

But this:

```
%time result = conc_avg.load()
```

takes an insane amount of time, which intrigues me because it is just a vector with 2845 points.

Is there another way to tackle this without dask, like using a for-loop?

If dask is the way to go, what would be the quickest way to convert the result to a numpy array?
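
A minimal sketch (using a small synthetic dataset in place of the real one, and assuming dask is installed) of the usual ways to turn the lazy result into an in-memory array:

```python
import numpy as np
import xarray as xr

# small synthetic stand-in for the dataset discussed above
# (dimension names and sizes assumed from the snippets in this thread)
ds = xr.Dataset(
    {'conc_profs': (['duration', 'z', 'burst'], np.random.rand(2400, 160, 50))}
)

# lazy, chunked mean -- returns a dask-backed DataArray almost immediately
conc_avg = ds.conc_profs.chunk({'burst': 10}).mean(('z', 'duration'))

# .load() / .compute() materialize it as an in-memory xarray.DataArray
conc_avg_in_memory = conc_avg.compute()

# .values goes straight to a plain numpy array
conc_avg_np = conc_avg.values
```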

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Speed up operations with xarray dataset 166593563
233998071 https://github.com/pydata/xarray/issues/912#issuecomment-233998071 https://api.github.com/repos/pydata/xarray/issues/912 MDEyOklzc3VlQ29tbWVudDIzMzk5ODA3MQ== saulomeirelles 7504461 2016-07-20T16:08:57Z 2016-07-20T16:08:57Z NONE

I've tried to create individual nc-files and then read them all using open_mfdataset, but I got an error for opening too many files, which was reported here before.

The glob is just a (bad) habit because I normally read multiple files. O_0

Cheers,

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Speed up operations with xarray dataset 166593563
233995495 https://github.com/pydata/xarray/issues/912#issuecomment-233995495 https://api.github.com/repos/pydata/xarray/issues/912 MDEyOklzc3VlQ29tbWVudDIzMzk5NTQ5NQ== saulomeirelles 7504461 2016-07-20T16:00:02Z 2016-07-20T16:00:02Z NONE

The input files are 2485 nested mat-files that come from a measurement device. I read them in Python (loadmat(matfile)) and turn them into numpy arrays like this:

```
from glob import glob

import numpy as np

matfiles = glob('*sed.mat')

# extract_number is a helper (defined elsewhere) that pulls the burst number
# out of the filename so that the files sort in acquisition order
matfiles = sorted(matfiles, key=lambda x: extract_number(x))

if matfiles:

    ts = 2400   # time steps per burst
    zs = 160    # vertical bins

    Burst     = np.empty(len(matfiles))
    Time      = np.empty((ts, len(matfiles)), dtype='datetime64[s]')
    ConcProf  = np.empty((ts, zs, len(matfiles)), dtype='float64')
    GsizeProf = np.empty((ts, zs, len(matfiles)), dtype='float64')
```

Afterwards, I populate the matrices in a loop:

```
def f(i):
    # make Dist visible at module level, since it is reused when the dataset
    # is assembled below (it is simply overwritten on every call)
    global Dist
    Dist, Burst[i], Time[:, i], ConcProf[:, :, i], GsizeProf[:, :, i] = getABSpars(matfiles[i])
```

where

```
import datetime as dt

import numpy as np
from pandas import date_range
from scipy.io import loadmat


def getABSpars(matfile):

    ndata = loadmat(matfile)

    # vertical distance axis of the profiles
    Dist = ndata['r']

    t_dic = ndata['BurstInfo']['StartTime']

    try:
        t_dt = dt.datetime.strptime(t_dic, '%d-%b-%Y %H:%M:%S')
    except ValueError:
        # some files store only the date, so append midnight before parsing
        t_dic = t_dic + ' 00:00:00'
        t_dt = dt.datetime.strptime(t_dic, '%d-%b-%Y %H:%M:%S')

    t_range = date_range(t_dt,
                         periods=ndata['MassProfiles'].shape[1],
                         freq=ndata['BurstInfo']['MassProfileInterval'] + 'L')

    Burst     = int(ndata['BurstInfo']['BurstNumber'])
    Time      = t_range
    ConcProf  = np.asarray(ndata['MassProfiles']).T
    GsizeProf = np.asarray(ndata['SizeProfiles'] * 1e6).T

    return Dist, Burst, Time, ConcProf, GsizeProf
```

Using the multiprocessing package:

```
from multiprocessing.pool import ThreadPool

pool = ThreadPool(4)
pool.map(f, range(len(matfiles)))
pool.close()
```

Finally I create the xarray dataset and then save into a nc-file:

```
import numpy as np
import xarray as xray  # the package is imported here under its pre-rename name "xray"

ds = xray.Dataset(
    {'conc_profs':      (['duration', 'z', 'burst'], ConcProf),
     'grainSize_profs': (['duration', 'z', 'burst'], GsizeProf),
     'burst_duration':  (['duration'], np.linspace(0, 299, Time.shape[0]))},
    coords={'time':     (['duration', 'burst'], Time),
            'zdist':    (['z'], Dist),
            'burst_nr': (['burst'], Burst)})

ds.to_netcdf('ABS_conc_size_12m.nc', mode='w')
```

It takes around 1 h to generate the nc-file.

Could this be the reason for my headaches?

Thanks!
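
For completeness, a minimal sketch of reopening the resulting file with dask chunks (file and dimension names taken from the snippet above; assumes dask is installed):

```python
import xarray as xr

# reopen the generated file lazily, chunked along the burst dimension
ds = xr.open_dataset('ABS_conc_size_12m.nc', chunks={'burst': 10})

# operations on the chunked variables stay lazy...
conc_avg = ds.conc_profs.mean(('z', 'duration'))

# ...until the result is explicitly loaded into memory
print(conc_avg.load())
```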

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Speed up operations with xarray dataset 166593563
231021167 https://github.com/pydata/xarray/issues/364#issuecomment-231021167 https://api.github.com/repos/pydata/xarray/issues/364 MDEyOklzc3VlQ29tbWVudDIzMTAyMTE2Nw== saulomeirelles 7504461 2016-07-07T08:54:46Z 2016-07-07T08:59:15Z NONE

Thanks, @shoyer !

Here is an example of how I circumvented the problem:

```
import time
import datetime as dt

import numpy as np
import pandas as pd
import xarray as xray

data = np.random.rand(24 * 5)
times = pd.date_range('2000-01-01', periods=24 * 5, freq='H')
foo = xray.DataArray(data, coords=[times], dims=['time'])
foo = foo.to_dataset(dim=foo.dims, name='foo')

# 12.42-hour window length in seconds (time.mktime interprets the datetime in local time)
T = time.mktime(dt.datetime(1970, 1, 1, 12 + 1, 25, 12).timetuple())

# integer bin index for every timestamp
Tint = [int(time.mktime(t.timetuple()) / T)
        for t in foo.time.values.astype('datetime64[s]').tolist()]

foo2 = xray.DataArray(Tint, coords=foo.time.coords, dims=foo.time.dims)
foo.merge(foo2.to_dataset(name='Tint'), inplace=True)

foo_grp = foo.groupby('Tint')

foo_grp.group.plot.line()
```

In my case, the dataset is quite large, so merging the new variable Tint cost a lot of computational time.
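
A minimal sketch (reusing foo and Tint from the snippet above) of grouping on a named DataArray directly, which avoids the expensive merge:

```python
import xarray as xr

# build the bin index as a named DataArray aligned with foo's time coordinate...
tint = xr.DataArray(Tint, coords=foo.time.coords, dims=foo.time.dims, name='Tint')

# ...and group on it directly -- no merge into the dataset is needed
foo_grp = foo.groupby(tint)
print(foo_grp.mean())
```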

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  pd.Grouper support? 60303760
228723336 https://github.com/pydata/xarray/issues/364#issuecomment-228723336 https://api.github.com/repos/pydata/xarray/issues/364 MDEyOklzc3VlQ29tbWVudDIyODcyMzMzNg== saulomeirelles 7504461 2016-06-27T11:45:09Z 2016-06-27T11:45:09Z NONE

This is very useful functionality. I am wondering if I can specify the time window, for example ds.groupby(time=pd.TimeGrouper('12.42H')). Is there a way to do that in xarray?
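
For comparison, a minimal sketch of the resample-based spelling, assuming a recent xarray where resample takes the frequency as a keyword named after the datetime dimension:

```python
import numpy as np
import pandas as pd
import xarray as xr

# small synthetic hourly series
times = pd.date_range('2000-01-01', periods=24 * 5, freq='H')
foo = xr.DataArray(np.random.rand(24 * 5), coords=[times], dims=['time'])

# average over fixed 12-hour windows; the frequency string is parsed by pandas,
# so whether a fractional window such as '745.2min' (12.42 h) is accepted
# depends on the pandas version
twelve_hourly = foo.resample(time='12H').mean()
print(twelve_hourly)
```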

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  pd.Grouper support? 60303760
150618114 https://github.com/pydata/xarray/issues/191#issuecomment-150618114 https://api.github.com/repos/pydata/xarray/issues/191 MDEyOklzc3VlQ29tbWVudDE1MDYxODExNA== saulomeirelles 7504461 2015-10-23T16:00:26Z 2015-10-23T16:00:59Z NONE

Hi All,

This is indeed an excellent project with great potential!

I am wondering if there has been any progress on the interpolation issue. I am working with an irregular time series which I would very much like to upsample using xray.

Thanks for all the effort!

Saulo
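
A minimal sketch of upsampling an irregular time series onto a regular axis, assuming a version of xarray that provides DataArray.interp (which relies on scipy):

```python
import numpy as np
import pandas as pd
import xarray as xr

# an irregular (unevenly spaced) time series
irregular_times = pd.to_datetime(['2015-10-23 00:00', '2015-10-23 00:07',
                                  '2015-10-23 00:31', '2015-10-23 01:02'])
da = xr.DataArray(np.random.rand(4), coords=[irregular_times], dims=['time'])

# a regular 10-minute target axis spanning the same interval
regular_times = pd.date_range(irregular_times[0], irregular_times[-1], freq='10min')

# linear interpolation onto the regular axis
upsampled = da.interp(time=regular_times)
print(upsampled)
```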

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  interpolate/sample array at point 38849807


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);