
issue_comments


3 rows where issue = 335523891 sorted by updated_at descending

Issue: stacked_xarray.groupby('lat','lon').apply(func) over 3D array takes too long (pydata/xarray#2249)
stale[bot] · 2020-05-29T03:26:08Z · NONE
https://github.com/pydata/xarray/issues/2249#issuecomment-635735823

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

alexsalr · created 2018-06-27T19:25:10Z · updated 2018-06-27T19:38:46Z · NONE
https://github.com/pydata/xarray/issues/2249#issuecomment-400800345

I was trying to apply the same groupby('lat','long').apply() strategy for interpolating time series of optical remote sensing images. With @shoyer's suggestions I managed to apply and parallelize a ufunc, which was significantly faster than operating pixel by pixel. However, I am still looking for a way to optimize the spline fitting and evaluation (maybe numba, as suggested). Any other suggestions would be appreciated.

I'm working with dask arrays, and my data looks like this:

```
<xarray.DataArray (time: 8, y: 1000, x: 1000)>
dask.array<shape=(8, 1000, 1000), dtype=float32, chunksize=(8, 100, 100)>
Coordinates:
  * time     (time) datetime64[ns] 2015-12-11 2015-12-21 2015-12-31 ...
  * x        (x) float64 4.989e+05 4.989e+05 4.989e+05 4.989e+05 4.989e+05 ...
  * y        (y) float64 4.385e+05 4.384e+05 4.384e+05 4.384e+05 4.384e+05 ...
    mask     (time, y, x) int8 dask.array<shape=(8, 1000, 1000), chunksize=(8, 100, 100)>
```

```
import numpy as np
import xarray as xr
from scipy import interpolate

def interpolate_band(da, int_dates):
    # Apply ufunc: takes an xr.DataArray and the dates to interpolate to,
    # returns a DataArray with interpolated values for int_dates
    result = xr.apply_ufunc(ufunc_cubic_spline, da,
                            input_core_dims=[['time']],
                            output_core_dims=[['ntime']],
                            kwargs={'axis': -1,
                                    'orig_times': da.time.values,
                                    'new_times': int_dates},
                            dask='parallelized',
                            output_dtypes=[np.float32],
                            output_sizes={'ntime': int_dates.shape[0]})
    result['ntime'] = ('ntime', int_dates)
    return result
```

```
def ufunc_cubic_spline(a, axis, orig_times, new_times):
    # Reshape array to 2D (pixels, dates); with axis=-1 this flattens y and x
    data = a.reshape(axis, a.shape[axis])
    # Fit a cubic spline and evaluate it at the interpolation dates
    results = np.apply_along_axis(_cubic_spline, 1, data,
                                  orig_times=orig_times, new_times=new_times)
    # Reshape back to the original pixels (y, x) and the interpolated dates
    return results.reshape((a.shape[0], a.shape[1], new_times.shape[0]))
```

For the interpolation I'm using scipy's CubicSpline:

```
def _cubic_spline(y, orig_times, new_times):
    # Filter NaNs
    nans = np.isnan(y)
    # Try to fit a cubic spline with the NaN-filtered y values
    try:
        spl = interpolate.CubicSpline(orig_times.astype('d')[~nans], y[~nans])
        interpolated = spl(new_times.astype('d'))
    except ValueError:
        # When the spline cannot be fitted (not enough data), return NaN
        # TODO raise warning
        interpolated = np.empty(new_times.shape[0])
        interpolated[:] = np.nan
    return interpolated
```
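A hypothetical call putting the pieces together (not part of the original comment; the target dates below are invented for illustration):

```
# Invented 10-day grid of target dates spanning the scene's time range
int_dates = np.arange('2015-12-11', '2016-01-31', np.timedelta64(10, 'D'),
                      dtype='datetime64[D]').astype('datetime64[ns]')

interpolated = interpolate_band(da, int_dates)  # builds a lazy dask graph
result = interpolated.compute()                 # evaluates the chunks in parallel
```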

shoyer · 2018-06-25T21:37:18Z · MEMBER
https://github.com/pydata/xarray/issues/2249#issuecomment-400104003

If you want to speed this up significantly, you'll need to make `my_function` something that you can apply over many time series at once, e.g., over all columns of a 2D array rather than a single vector at a time. You might try writing a "generalized ufunc" for the logic with numba (`guvectorize`), and wrapping it with xarray's `apply_ufunc`.
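A minimal sketch of that suggestion (not from the thread; the function names are illustrative, and plain linear interpolation stands in for the spline, since scipy cannot be called from numba's nopython mode):

```
import numba
import numpy as np
import xarray as xr

@numba.guvectorize(
    ["void(float64[:], float64[:], float64[:], float64[:])"],
    "(t),(t),(n)->(n)",
    nopython=True,
)
def _interp_gufunc(y, orig_t, new_t, out):
    # Written for one time series; numba compiles it and broadcasts it over
    # every series in the block. A numba spline fit would replace np.interp.
    out[:] = np.interp(new_t, orig_t, y)

def interpolate_band_numba(da, int_dates):
    # Same apply_ufunc pattern as the comment above, with the gufunc kernel
    return xr.apply_ufunc(
        _interp_gufunc,
        da.astype('float64'),
        da['time'].values.astype('d'),  # datetime64 -> float, as above
        int_dates.astype('d'),
        input_core_dims=[['time'], ['time'], ['ntime']],
        output_core_dims=[['ntime']],
        dask='parallelized',
        output_dtypes=[np.float64],
        output_sizes={'ntime': int_dates.shape[0]},
    )
```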


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
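The row selection shown above (3 rows where issue = 335523891, sorted by updated_at descending) corresponds to an ordinary query against this schema. A minimal Python sketch, assuming a local copy of the underlying SQLite database (the filename github.db is an assumption):

```
import sqlite3

conn = sqlite3.connect("github.db")  # assumed local copy of the database
rows = conn.execute(
    """
    SELECT [user], created_at, updated_at, author_association, body
    FROM issue_comments
    WHERE issue = 335523891
    ORDER BY updated_at DESC
    """
).fetchall()

for user_id, created, updated, association, body in rows:
    print(user_id, created, updated, association, body[:60])
```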