issue_comments: 400800345

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/issues/2249#issuecomment-400800345	https://api.github.com/repos/pydata/xarray/issues/2249	400800345	MDEyOklzc3VlQ29tbWVudDQwMDgwMDM0NQ==	7217358	2018-06-27T19:25:10Z	2018-06-27T19:38:46Z	NONE	I was trying to apply the same groupby('lat','long').apply() strategy for interpolating time series of optical remote sensing images. With @shoyer's suggestions I managed to apply and parallelize a ufunc, which was significantly faster than operating pixel by pixel. However, I am still looking for a way to optimize the spline fitting and evaluation (maybe numba, as suggested). Any other suggestions would be appreciated.

I'm working with dask arrays, and my data looks like this:

<xarray.DataArray (time: 8, y: 1000, x: 1000)>
dask.array<shape=(8, 1000, 1000), dtype=float32, chunksize=(8, 100, 100)>
Coordinates:
  * time (time) datetime64[ns] 2015-12-11 2015-12-21 2015-12-31 ...
  * x (x) float64 4.989e+05 4.989e+05 4.989e+05 4.989e+05 4.989e+05 ...
  * y (y) float64 4.385e+05 4.384e+05 4.384e+05 4.384e+05 4.384e+05 ...
    mask (time, y, x) int8 dask.array<shape=(8, 1000, 1000), chunksize=(8, 100, 100)>

```
import numpy as np
import xarray as xr
from scipy import interpolate

def interpolate_band(da, int_dates):
    # Apply the ufunc: takes an xr.DataArray and the dates to interpolate to,
    # returns a DataArray with interpolated values for int_dates
    result = xr.apply_ufunc(ufunc_cubic_spline, da,
                            input_core_dims=[['time']],
                            output_core_dims=[['ntime']],
                            kwargs={'axis': -1,
                                    'orig_times': da.time.values,
                                    'new_times': int_dates},
                            dask='parallelized',
                            output_dtypes=[np.float32],
                            output_sizes={'ntime': int_dates.shape[0]})
    result['ntime'] = ('ntime', int_dates)
    return result
```

```
def ufunc_cubic_spline(a, axis, orig_times, new_times):
    # Reshape array to 2d (pixels, dates)
    data = a.reshape(-1, a.shape[axis])
    # Fit cubic spline and interpolate dates
    results = np.apply_along_axis(_cubic_spline, 1, data,
                                  orig_times=orig_times, new_times=new_times)
    # Reshape to original pixels (y, x) and number of interpolated dates
    return results.reshape((a.shape[0], a.shape[1], new_times.shape[0]))
```

For the interpolation I'm using SciPy's CubicSpline:

```
def _cubic_spline(y, orig_times, new_times):
    # Filter NaNs
    nans = np.isnan(y)
    # Try to fit cubic spline with filtered y values
    try:
        spl = interpolate.CubicSpline(orig_times.astype('d')[~nans], y[~nans])
        interpolated = spl(new_times.astype('d'))
    except ValueError:
        # When the spline cannot be fitted (not enough data), return NaN
        # TODO: raise warning
        interpolated = np.empty(new_times.shape[0])
        interpolated[:] = np.nan
    return interpolated
```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		335523891