issues: 210704949

id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at	closed_at	author_association	active_lock_reason	draft	pull_request	body	reactions	performed_via_github_app	state_reason	repo	type
210704949	MDU6SXNzdWUyMTA3MDQ5NDk=	1288	Add trapz to DataArray for mathematical integration	16630731	closed	0			26	2017-02-28T07:09:22Z	2019-01-31T17:30:31Z	2019-01-31T17:30:31Z	NONE				Since scientific data is often an approximation to a continuous function, when we write `mean()` or `sum()`, our underlying intention is often to approximate an integral. For example, if we have the temperature of a rod T(t, x) as a function of time and space, the time-averaged value Tavg(x) is the integral of T(t, x) with respect to t, divided by the total time. I would guess that in practice, many uses of `mean()` and `sum()` are intended to approximate integrals of continuous functions. That is typically my use, at least. But simply adding up all values is a Riemann-sum approximation to the integral, which is not very accurate. For approximating an integral, it seems to me that the trapezoidal rule (`trapz()` in numpy) should be preferred to `sum()` or `mean()` in essentially all cases, as the trapezoidal rule is more accurate while still being efficient. It would be very useful to have `trapz()` as a method of DataArrays, so one could write, e.g., for an average value, `Tavg = T.trapz(dim='time') / totalTime`. Currently, I would have to use numpy's function and then rebuild the reduced-dimensional array myself: `TavgVal = np.trapz(T, T['time'], axis=0) / totalTime` `Tavg = xr.DataArray(TavgVal, coords=T['space'], dims='space')` It could even be useful to have a function like `mean_trapz()` that calculates the mean value based on `trapz`. More generally, one could imagine having other integration methods too, e.g., `data.integrate(dim='x', method='simpson')`. But `trapz` is probably good enough for many cases and a big improvement over `mean`, and `trapz` is very simple even for unequally spaced data. And `trapz` shouldn't be much less efficient in principle, although in practice I find `np.trapz()` to be several times slower than `np.mean()`. Quick examples demonstrating `sum`/`mean` vs. 
`trapz` to convince you of the superiority of `trapz`: `x = np.linspace(0, 2, 200)` `y = 1/3 * x**3` `dx = x[1] - x[0]` `integralRiemann = dx * np.sum(y) # 1.3467673375251465` `integralTrapz = np.trapz(y, x) # 1.3333670025167712` `integralExact = 4/3 # 1.3333333333333333` The second example demonstrates a special advantage of `trapz()`: the trapezoidal rule happens to be extremely accurate for periodic functions integrated over a full period. `x = np.linspace(0, 2*np.pi, 200)` `y = np.cos(x)**2` `meanRiemann = np.mean(y) # 0.50249999999999995` `meanTrapz = np.trapz(y, x) / (2*np.pi) # 0.5` `meanExact = 1/2 # 0.5`	{ "url": "https://api.github.com/repos/pydata/xarray/issues/1288/reactions", "total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed	13221727	issue
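
The issue body remarks that the trapezoidal rule stays simple even for unequally spaced data. A minimal sketch of that idea in plain Python follows. The helper name `trapezoid` and the sample grid are hypothetical and chosen only for illustration; they are not part of NumPy or xarray, and this is not how either library implements integration.

```python
import math


def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y taken at points x.

    The points in x may be unevenly spaced; each interval contributes
    the average of its two endpoint values times the interval width.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    total = 0.0
    for i in range(1, len(x)):
        total += 0.5 * (y[i] + y[i - 1]) * (x[i] - x[i - 1])
    return total


# Hypothetical uneven grid on [0, 2]: points are denser near x = 2.
n = 200
x = [2.0 * math.sqrt(i / (n - 1)) for i in range(n)]
y = [xi ** 3 / 3 for xi in x]  # same integrand as the first example

integral_trapz = trapezoid(y, x)
integral_exact = 4 / 3  # integral of x**3 / 3 over [0, 2]
```

Because each interval carries its own width, no uniform `dx` is needed, which is what makes the rule convenient for irregularly sampled coordinates such as timestamps.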

Links from other tables

2 rows from issues_id in issues_labels
26 rows from issue in issue_comments