Issue #1288: Add trapz to DataArray for mathematical integration
Repo: pydata/xarray, state: closed (completed), 26 comments
Opened: 2017-02-28, closed: 2019-01-31

Since scientific data is often an approximation to a continuous function, when we write mean() or sum() our underlying intention is often to approximate an integral. For example, if we have the temperature of a rod T(t, x) as a function of time and space, the time-averaged value Tavg(x) is the integral of T(t, x) with respect to t, divided by the total time. I would guess that in practice many uses of `mean()` and `sum()` are intended to approximate integrals of continuous functions; that is typically my use, at least. But simply adding up all the values is a Riemann-sum approximation to the integral, which is not very accurate. For approximating an integral, the trapezoidal rule (`trapz()` in numpy) should be preferred to `sum()` or `mean()` in essentially all cases: it is more accurate while still being efficient.

**It would be very useful to have `trapz()` as a method of DataArrays, so one could write, e.g., for an average value, `Tavg = T.trapz(dim='time') / totalTime`.**

Currently, I have to use numpy's function and then rebuild the reduced-dimensional array myself:

```
import numpy as np
import xarray as xr

TavgVal = np.trapz(T, T['time'], axis=0) / totalTime
Tavg = xr.DataArray(TavgVal, coords=[T['space']], dims='space')
```

It could even be useful to have a function like `mean_trapz()` that calculates the mean value based on `trapz`. More generally, one could imagine supporting other integration methods too, e.g. `data.integrate(dim='x', method='simpson')`. But `trapz` is probably good enough for many cases and a big improvement over `mean`, and the trapezoidal rule is simple even for unequally spaced data. In principle `trapz` shouldn't be much less efficient either, although in practice I find `np.trapz()` to be several times slower than `np.mean()`.

Quick examples comparing `sum`/`mean` with `trapz` to convince you of the superiority of `trapz`:

```
x = np.linspace(0, 2, 200)
y = 1/3 * x**3
dx = x[1] - x[0]

integralRiemann = dx * np.sum(y)  # 1.3467673375251465
integralTrapz = np.trapz(y, x)    # 1.3333670025167712
integralExact = 4/3               # 1.3333333333333333
```

This second example demonstrates the particular advantage of `trapz()` for periodic functions: the trapezoidal rule is exceptionally accurate when a periodic function is integrated over its period.

```
x = np.linspace(0, 2*np.pi, 200)
y = np.cos(x)**2

meanRiemann = np.mean(y)                # 0.50249999999999995
meanTrapz = np.trapz(y, x) / (2*np.pi)  # 0.5
meanExact = 1/2                         # 0.5
```
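For reference, the manual workaround above can be made dimension-aware today by wrapping `np.trapz` in `xr.apply_ufunc`, which keeps the labeled coordinates on the result instead of rebuilding the DataArray by hand. This is only a minimal sketch, not the API requested in the issue: the helper name `trapz_da` is hypothetical, and the reuse of `T`, `'time'`, and `totalTime` from the example above is an assumption.

```
# Sketch (assumption: not xarray's API). trapz_da is a hypothetical helper
# that integrates a DataArray along one dimension with the trapezoidal rule,
# using that dimension's coordinate values as the sample points.
import numpy as np
import xarray as xr

def trapz_da(da, dim):
    return xr.apply_ufunc(
        np.trapz,        # trapezoidal rule from numpy
        da,              # values to integrate
        da[dim],         # coordinate values along `dim`
        input_core_dims=[[dim], [dim]],  # move `dim` to the last axis of both inputs
        kwargs={"axis": -1},             # np.trapz then reduces that last axis
    )

# Usage with the (hypothetical) names from the example above:
#   Tavg = trapz_da(T, "time") / totalTime   # time-averaged temperature, dims ('space',)
```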