Issue #1288: Add trapz to DataArray for mathematical integration
Repo: pydata/xarray, state: closed (completed), 26 comments
Opened: 2017-02-28, closed: 2019-01-31

Since scientific data is often an approximation to a continuous function, when we write mean() or sum() our underlying intention is often to approximate an integral. For example, if we have the temperature of a rod T(t, x) as a function of time and space, the time-averaged value Tavg(x) is the integral of T(t, x) with respect to t, divided by the total time. I would guess that in practice many uses of `mean()` and `sum()` are intended to approximate integrals of continuous functions; that is typically my use, at least. But simply adding up all the values is a Riemann-sum approximation to the integral, which is not very accurate. For approximating an integral, the trapezoidal rule (`trapz()` in numpy) should be preferred to `sum()` or `mean()` in essentially all cases: it is more accurate while still being efficient.

**It would be very useful to have `trapz()` as a method of DataArrays, so one could write, e.g., for an average value, `Tavg = T.trapz(dim='time') / totalTime`.**

Currently, I have to use numpy's function and then rebuild the reduced-dimensional array myself:

```
import numpy as np
import xarray as xr

TavgVal = np.trapz(T, T['time'], axis=0) / totalTime
Tavg = xr.DataArray(TavgVal, coords=[T['space']], dims='space')
```

It could even be useful to have a function like `mean_trapz()` that calculates the mean value based on `trapz`. More generally, one could imagine supporting other integration methods too, e.g. `data.integrate(dim='x', method='simpson')`. But `trapz` is probably good enough for many cases and a big improvement over `mean`, and the trapezoidal rule is simple even for unequally spaced data. In principle `trapz` shouldn't be much less efficient either, although in practice I find `np.trapz()` to be several times slower than `np.mean()`.

Quick examples comparing `sum`/`mean` with `trapz` to convince you of the superiority of `trapz`:

```
x = np.linspace(0, 2, 200)
y = 1/3 * x**3
dx = x[1] - x[0]

integralRiemann = dx * np.sum(y)  # 1.3467673375251465
integralTrapz = np.trapz(y, x)    # 1.3333670025167712
integralExact = 4/3               # 1.3333333333333333
```

This second example demonstrates the particular advantage of `trapz()` for periodic functions: the trapezoidal rule is exceptionally accurate when a periodic function is integrated over its period.

```
x = np.linspace(0, 2*np.pi, 200)
y = np.cos(x)**2

meanRiemann = np.mean(y)                # 0.50249999999999995
meanTrapz = np.trapz(y, x) / (2*np.pi)  # 0.5
meanExact = 1/2                         # 0.5
```
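For reference, the manual workaround above can be made dimension-aware today by wrapping `np.trapz` in `xr.apply_ufunc`, which keeps the labeled coordinates on the result instead of rebuilding the DataArray by hand. This is only a minimal sketch, not the API requested in the issue: the helper name `trapz_da` is hypothetical, and the reuse of `T`, `'time'`, and `totalTime` from the example above is an assumption.

```
# Sketch (assumption: not xarray's API). trapz_da is a hypothetical helper
# that integrates a DataArray along one dimension with the trapezoidal rule,
# using that dimension's coordinate values as the sample points.
import numpy as np
import xarray as xr

def trapz_da(da, dim):
    return xr.apply_ufunc(
        np.trapz,        # trapezoidal rule from numpy
        da,              # values to integrate
        da[dim],         # coordinate values along `dim`
        input_core_dims=[[dim], [dim]],  # move `dim` to the last axis of both inputs
        kwargs={"axis": -1},             # np.trapz then reduces that last axis
    )

# Usage with the (hypothetical) names from the example above:
#   Tavg = trapz_da(T, "time") / totalTime   # time-averaged temperature, dims ('space',)
```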