issues: 210704949

id: 210704949
node_id: MDU6SXNzdWUyMTA3MDQ5NDk=
number: 1288
title: Add trapz to DataArray for mathematical integration
user: 16630731
state: closed
locked: 0
comments: 26
created_at: 2017-02-28T07:09:22Z
updated_at: 2019-01-31T17:30:31Z
closed_at: 2019-01-31T17:30:31Z
author_association: NONE
body:

Since scientific data is often an approximation to a continuous function, when we write mean() or sum(), our underlying intention is often to approximate an integral. For example, if we have the temperature of a rod T(t, x) as a function of time and space, the time-averaged value Tavg(x) is the integral of T(t, x) with respect to t, divided by the total time.

I would guess that in practice, many uses of mean() and sum() are intended to approximate integrals of continuous functions. That is typically my use, at least. But simply adding up all values is a Riemann sum approximation to an integral, which is not very accurate.

For approximating an integral, it seems to me that the trapezoidal rule (trapz() in numpy) should be preferred to sum() or mean() in essentially all cases, as the trapezoidal rule is more accurate while still being efficient.

It would be very useful to have trapz() as a method of DataArrays, so one could write, e.g., for an average value, Tavg = T.trapz(dim='time') / totalTime. Currently, I would have to use numpy's method and then rebuild the reduced-dimensional array myself:

    TavgVal = np.trapz(T, T['time'], axis=0) / totalTime  # assumes 'time' is the first dimension
    Tavg = xr.DataArray(TavgVal, coords={'space': T['space']}, dims='space')
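For reference, the reduction itself needs nothing beyond NumPy. The following is a minimal sketch of the time average for the rod example, assuming a 2-D array with time along axis 0. The sample grid and the field T = t * x are made up for illustration, and the trapezoid is written out by hand rather than calling np.trapz:

```python
import numpy as np

# Hypothetical sample grid: unequally spaced times, 5 positions along the rod.
t = np.array([0.0, 0.5, 1.5, 2.0, 4.0])
x = np.linspace(0.0, 1.0, 5)

# Made-up field T(t, x) = t * x with shape (len(t), len(x)). It is linear
# in t, so the trapezoidal rule integrates it exactly.
T = t[:, None] * x[None, :]

# Trapezoidal rule along axis 0: width of each time interval times the
# average of the two samples that bound it, summed over all intervals.
dt = np.diff(t)
interval_means = 0.5 * (T[1:] + T[:-1])
integral = (dt[:, None] * interval_means).sum(axis=0)

total_time = t[-1] - t[0]
T_avg = integral / total_time  # one value per position x

print(T_avg)
```

What this sketch cannot do is carry the 'space' coordinate along with the result, which is the bookkeeping a DataArray method would take care of.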

It could even be useful to have a function like mean_trapz() that calculates the mean value based on trapz. More generally, one could imagine other integration methods too, e.g., data.integrate(dim='x', method='simpson'). But trapz is probably good enough for many cases and a big improvement over mean, and it stays very simple even for unequally spaced data. In principle trapz shouldn't be much less efficient either, although in practice I find np.trapz() to be several times slower than np.mean().
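To make the unequal-spacing point concrete, here is a rough sketch of what a mean_trapz() could compute for 1-D data. The function below is a hypothetical standalone helper, not an existing NumPy or xarray API, and the sample points are invented:

```python
import numpy as np

def mean_trapz(y, x):
    """Mean of y over [x[0], x[-1]] via the trapezoidal rule (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    widths = np.diff(x)  # works for any increasing grid, uniform or not
    integral = np.sum(widths * 0.5 * (y[1:] + y[:-1]))
    return integral / (x[-1] - x[0])

# Unequally spaced samples of y = 3x + 1, clustered near x = 0.
x = np.array([0.0, 0.1, 0.2, 0.5, 2.0])
y = 3.0 * x + 1.0

plain_mean = np.mean(y)           # every sample gets the same weight
weighted_mean = mean_trapz(y, x)  # samples weighted by interval width

print(plain_mean, weighted_mean)
```

The exact mean of 3x + 1 over [0, 2] is 4. The trapezoidal version reproduces it because the function is linear, while the unweighted mean gives 2.68 because it is pulled toward the region where the samples cluster.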

Quick examples demonstrating sum/mean vs. trapz to convince you of the superiority of trapz:

    import numpy as np

    x = np.linspace(0, 2, 200)
    y = 1/3 * x**3
    dx = x[1] - x[0]
    integralRiemann = dx * np.sum(y)  # 1.3467673375251465
    integralTrapz = np.trapz(y, x)    # 1.3333670025167712
    integralExact = 4/3               # 1.3333333333333333
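The gap between the two results is not specific to 200 points. As a sketch of how the errors scale, the snippet below repeats the same integral at two resolutions and compares the errors. The helper names riemann and trapezoid are introduced here for illustration only, and the trapezoid is hand-written so the snippet does not depend on a particular NumPy version:

```python
import numpy as np

def riemann(y, x):
    # Sum of all samples times the (uniform) spacing, as in the example above.
    return (x[1] - x[0]) * np.sum(y)

def trapezoid(y, x):
    # Hand-written trapezoidal rule on the same samples.
    return np.sum(np.diff(x) * 0.5 * (y[1:] + y[:-1]))

exact = 4 / 3
errors = {}
for n in (101, 201):  # spacing 0.02, then 0.01
    x = np.linspace(0, 2, n)
    y = x**3 / 3
    errors[n] = (abs(riemann(y, x) - exact), abs(trapezoid(y, x) - exact))

# How much each error shrinks when the spacing is halved.
riemann_ratio = errors[101][0] / errors[201][0]
trapz_ratio = errors[101][1] / errors[201][1]

print(riemann_ratio, trapz_ratio)
```

Halving the spacing roughly halves the Riemann-sum error but cuts the trapezoidal error by about a factor of four, which is the usual first-order versus second-order behaviour.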

This second example demonstrates the special advantages of trapz() for periodic functions because the trapezoidal rule happens to be extremely accurate for periodic functions integrated over their period.

    x = np.linspace(0, 2*np.pi, 200)
    y = np.cos(x)**2
    meanRiemann = np.mean(y)                 # 0.50249999999999995
    meanTrapz = np.trapz(y, x) / (2*np.pi)   # 0.5
    meanExact = 1/2                          # 0.5
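One way to see where the 0.5025 comes from: the grid includes both 0 and 2π, which are the same point of the periodic function, so the plain mean counts that point twice. The sketch below, using the same grid, checks that dropping the duplicated endpoint makes the plain mean agree with the trapezoidal one. The trapezoid is again written out by hand:

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)
y = np.cos(x) ** 2

# The first and last samples are the same point of the periodic function,
# so a plain mean over all 200 samples counts that point twice.
mean_all = np.mean(y)

# Dropping the duplicated endpoint leaves 199 equally weighted samples
# covering exactly one period.
mean_one_period = np.mean(y[:-1])

# Hand-written trapezoidal mean for comparison.
mean_trapz = np.sum(np.diff(x) * 0.5 * (y[1:] + y[:-1])) / (x[-1] - x[0])

print(mean_all, mean_one_period, mean_trapz)
```

On a uniform grid over a full period the trapezoidal rule gives the two endpoints half weight each, which amounts to counting the shared point once. That is why it handles this case without any manual trimming.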

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1288/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
