html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1288#issuecomment-359543819,https://api.github.com/repos/pydata/xarray/issues/1288,359543819,MDEyOklzc3VlQ29tbWVudDM1OTU0MzgxOQ==,1217238,2018-01-22T19:50:25Z,2018-01-22T19:50:25Z,MEMBER,I opened https://github.com/pydata/xarray/issues/1850 to discuss xarray-contrib.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,210704949
https://github.com/pydata/xarray/issues/1288#issuecomment-293982053,https://api.github.com/repos/pydata/xarray/issues/1288,293982053,MDEyOklzc3VlQ29tbWVudDI5Mzk4MjA1Mw==,1217238,2017-04-13T18:24:07Z,2017-04-13T18:24:07Z,MEMBER,"> Perhaps a new package would be in order?

I would also be very happy to include many of these in a submodule inside xarray, e.g., `xarray.scipy` for wrappers of the scipy API. This would make it easier to use internal methods like `apply_ufunc` (though hopefully that will be public API soon).","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,210704949
https://github.com/pydata/xarray/issues/1288#issuecomment-287840030,https://api.github.com/repos/pydata/xarray/issues/1288,287840030,MDEyOklzc3VlQ29tbWVudDI4Nzg0MDAzMA==,1217238,2017-03-20T17:43:12Z,2017-03-20T17:43:12Z,MEMBER,"> By the way, the cumtrapz implementation I pasted above matches the scipy version when initial=0, which I also think would be a more sane default for integration.

Yes, I agree with both of you that we should fix `initial=0`. (I don't know if I would even bother with adding the option.)

> As far as implementation is concerned, is there any performance downside to using xarray's shift operators versus delving deeper into dask with map_blocks, etc?
> I looked into using dask's cumreduction function, but am not sure it is possible to implement the trapezoid method in that way without changing dask.

From a performance perspective, it would be totally fine to implement this either in terms of high-level xarray operations like `shift`/`sum`/`cumsum` (manipulating full xarray objects) or in terms of high-level dask.array operations like `dask.array.cumsum` (manipulating dask arrays). I would do whatever is easiest. I'm pretty sure there is no reason why you need to get into dask's low-level API like map_blocks and cumreduction.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,210704949
https://github.com/pydata/xarray/issues/1288#issuecomment-287680627,https://api.github.com/repos/pydata/xarray/issues/1288,287680627,MDEyOklzc3VlQ29tbWVudDI4NzY4MDYyNw==,1217238,2017-03-20T05:22:10Z,2017-03-20T05:22:10Z,MEMBER,"Sorry for letting this lapse. Yes, we absolutely want this functionality in some form.

> My concern is that this doesn't feel like functionality that inherently belongs as a method on a DataArray--if it doesn't need to be a method, it shouldn't be. In numpy and scipy, these are separate functions and I think they work fine that way.

This is a fair point, and I agree with you from a purist OO-programming/software-engineering perspective (TensorFlow, for example, takes this approach). But with xarray, we have been taking a different path, putting methods on objects for the convenience of method chaining (like pandas). So from a consistency perspective, I think it's fine to keep these as methods. This is somewhat similar even to NumPy, where a number of the most commonly used functions are also methods.

> Perhaps allow generic extension of da.integrate by letting the method keyword of da.integrate accept a function as an argument that performs the actual integration?

I don't see a big advantage to adding such an extension point.
It's almost assuredly less text and clearer to simply write `ds.pipe(my_integrate, 'x')` or `my_integrate(ds, 'x')` rather than `ds.integrate('x', my_integrate)`.

> Maybe this could be implemented by adding an optional cumulative flag.

I normally don't like adding flags for switching functionality entirely, but maybe that would make sense here if there's enough shared code (e.g., simply substituting `cumsum` for `sum`). The alternative is something like `cum_integrate`, which sounds kind of awkward and is one more additional method.

One thing that can be useful to do before writing code is to write out a docstring with all the bells and whistles we might eventually add. So let's give that a shot here and see if `integrate` still makes sense:

```
integrate(dim, method='trapz', cumulative=False)

Arguments
---------
dim : str or DataArray
    DataArray or reference to an existing coordinate, labeling what to integrate over.
cumulative : bool, optional
    Whether to do a non-cumulative (default) or cumulative integral.
method : 'trapz' or 'simps', optional
    Whether to use the trapezoidal rule or Simpson's rule.
```

I could also imagine possibly adding a `bounds` or `limits` argument that specifies multiple limits for controlling multiple integrals at once (e.g., `dim='x'` and `bounds=[0, 10, 20, 30, 40, 50]` would result in an `x` dimension of length 5). This would certainly be useful for some of my current work. But maybe we should save this sort of addition for later...","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,210704949
https://github.com/pydata/xarray/issues/1288#issuecomment-283109247,https://api.github.com/repos/pydata/xarray/issues/1288,283109247,MDEyOklzc3VlQ29tbWVudDI4MzEwOTI0Nw==,1217238,2017-02-28T17:34:05Z,2017-02-28T19:00:00Z,MEMBER,"As usual @rabernat raises some excellent points!
> I weakly prefer not to use the name integrate and instead keep the standard scipy names because they make clear the numerical algorithm that is being applied.

Yes, this is a totally valid concern if a user might expect `integrate` to be calculating something different. One point in favor of calling this `integrate` is that the name is highly searchable, which provides an excellent place to include documentation about how to integrate in general (including links to other packages, like pangeo's vector calculus package). But we know that nobody reads documentation ;).

> But where does it end? Why not implement the rest of the scipy.ode module?

Looking at the rest of [`scipy.integrate`](https://docs.scipy.org/doc/scipy-0.18.1/reference/integrate.html), in some ways the functionality of `trapz`/`cumtrapz`/`simps` is uniquely well suited for xarray: they are focused on data (""given fixed samples"") rather than solving a system of equations (""given a function""). [`numpy.gradient`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.gradient.html) feels very complementary as well, so I could see that as also in scope, but there are similar concerns for the name. There might be some value in complementary names for integrals/gradients.

> As a community we need to develop a roadmap that clearly defines the scope of xarray.

I doubt we'll be able to come up with hard and fast rules, but maybe we can enumerate some principles, e.g.,

- Features should be useful to users in multiple fields.
- Features should be primarily about working with labeled data.
- We are aiming for the 20% of functionality that covers 80% of use cases, not the long tail.
- We don't want implementations of any complex numerical methods in xarray (like NumPy rather than SciPy).
- Sometimes it's OK to include a feature in xarray because it makes logical sense with the rest of the package even if it's slightly domain specific (e.g., CF-conventions for netCDF files).","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,210704949
https://github.com/pydata/xarray/issues/1288#issuecomment-282968309,https://api.github.com/repos/pydata/xarray/issues/1288,282968309,MDEyOklzc3VlQ29tbWVudDI4Mjk2ODMwOQ==,1217238,2017-02-28T07:55:14Z,2017-02-28T08:09:23Z,MEMBER,"I agree that the API should mostly copy the `mean`/`sum` reduce methods (and in fact the implementation could probably share much of the logic). But there's still a question of whether the API should expose multiple methods like `DataArray.trapz`/`DataArray.simps` or a single method like `DataArray.integrate` (with `method='simps'`/`method='trapz'`).

As long as there isn't something else we'd want to reserve the name for, I like the sound of `integrate` a little better, because it's more self-descriptive. `trapz` is only obvious if you know the name of the NumPy method. In contrast, `integrate` is the obvious way to approximate an integral. I would only hold off on using `integrate` if there is different functionality that comes to mind with the same name.

It looks like SciPy implements Simpson's rule with the same API (see [scipy.integrate.simps](https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.simps.html)), so that would be easy to support, too. Given how prevalent SciPy is these days, I would have no compunctions about making scipy required for this method and defaulting to `method='simps'` for `DataArray.integrate`.

It would be useful to have dask.array versions of these functions, too, but that's not essential for a first pass.
The implementation of `trapz` is [very simple](https://github.com/numpy/numpy/blob/v1.12.0/numpy/lib/function_base.py#L4270-L4357), so this would be quite easy to add to dask.

CC @spencerahill @rabernat @lesommer in case any of you have opinions about this","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,210704949
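Appendix to this export: the neighbor-pairing ("shift"/`cumsum`) formulation of the trapezoidal rule discussed in the comments above can be sketched in plain NumPy. This is only an illustrative sketch; the function names `trapz_via_pairs` and `cumtrapz_initial0` are hypothetical and are not xarray or SciPy API.

```python
import numpy as np

def trapz_via_pairs(y, x):
    """Trapezoidal rule via neighbor pairing: each sample is paired with
    its shifted neighbor and the panel areas are summed, analogous to the
    shift/sum formulation suggested for xarray in the thread."""
    dx = x[1:] - x[:-1]                    # spacing, like x - x.shift()
    panels = 0.5 * (y[1:] + y[:-1]) * dx   # area of each trapezoid panel
    return panels.sum()

def cumtrapz_initial0(y, x):
    """Cumulative trapezoidal integral with a leading 0, i.e. the
    initial=0 behavior the thread argues should be the default."""
    dx = x[1:] - x[:-1]
    panels = 0.5 * (y[1:] + y[:-1]) * dx
    # substituting cumsum for sum gives the cumulative variant
    return np.concatenate([[0.0], np.cumsum(panels)])

# The trapezoidal rule is exact for linear integrands:
x = np.linspace(0.0, 1.0, 5)
y = 2.0 * x                        # integral of 2x over [0, 1] is 1
print(trapz_via_pairs(y, x))       # -> 1.0
print(cumtrapz_initial0(y, x))     # -> [0. 0.0625 0.25 0.5625 1.]
```

Note how the cumulative result has the same length as `x` only because of the prepended zero, which is exactly why the thread prefers `initial=0` over scipy's historical default of dropping that first element.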