
issue_comments


11 rows where author_association = "MEMBER" and issue = 210704949 sorted by updated_at descending


Facets

  • user: shoyer (6), fmaussion (3), rabernat (2)
  • issue: Add trapz to DataArray for mathematical integration (11)
  • author_association: MEMBER (11)
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
359543819 https://github.com/pydata/xarray/issues/1288#issuecomment-359543819 https://api.github.com/repos/pydata/xarray/issues/1288 MDEyOklzc3VlQ29tbWVudDM1OTU0MzgxOQ== shoyer 1217238 2018-01-22T19:50:25Z 2018-01-22T19:50:25Z MEMBER

I opened https://github.com/pydata/xarray/issues/1850 to discuss xarray-contrib.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add trapz to DataArray for mathematical integration 210704949
293982053 https://github.com/pydata/xarray/issues/1288#issuecomment-293982053 https://api.github.com/repos/pydata/xarray/issues/1288 MDEyOklzc3VlQ29tbWVudDI5Mzk4MjA1Mw== shoyer 1217238 2017-04-13T18:24:07Z 2017-04-13T18:24:07Z MEMBER

Perhaps a new package would be in order?

I would also be very happy to include many of these in a submodule inside xarray, e.g., xarray.scipy for wrappers of the scipy API. This would make it easier to use internal methods like apply_ufunc (though hopefully that will be public API soon).
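As a rough illustration of the kind of wrapper such a submodule could contain (a sketch under assumptions, not code from the thread), a trapezoidal integral can be written on top of apply_ufunc. Here scipy.integrate.trapezoid is the current name for scipy's trapz, and integrate_trapz is a hypothetical name:

```
import xarray as xr
from scipy.integrate import trapezoid  # current name for scipy's trapz

def integrate_trapz(da: xr.DataArray, dim: str) -> xr.DataArray:
    # Move `dim` to the core (last) axis and reduce over it with scipy,
    # using the coordinate values of `dim` as the sample positions.
    return xr.apply_ufunc(
        trapezoid,
        da,
        input_core_dims=[[dim]],
        kwargs={"x": da[dim].values, "axis": -1},
    )
```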

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add trapz to DataArray for mathematical integration 210704949
287840030 https://github.com/pydata/xarray/issues/1288#issuecomment-287840030 https://api.github.com/repos/pydata/xarray/issues/1288 MDEyOklzc3VlQ29tbWVudDI4Nzg0MDAzMA== shoyer 1217238 2017-03-20T17:43:12Z 2017-03-20T17:43:12Z MEMBER

> By the way, the cumtrapz implementation I pasted above matches the scipy version when initial=0, which I also think would be a more sane default for integration.

Yes, I agree with both of you that we should fix initial=0. (I don't know if I would even bother with adding the option.)

> As far as implementation is concerned: is there any performance downside to using xarray's shift operators versus delving deeper into dask with map_blocks, etc.? I looked into using dask's cumreduction function, but am not sure it is possible to implement the trapezoid method in that way without changing dask.

From a performance perspective, it would be totally fine to implement this either in terms of high-level xarray operations like shift/sum/cumsum (manipulating full xarray objects) or in terms of high-level dask.array operations like dask.array.cumsum (manipulating dask arrays). I would do whatever is easiest. I'm pretty sure there is no reason you need to get into dask's low-level API like map_blocks and cumreduction.
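For concreteness, here is a rough sketch (not code from the thread) of what the shift/sum/cumsum approach could look like, assuming a DataArray `da` with a numeric coordinate along `dim`; the helper names are hypothetical:

```
# Hypothetical helpers sketching the trapezoidal rule with high-level xarray
# operations only, so the same code works on NumPy- or dask-backed arrays.
import xarray as xr

def trapz(da: xr.DataArray, dim: str) -> xr.DataArray:
    dx = da[dim] - da[dim].shift({dim: 1})    # spacing between samples
    mid = 0.5 * (da + da.shift({dim: 1}))     # average of neighbouring samples
    return (dx * mid).sum(dim, skipna=True)   # drop the NaN introduced by shift

def cumtrapz(da: xr.DataArray, dim: str) -> xr.DataArray:
    dx = da[dim] - da[dim].shift({dim: 1})
    mid = 0.5 * (da + da.shift({dim: 1}))
    # fillna(0) gives the initial=0 behaviour discussed above: the result has
    # the same length as the input and starts at zero.
    return (dx * mid).fillna(0).cumsum(dim)
```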

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add trapz to DataArray for mathematical integration 210704949
287749414 https://github.com/pydata/xarray/issues/1288#issuecomment-287749414 https://api.github.com/repos/pydata/xarray/issues/1288 MDEyOklzc3VlQ29tbWVudDI4Nzc0OTQxNA== fmaussion 10050469 2017-03-20T12:48:23Z 2017-03-20T12:48:23Z MEMBER

An argument against a single function is that the shape of the returned array is different in each case. Also, cumtrapz has an initial keyword which changes the shape of the returned array. It is currently set to None per default, but should be set to 0 per default IMO.

If this is not a problem, I would also like to have one single function for integration (simpler from a user perspective).
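For reference, a small illustration of the shape difference being discussed (scipy.integrate.cumulative_trapezoid is the current name for cumtrapz):

```
import numpy as np
from scipy.integrate import cumulative_trapezoid  # current name for cumtrapz

y = np.arange(5.0)
cumulative_trapezoid(y).shape             # (4,) -- default initial=None drops one point
cumulative_trapezoid(y, initial=0).shape  # (5,) -- same length as input, starting at 0
```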

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add trapz to DataArray for mathematical integration 210704949
287680627 https://github.com/pydata/xarray/issues/1288#issuecomment-287680627 https://api.github.com/repos/pydata/xarray/issues/1288 MDEyOklzc3VlQ29tbWVudDI4NzY4MDYyNw== shoyer 1217238 2017-03-20T05:22:10Z 2017-03-20T05:22:10Z MEMBER

Sorry for letting this lapse.

Yes, we absolutely want this functionality in some form.

> My concern is that this doesn't feel like functionality that inherently belongs as a method on a DataArray -- if it doesn't need to be a method, it shouldn't be. In numpy and scipy, these are separate functions and I think they work fine that way.

This is a fair point, and I agree with you from a purist OO-programming/software-engineering perspective (TensorFlow, for example, takes this approach). But with xarray, we have been taking a different path, putting methods on objects for the convenience of method chaining (like pandas). So from a consistency perspective, I think it's fine to keep these as methods. This is somewhat similar even to NumPy, where a number of the most commonly used functions are also methods.

> Perhaps allow generic extension of da.integrate by letting the method keyword of da.integrate accept a function as an argument that performs the actual integration?

I don't see a big advantage to adding such an extension point. Almost assuredly it's less text and clearer to simply write ds.pipe(my_integrate, 'x') or my_integrate(ds, 'x') rather than ds.integrate('x', my_integrate).

> Maybe this could be implemented by adding an optional cumulative flag.

I normally don't like adding flags for switching functionality entirely, but maybe that would make sense here if there's enough shared code (e.g., simply substituting cumsum for sum). The alternative is something like cum_integrate, which sounds kind of awkward and is one more method.

One thing that can be useful to do before writing code is to write out a docstring with all the bells and whistles we might eventually add. So let's give that a shot here and see if integrate still makes sense:

```
integrate(dim, method='trapz', cumulative=False)

Arguments

dim : str or DataArray
    DataArray or reference to an existing coordinate, labeling what to
    integrate over.
cumulative : bool, optional
    Whether to do a non-cumulative (default) or cumulative integral.
method : 'trapz' or 'simps', optional
    Whether to use the trapezoidal rule or Simpson's rule.
```

I could also imagine possibly adding a bounds or limits argument that specifies multiple limits for controlling multiple integrals at once (e.g., dim='x' and bounds=[0, 10, 20, 30, 40, 50] would result in an x dimension of length 5). This would certainly be useful for some of my current work. But maybe we should save this sort of add for later...
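To make the proposed signature concrete, hypothetical usage might look like the following (none of these methods existed at the time of the discussion; `da` is assumed to be a DataArray with an 'x' coordinate):

```
# Hypothetical calls against the docstring sketched above.
total = da.integrate('x')                     # trapezoidal rule over the whole axis
running = da.integrate('x', cumulative=True)  # cumulative integral, same length as da
smoother = da.integrate('x', method='simps')  # Simpson's rule instead of trapezoidal
```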

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add trapz to DataArray for mathematical integration 210704949
283109247 https://github.com/pydata/xarray/issues/1288#issuecomment-283109247 https://api.github.com/repos/pydata/xarray/issues/1288 MDEyOklzc3VlQ29tbWVudDI4MzEwOTI0Nw== shoyer 1217238 2017-02-28T17:34:05Z 2017-02-28T19:00:00Z MEMBER

As usual @rabernat raises some excellent points!

> I weakly prefer not to use the name integrate and instead keep the standard scipy names because they make clear the numerical algorithm that is being applied.

Yes, this is a totally valid concern if a user might expect integrate to be calculating something different.

One point in favor of calling this integrate is that the name is highly searchable, which provides an excellent place to include documentation about how to integrate in general (including links to other packages, like pangeo's vector calculus package). But we know that nobody reads documentation ;).

> But where does it end? Why not implement the rest of the scipy.ode module?

Looking at the rest of scipy.integrate, in some ways the functionality of trapz/cumtrapz/simps is uniquely well suited for xarray: they are focused on data ("given fixed samples") rather than solving a system of equations ("given a function").

numpy.gradient feels very complementary as well, so I could see that as also in scope, but there are similar concerns about the name. There might be some value in complementary names for integrals/gradients.

> As a community we need to develop a roadmap that clearly defines the scope of xarray.

I doubt we'll be able to come up with hard and fast rules, but maybe we can enumerate some principles, e.g.,

  • Features should be useful to users in multiple fields.
  • Features should be primarily about working with labeled data.
  • We are aiming for the 20% of functionality that covers 80% of use cases, not the long tail.
  • We don't want implementations of any complex numerical methods in xarray (like NumPy rather than SciPy).
  • Sometimes it's OK to include a feature in xarray because it makes logical sense with the rest of the package even if it's slightly domain specific (e.g., CF-conventions for netCDF files).
{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add trapz to DataArray for mathematical integration 210704949
283128841 https://github.com/pydata/xarray/issues/1288#issuecomment-283128841 https://api.github.com/repos/pydata/xarray/issues/1288 MDEyOklzc3VlQ29tbWVudDI4MzEyODg0MQ== fmaussion 10050469 2017-02-28T18:53:41Z 2017-02-28T18:53:41Z MEMBER

> I weakly prefer not to use the name integrate and instead keep the standard scipy names because they make clear the numerical algorithm that is being applied.

> Yes, this is a totally valid concern, if a user might expect integrate to be calculating something different. One point in favor of calling this integrate is that the name is highly searchable, which provides an excellent place to include documentation about how to integrate in general (including links to other packages, like pangeo's vector calculus package). But we know that nobody reads documentation ;).

integrate would allow one to do things like:

da.integrate(how='rectangle')

da.integrate(how='trapezoidal')

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add trapz to DataArray for mathematical integration 210704949
283127924 https://github.com/pydata/xarray/issues/1288#issuecomment-283127924 https://api.github.com/repos/pydata/xarray/issues/1288 MDEyOklzc3VlQ29tbWVudDI4MzEyNzkyNA== rabernat 1197350 2017-02-28T18:50:11Z 2017-02-28T18:50:11Z MEMBER

And I'm fine with integrate if that is the consensus here.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add trapz to DataArray for mathematical integration 210704949
283062107 https://github.com/pydata/xarray/issues/1288#issuecomment-283062107 https://api.github.com/repos/pydata/xarray/issues/1288 MDEyOklzc3VlQ29tbWVudDI4MzA2MjEwNw== rabernat 1197350 2017-02-28T14:59:54Z 2017-02-28T14:59:54Z MEMBER

Having an xarray wrapper on trapz or cumtrapz would definitely be useful for many users. I weakly prefer not to use the name integrate and instead keep the standard scipy names because they make clear the numerical algorithm that is being applied. The issue is that certain types of gridded data (such as output from numerical models) should actually not be integrated with the trapezoidal rule but rather should use the native finite volume discretization for their computational grid. The goal of our hypothetical pangeo vector calculus package is to implement integrals and derivatives in such a context. A built-in xarray integration function would apply in cases where the data is assumed to be continuous, and where no auxiliary information about the grid (beyond the coordinates) is available.

I will also make the same comment I always make when such feature requests are raised: yes, it always seems desirable to add new features to xarray on a function-by-function basis. But where does it end? Why not implement the rest of the scipy.ode module? And why stop there? As a community we need to develop a roadmap that clearly defines the scope of xarray. Once apply is stable, it might not be that hard to wrap a large fraction of the scipy library. But maybe that should live in a separate package.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add trapz to DataArray for mathematical integration 210704949
282968309 https://github.com/pydata/xarray/issues/1288#issuecomment-282968309 https://api.github.com/repos/pydata/xarray/issues/1288 MDEyOklzc3VlQ29tbWVudDI4Mjk2ODMwOQ== shoyer 1217238 2017-02-28T07:55:14Z 2017-02-28T08:09:23Z MEMBER

I agree that the API should mostly copy the mean/sum reduce methods (and in fact the implementation could probably share much of the logic). But there's still a question of whether the API should expose multiple methods like DataArray.trapz/DataArray.simps or a single method like DataArray.integrate (with method='simps'/method='trapz').

As long as there isn't something else we'd want to reserve the name for, I like the sound of integrate a little better, because it's more self-descriptive. trapz is only obvious if you know the name of the NumPy method. In contrast, integrate is the obvious way to approximate an integral. I would only hold off on using integrate if there is different functionality that comes to mind with the same name.

It looks like SciPy implements Simpson's rule with the same API (see scipy.integrate.simps), so that would be easy to support, too. Given how prevalent SciPy is these days, I would have no compunctions about making scipy required for this method and defaulting to method='simps' for DataArray.integrate.

It would be useful to have dask.array versions of these functions, too, but that's not essential for a first pass. The implementation of trapz is very simple, so this would be quite easy to add to dask.
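To make that last point concrete, here is a minimal sketch (illustrative only, with hypothetical names) of the trapezoidal rule on a 1-D dask array; only high-level slicing and a sum are needed, which dask.array already supports:

```
import dask.array as da
import numpy as np

def dask_trapz_1d(y, x):
    # Trapezoid rule: sum of interval widths times the average of the endpoints.
    dx = x[1:] - x[:-1]
    mid = 0.5 * (y[1:] + y[:-1])
    return (dx * mid).sum()

x = np.linspace(0, np.pi, 1001)
y = da.from_array(np.sin(x), chunks=100)
dask_trapz_1d(y, da.from_array(x, chunks=100)).compute()  # approximately 2.0
```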

CC @spencerahill @rabernat @lesommer in case any of you have opinions about this

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add trapz to DataArray for mathematical integration 210704949
282970580 https://github.com/pydata/xarray/issues/1288#issuecomment-282970580 https://api.github.com/repos/pydata/xarray/issues/1288 MDEyOklzc3VlQ29tbWVudDI4Mjk3MDU4MA== fmaussion 10050469 2017-02-28T08:07:20Z 2017-02-28T08:07:20Z MEMBER

+1 for integrate

The cumulative integral is of very frequent use in atmospheric sciences, too: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.integrate.cumtrapz.html

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add trapz to DataArray for mathematical integration 210704949

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);