issue_comments


13 rows where author_association = "MEMBER", issue = 320275317 and user = 1217238 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
395596744 https://github.com/pydata/xarray/pull/2104#issuecomment-395596744 https://api.github.com/repos/pydata/xarray/issues/2104 MDEyOklzc3VlQ29tbWVudDM5NTU5Njc0NA== shoyer 1217238 2018-06-07T23:24:58Z 2018-06-07T23:24:58Z MEMBER

OK, let's merge this when CI passes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  implement interp() 320275317
395169022 https://github.com/pydata/xarray/pull/2104#issuecomment-395169022 https://api.github.com/repos/pydata/xarray/issues/2104 MDEyOklzc3VlQ29tbWVudDM5NTE2OTAyMg== shoyer 1217238 2018-06-06T18:30:42Z 2018-06-06T18:30:42Z MEMBER

Do you want to add interp_like here or should we leave it for a later PR?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  implement interp() 320275317
394939096 https://github.com/pydata/xarray/pull/2104#issuecomment-394939096 https://api.github.com/repos/pydata/xarray/issues/2104 MDEyOklzc3VlQ29tbWVudDM5NDkzOTA5Ng== shoyer 1217238 2018-06-06T04:49:22Z 2018-06-06T04:49:22Z MEMBER

Hmm. I guess numeric_only may currently only be an argument for the reduce method. We probably should add it for methods like mean, too.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  implement interp() 320275317
394754176 https://github.com/pydata/xarray/pull/2104#issuecomment-394754176 https://api.github.com/repos/pydata/xarray/issues/2104 MDEyOklzc3VlQ29tbWVudDM5NDc1NDE3Ng== shoyer 1217238 2018-06-05T15:30:23Z 2018-06-05T15:30:23Z MEMBER

Do we have other methods that behave similarly?

We do the same thing for aggregations like mean(), behavior originally copied from pandas: http://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.DataFrame.mean.html

We could add a numeric_only=True argument (like we do for reduction methods) to make this behavior more obvious and allow it to be controlled.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  implement interp() 320275317
393770344 https://github.com/pydata/xarray/pull/2104#issuecomment-393770344 https://api.github.com/repos/pydata/xarray/issues/2104 MDEyOklzc3VlQ29tbWVudDM5Mzc3MDM0NA== shoyer 1217238 2018-06-01T06:05:07Z 2018-06-01T06:05:07Z MEMBER

I would wait before trying to do interpolation with datetime types. There might be use cases for this but I can't think of them offhand. resample() would seem to cover most of these.

On the other hand, it could indeed be interesting to extend resample() to handling numbers. I proposed this a while ago for pandas (https://github.com/pandas-dev/pandas/issues/12828).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  implement interp() 320275317
393757211 https://github.com/pydata/xarray/pull/2104#issuecomment-393757211 https://api.github.com/repos/pydata/xarray/issues/2104 MDEyOklzc3VlQ29tbWVudDM5Mzc1NzIxMQ== shoyer 1217238 2018-06-01T04:51:39Z 2018-06-01T04:51:39Z MEMBER

My opinions:

1. I suppose it probably does make sense to interpolate numeric coordinates, too, so this probably doesn't need to change.
2. I do think we should drop non-numeric variables. There just isn't any way to meaningfully interpolate them (especially strings). This means we will probably need to add a check in DataArray.interp to ensure that the variable dtype is numeric and otherwise raise an error.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  implement interp() 320275317
393756834 https://github.com/pydata/xarray/pull/2104#issuecomment-393756834 https://api.github.com/repos/pydata/xarray/issues/2104 MDEyOklzc3VlQ29tbWVudDM5Mzc1NjgzNA== shoyer 1217238 2018-06-01T04:48:33Z 2018-06-01T04:48:33Z MEMBER

This is very elegant and I think very close to being ready to merge.

There are two design issues that I would like to discuss first:

1. Should we interpolate both data_vars and coords, or only data_vars?
2. How should we handle non-numeric variables?

The current behavior of this PR is:

1. Interpolate all variables, both data_vars and coords.
2. Raise an error for non-numeric variables, because they can't be converted into float.

It might make sense to copy the behavior from Dataset.reduce:

1. Only interpolate data_vars.
2. Drop non-numeric variables instead of trying to interpolate them.
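The Dataset.reduce-style option amounts to a dtype filter over the variables. A minimal numpy sketch, assuming a hypothetical `numeric_vars` helper (not part of the PR):

```python
import numpy as np

def numeric_vars(variables):
    """Keep only variables with a numeric dtype, mirroring the
    Dataset.reduce-style behavior discussed above (hypothetical helper)."""
    return {
        name: arr
        for name, arr in variables.items()
        if np.issubdtype(np.asarray(arr).dtype, np.number)
    }

data = {"temp": np.array([1.0, 2.0]), "label": np.array(["a", "b"])}
kept = numeric_vars(data)  # the string variable "label" is dropped
```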

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  implement interp() 320275317
391402735 https://github.com/pydata/xarray/pull/2104#issuecomment-391402735 https://api.github.com/repos/pydata/xarray/issues/2104 MDEyOklzc3VlQ29tbWVudDM5MTQwMjczNQ== shoyer 1217238 2018-05-23T16:03:06Z 2018-05-23T16:03:06Z MEMBER

Sorry, I still need to find time to finish reviewing this — hopefully very soon! On Wed, May 23, 2018 at 8:57 AM Joe Hamman notifications@github.com wrote:

Is there more to do here or is this ready to go in? Seems to me its ready to go.


{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  implement interp() 320275317
389680743 https://github.com/pydata/xarray/pull/2104#issuecomment-389680743 https://api.github.com/repos/pydata/xarray/issues/2104 MDEyOklzc3VlQ29tbWVudDM4OTY4MDc0Mw== shoyer 1217238 2018-05-16T22:05:14Z 2018-05-16T22:05:14Z MEMBER

Very nice! I noticed that the interpolation is performed along dimensions rather than coordinates in this PR. However, a limitation is that interpolation to/from curvilinear grids is not supported, which I think is a common enough use case, and would be extremely nice to have. Pretty sure scipy's interpolation tools work out of the box with curvilinear grids. Is an updated interface which works on coordinate variables rather than dimensions planned?

Yes, our intent is to eventually extend this same interface to handle these cases, but not in this initial PR.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  implement interp() 320275317
387580695 https://github.com/pydata/xarray/pull/2104#issuecomment-387580695 https://api.github.com/repos/pydata/xarray/issues/2104 MDEyOklzc3VlQ29tbWVudDM4NzU4MDY5NQ== shoyer 1217238 2018-05-09T00:12:14Z 2018-05-09T00:12:14Z MEMBER

should we support the interpolation along the non-dimensional coordinate?

As I think I've mentioned before, I think we should save interpolation of unstructured grids for later -- if only to keep the complexity of this PR to the minimum.

Interpolation on unstructured grids needs some sort of auxiliary data structure to be done efficiently, so it might make sense to wait until after we support flexible index types, such as a KDTree index saved on xarray objects. That said, we could also add it now (immediately after this PR) and add support for caching KDTrees later.

I think the API for unstructured grid interpolation is pretty straightforward and could even reuse the same interp/interpolate_at method. Unstructured grid interpolation would be triggered whenever multiple existing coordinates that share some of the same dimensions are interpolated.
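The trigger condition described above (multiple interpolated coordinates sharing some of the same dimensions) is simple to state in code. A hedged sketch with a hypothetical helper name, not actual xarray internals:

```python
def needs_unstructured_interp(coord_dims):
    """Return True when two interpolated coordinates share a dimension,
    the condition under which unstructured-grid interpolation would be
    triggered (hypothetical helper; coord_dims maps name -> dims tuple)."""
    owner = {}
    for name, dims in coord_dims.items():
        for dim in dims:
            if dim in owner and owner[dim] != name:
                return True
            owner[dim] = name
    return False

# 2-D lat/lon both defined on ('y', 'x'): unstructured interpolation
curvilinear = needs_unstructured_interp({"lat": ("y", "x"), "lon": ("y", "x")})
# Independent 1-D coordinates: regular-grid interpolation
regular = needs_unstructured_interp({"x": ("x",), "y": ("y",)})
```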

What should be the default values of bounds_error and fill_value? scipy's interpolator assumes bounds_error=True by default. I think bounds_error=False is more xarray-like, but it may be unintuitive for users very familiar with scipy's behavior.

I agree, by default we should switch to bounds_error=False, and replace extrapolated values with NaN, which is already the default fill value for interp1d.
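The proposed default can be sketched with plain numpy, which stands in here for the scipy machinery: np.interp clamps to the edge values unless explicit left/right fill values are passed, so NaN-fill for out-of-bounds points has to be requested.

```python
import numpy as np

xp = np.array([0.0, 1.0, 2.0])
fp = np.array([10.0, 20.0, 30.0])

# scipy-style bounds_error=True would raise for points outside [0, 2];
# the xarray-like default instead fills out-of-bounds points with NaN.
inside = np.interp(1.5, xp, fp)                              # 25.0
outside = np.interp(5.0, xp, fp, left=np.nan, right=np.nan)  # nan
```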

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  implement interp() 320275317
386938847 https://github.com/pydata/xarray/pull/2104#issuecomment-386938847 https://api.github.com/repos/pydata/xarray/issues/2104 MDEyOklzc3VlQ29tbWVudDM4NjkzODg0Nw== shoyer 1217238 2018-05-07T02:14:11Z 2018-05-07T02:14:11Z MEMBER

My inclination would be to limit this method to interpolation on regular grids, i.e., the methods supported by scipy.interpolate.interpn or interp1d.

I think a skipna=True option could be useful, but I don't know how we could do that for regular interpolation with multi-dimensional data. So I think it might make sense to save skipna=True for interpolation on unstructured grids.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  implement interp() 320275317
386834171 https://github.com/pydata/xarray/pull/2104#issuecomment-386834171 https://api.github.com/repos/pydata/xarray/issues/2104 MDEyOklzc3VlQ29tbWVudDM4NjgzNDE3MQ== shoyer 1217238 2018-05-05T20:52:57Z 2018-05-05T20:52:57Z MEMBER

I still need to look at the code here, but I don't think we need separate code paths for NaN vs no-NaNs. For example, suppose we have a DataArray with a coordinate x=[0, 1, 2], then da.interp(x=0.5, method='linear') should be equal to 0.5 * (da.sel(x=0) + da.sel(x=1)), regardless of the values of the DataArray. If either da[0] or da[1] is NaN, then the result would be NaN, too.
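The identity above is easy to check with plain numpy (a sketch only; np.interp stands in for da.interp here):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])
values = np.array([10.0, 20.0, 30.0])

# Linear interpolation at x=0.5 is just the midpoint of the two neighbors...
mid = np.interp(0.5, x, values)
assert mid == 0.5 * (values[0] + values[1])

# ...so a NaN in either neighbor propagates to the result automatically,
# with no separate NaN code path needed.
with_nan = np.interp(0.5, x, np.array([10.0, np.nan, 30.0]))
```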

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  implement interp() 320275317
386658665 https://github.com/pydata/xarray/pull/2104#issuecomment-386658665 https://api.github.com/repos/pydata/xarray/issues/2104 MDEyOklzc3VlQ29tbWVudDM4NjY1ODY2NQ== shoyer 1217238 2018-05-04T16:40:03Z 2018-05-04T16:40:03Z MEMBER

First of all, this is awesome!

One question: since I think this new interpolation method will be used quite frequently (more often than interpolate_na and reindex), perhaps we should give it a shorter name. Maybe interp?

We should also consider adding interp_like or interpolate_like by analogy to reindex_like.

I would like this method to work similarly to isel, which may support vectorized interpolation.

Yes, I agree. Interpolation should be vectorized similarly to isel/sel.

In particular, in the long term I think we should aim to make interpolate_at(..., method='nearest') and reindex(..., method='nearest') equivalent.
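A numpy sketch of why the two could coincide for method='nearest': both reduce to a nearest-neighbor index lookup on the coordinate. The `nearest_index` helper is hypothetical and assumes a sorted 1-D coordinate.

```python
import numpy as np

def nearest_index(coord, values):
    """Indices of the coordinate points nearest to each value
    (hypothetical helper; coord must be sorted ascending)."""
    coord = np.asarray(coord)
    values = np.asarray(values)
    idx = np.clip(np.searchsorted(coord, values), 1, len(coord) - 1)
    # Step back one position wherever the left neighbor is closer.
    left_closer = (values - coord[idx - 1]) < (coord[idx] - values)
    return idx - left_closer

picked = nearest_index([0.0, 1.0, 2.0], [0.4, 1.6, 2.5])  # -> [0, 2, 2]
```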

How many interpolate methods should we support?

I think this is fine to start (we can always add more later).

How do we handle nan?

Which NaN are you referring to?

  1. data on the arrays being indexed
  2. Coordinates on the arrays being indexed
  3. Points at which to interpolate

Case (1) should definitely be supported. Especially if data is stored in dask arrays, we cannot necessarily check if there are NaNs.

Cases (2) and (3) are not important, because there are relatively few use-cases for NaN in coordinate arrays.

Do we support interpolation along dimension without coordinate? In that case, do we attach new coordinate to the object?

Currently we don't support this for reindex(), but I suppose we could do so unambiguously.

How should we do if new coordinate has the dimensional coordinate for the dimension to be interpolated?

I think the new coordinates should take priority, and the dimension coordinates on the new coordinate should be dropped. This is similar to what we do for sel():

```
In [10]: da = xr.DataArray([0, 0.1, 0.2, 0.1], dims='x',
    ...:                   coords={'x': [0, 1, 2, 3]})

In [11]: da.sel(x=xr.DataArray([1, 2], dims=['x'], coords={'x': [1, 3]}))
Out[11]:
<xarray.DataArray (x: 2)>
array([0.1, 0.2])
Coordinates:
  * x        (x) int64 1 2
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  implement interp() 320275317
