issue_comments

5 rows where issue = 265056503 sorted by updated_at descending

user 4

  • shoyer 2
  • jhamman 1
  • darothen 1
  • mmartini-usgs 1

author_association 2

  • MEMBER 3
  • NONE 2

issue 1

  • Resample / upsample behavior diverges from pandas 5
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
340535370 https://github.com/pydata/xarray/issues/1631#issuecomment-340535370 https://api.github.com/repos/pydata/xarray/issues/1631 MDEyOklzc3VlQ29tbWVudDM0MDUzNTM3MA== mmartini-usgs 23199378 2017-10-30T18:11:58Z 2017-10-30T18:11:58Z NONE

Thanks for posting this, @jhamman. It's really helping me understand what is going on with my data when I use xarray. My understanding of pandas is that it should not interpolate by default. However, I am downsampling, and this behavior is described for upsampling (in Python for Data Analysis).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Resample / upsample behavior diverges from pandas  265056503
336634555 https://github.com/pydata/xarray/issues/1631#issuecomment-336634555 https://api.github.com/repos/pydata/xarray/issues/1631 MDEyOklzc3VlQ29tbWVudDMzNjYzNDU1NQ== darothen 4992424 2017-10-14T13:19:58Z 2017-10-14T13:19:58Z NONE

Thanks for documenting this, @jhamman. I think all the logic needed to build out true interpolation (or really imputation/infilling) is already in .resample(...).interpolate(). I can jump in if there's any confusion in the code.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Resample / upsample behavior diverges from pandas  265056503
336592618 https://github.com/pydata/xarray/issues/1631#issuecomment-336592618 https://api.github.com/repos/pydata/xarray/issues/1631 MDEyOklzc3VlQ29tbWVudDMzNjU5MjYxOA== shoyer 1217238 2017-10-13T23:54:51Z 2017-10-13T23:54:51Z MEMBER

Let's see where the pandas discussion ends up. If xarray had a method for interpolating to fill missing values, achieving your desired result would be as simple as chaining another interpolate call, e.g., .resample('1D').interpolate().interpolate_na() or .interpolate_na().resample('1D').interpolate().
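For readers following along later: current xarray releases do provide `DataArray.interpolate_na`, so the filling step described above can be sketched as below. This is only an illustration on made-up data (the array `da` and its values are assumptions, not taken from the issue), and it shows just the NaN-filling half of the chain.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily series with one value already marked as missing.
time = pd.date_range("2016-01-01", periods=5, freq="D")
da = xr.DataArray(
    [0.0, 1.0, np.nan, 3.0, 4.0],
    coords={"time": time},
    dims="time",
)

# Fill values that are already NaN by linear interpolation along time.
filled = da.interpolate_na(dim="time")
print(filled.values)
```

The filled array can then be passed on to `.resample(...)`, or the call can be placed after resampling, which are the two orderings mentioned in the comment.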

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Resample / upsample behavior diverges from pandas  265056503
336592007 https://github.com/pydata/xarray/issues/1631#issuecomment-336592007 https://api.github.com/repos/pydata/xarray/issues/1631 MDEyOklzc3VlQ29tbWVudDMzNjU5MjAwNw== jhamman 2443309 2017-10-13T23:48:31Z 2017-10-13T23:48:31Z MEMBER

Thanks @shoyer. I always appreciated this feature in pandas, so I'm bummed to see it may not have been intentional. I need an xarray interpolate method that fills NaNs, so I'll give that a go. I suspect it will be a widely used feature for dealing with missing data.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Resample / upsample behavior diverges from pandas  265056503
336578729 https://github.com/pydata/xarray/issues/1631#issuecomment-336578729 https://api.github.com/repos/pydata/xarray/issues/1631 MDEyOklzc3VlQ29tbWVudDMzNjU3ODcyOQ== shoyer 1217238 2017-10-13T22:05:41Z 2017-10-13T22:05:41Z MEMBER

The key difference appears to be:

- In xarray, .resample(...).interpolate(...) only interpolates over existing gaps in the data. If a value is already marked as NaN, it doesn't get interpolated.
- In pandas, .resample(...).interpolate(...) fills in existing NaNs.

I think this is a bug in pandas, since the behavior is inconsistent with other resample methods like ffill():

```
s.reindex_like(slike).resample('1D').ffill()
time
2016-01-01     NaN
2016-01-02     0.0
2016-01-03     1.0
2016-01-04     2.0
2016-01-05     3.0
2016-01-06     NaN
2016-01-07     4.0
2016-01-08     5.0
2016-01-09     6.0
2016-01-10     7.0
2016-01-11     8.0
2016-01-12     9.0
2016-01-13    10.0
2016-01-14     NaN
2016-01-15     NaN
Freq: D, dtype: float32
```

More generally: resample() exists for resampling existing values, not filling in missing values. If you want to fill in values that are already NaN, you should use one of the existing filling methods (e.g., fillna() or interpolate()). Or you can drop those NaN values with .dropna().
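A small pandas sketch of that separation, using a made-up five-day series (the data and variable names here are illustrative assumptions, not the `s` and `slike` objects from the issue):

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with one value already marked missing (2016-01-03).
index = pd.date_range("2016-01-01", periods=5, freq="D")
s = pd.Series([0.0, 1.0, np.nan, 3.0, 4.0], index=index)

half_day = pd.Timedelta(hours=12)

# Upsampling with ffill repeats each daily value into the new 12-hour slot.
# The pre-existing NaN is repeated as well; it is not filled.
upsampled = s.resample(half_day).ffill()

# Filling values that are already NaN is a separate, explicit step.
filled = s.interpolate()  # linear interpolation across the NaN
dropped = s.dropna()      # or discard the missing timestamps entirely

print(upsampled)
print(filled)
```

Resampling `filled` or `dropped` afterwards gives control over whether the gap is bridged, rather than leaving that decision to the resampler.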

(This does suggest that xarray could use a direct DataArray.interpolate() method.)

Another example:

```
s.reindex_like(slike).resample('12H').ffill()
time
2016-01-01 00:00:00     NaN
2016-01-01 12:00:00     NaN
2016-01-02 00:00:00     0.0
2016-01-02 12:00:00     0.0
2016-01-03 00:00:00     1.0
2016-01-03 12:00:00     1.0
2016-01-04 00:00:00     2.0
2016-01-04 12:00:00     2.0
2016-01-05 00:00:00     3.0
2016-01-05 12:00:00     3.0
2016-01-06 00:00:00     NaN
2016-01-06 12:00:00     NaN
2016-01-07 00:00:00     4.0
2016-01-07 12:00:00     4.0
2016-01-08 00:00:00     5.0
2016-01-08 12:00:00     5.0
2016-01-09 00:00:00     6.0
2016-01-09 12:00:00     6.0
2016-01-10 00:00:00     7.0
2016-01-10 12:00:00     7.0
2016-01-11 00:00:00     8.0
2016-01-11 12:00:00     8.0
2016-01-12 00:00:00     9.0
2016-01-12 12:00:00     9.0
2016-01-13 00:00:00    10.0
2016-01-13 12:00:00    10.0
2016-01-14 00:00:00     NaN
2016-01-14 12:00:00     NaN
2016-01-15 00:00:00     NaN
Freq: 12H, dtype: float32
```

It is useful that pandas's upsampling is only repeating values within the previously valid range. Otherwise it is likely to interpolate over true data gaps.

As another use case: suppose we have a temperature dataset with 3-hourly measurements, and we want to upsample it to 1-hour resolution. Occasionally, measurements are missing for a day or more at a time, which we mark with missing values (suppose the server running the model ran out of disk space). It is useful to be able to resample to a higher resolution without entirely unrealistic interpolation over data gaps.
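One way to sketch this use case in pandas is to interpolate onto the hourly grid and then mask every hourly stamp whose neighbouring observations are more than one sampling interval apart. The temperatures, the 3-hour threshold, and all variable names below are assumptions made for illustration; this is not an approach proposed in the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-hourly temperatures; 06:00 to 12:00 are a true data gap.
obs_index = pd.date_range("2016-01-01", periods=8, freq=pd.Timedelta(hours=3))
temps = pd.Series(
    [10.0, 11.0, np.nan, np.nan, np.nan, 15.0, 16.0, 17.0], index=obs_index
)

valid = temps.dropna()
hourly_index = pd.date_range(
    obs_index[0], obs_index[-1], freq=pd.Timedelta(hours=1)
)

# Linear-in-time interpolation between valid observations (this alone
# would also bridge the long gap).
interp = valid.reindex(hourly_index).interpolate(method="time", limit_area="inside")

# For each hourly stamp, find the surrounding valid observation times.
obs_times = pd.Series(valid.index, index=valid.index)
prev_obs = obs_times.reindex(hourly_index, method="ffill")
next_obs = obs_times.reindex(hourly_index, method="bfill")

# Keep interpolated values only where the surrounding observations are at
# most one sampling interval (3 hours) apart; everything else stays NaN.
max_gap = pd.Timedelta(hours=3)
hourly = interp.where((next_obs - prev_obs) <= max_gap)

print(hourly)
```

With this masking, the hours between 03:00 and 15:00 remain NaN, while the short 3-hour intervals on either side are upsampled to hourly values.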

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Resample / upsample behavior diverges from pandas  265056503

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
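The view at the top of this page (5 rows where issue = 265056503, sorted by updated_at descending) corresponds to a simple query against this schema. The sketch below rebuilds the table in an in-memory SQLite database and loads only the ids, user ids, and timestamps shown in the rows above; the exact SQL that Datasette generates is an assumption, and the other columns are left empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE [issue_comments] (
       [html_url] TEXT,
       [issue_url] TEXT,
       [id] INTEGER PRIMARY KEY,
       [node_id] TEXT,
       [user] INTEGER REFERENCES [users]([id]),
       [created_at] TEXT,
       [updated_at] TEXT,
       [author_association] TEXT,
       [body] TEXT,
       [reactions] TEXT,
       [performed_via_github_app] TEXT,
       [issue] INTEGER REFERENCES [issues]([id])
    );
    CREATE INDEX [idx_issue_comments_issue]
        ON [issue_comments] ([issue]);
    CREATE INDEX [idx_issue_comments_user]
        ON [issue_comments] ([user]);
    """
)

# (id, user, updated_at) as listed in the rows above, inserted out of order.
comments = [
    (336578729, 1217238, "2017-10-13T22:05:41Z"),
    (340535370, 23199378, "2017-10-30T18:11:58Z"),
    (336592007, 2443309, "2017-10-13T23:48:31Z"),
    (336634555, 4992424, "2017-10-14T13:19:58Z"),
    (336592618, 1217238, "2017-10-13T23:54:51Z"),
]
conn.executemany(
    "INSERT INTO issue_comments ([id], [user], [created_at], [updated_at], [issue]) "
    "VALUES (?, ?, ?, ?, ?)",
    [(cid, uid, ts, ts, 265056503) for cid, uid, ts in comments],
)

# ISO 8601 timestamps in the same timezone sort correctly as plain text.
rows = conn.execute(
    "SELECT [id], [updated_at] FROM issue_comments "
    "WHERE [issue] = ? ORDER BY [updated_at] DESC",
    (265056503,),
).fetchall()

for comment_id, updated_at in rows:
    print(comment_id, updated_at)
```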