issue_comments: 336578729

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/1631#issuecomment-336578729 https://api.github.com/repos/pydata/xarray/issues/1631 336578729 MDEyOklzc3VlQ29tbWVudDMzNjU3ODcyOQ== 1217238 2017-10-13T22:05:41Z 2017-10-13T22:05:41Z MEMBER

The key difference appears to be:

- In xarray, .resample(...).interpolate(...) only interpolates over existing gaps in the data. If a value is already marked as NaN, it doesn't get interpolated.
- In pandas, .resample(...).interpolate(...) fills in existing NaNs.

I think this is a bug in pandas, since the behavior is inconsistent with other resample methods like ffill():

```
s.reindex_like(slike).resample('1D').ffill()
time
2016-01-01     NaN
2016-01-02     0.0
2016-01-03     1.0
2016-01-04     2.0
2016-01-05     3.0
2016-01-06     NaN
2016-01-07     4.0
2016-01-08     5.0
2016-01-09     6.0
2016-01-10     7.0
2016-01-11     8.0
2016-01-12     9.0
2016-01-13    10.0
2016-01-14     NaN
2016-01-15     NaN
Freq: D, dtype: float32
```

More generally: resample() exists for resampling existing values, not filling in missing values. If you want to fill in values that are already NaN, you should use one of the existing filling methods (e.g., fillna() or interpolate()). Or you can drop those missing values with .dropna().
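A minimal pandas sketch of those three options, using a small made-up series (the values and dates are illustrative and not taken from the issue):

```python
import numpy as np
import pandas as pd

# Illustrative daily series with two values already marked as NaN.
index = pd.date_range("2016-01-01", periods=6, freq="D")
s = pd.Series([0.0, 1.0, np.nan, 3.0, np.nan, 5.0], index=index)

# Option 1: replace existing NaNs with a constant.
filled_constant = s.fillna(-1.0)

# Option 2: fill existing NaNs by linear interpolation (the default method).
filled_linear = s.interpolate()

# Option 3: drop the NaN entries entirely.
dropped = s.dropna()

print(filled_linear)
print(dropped)
```

None of these calls involve resample(): they act directly on values that are already NaN.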

(This does suggest that xarray could use a direct DataArray.interpolate() method.)

Another example:

```
s.reindex_like(slike).resample('12H').ffill()
time
2016-01-01 00:00:00     NaN
2016-01-01 12:00:00     NaN
2016-01-02 00:00:00     0.0
2016-01-02 12:00:00     0.0
2016-01-03 00:00:00     1.0
2016-01-03 12:00:00     1.0
2016-01-04 00:00:00     2.0
2016-01-04 12:00:00     2.0
2016-01-05 00:00:00     3.0
2016-01-05 12:00:00     3.0
2016-01-06 00:00:00     NaN
2016-01-06 12:00:00     NaN
2016-01-07 00:00:00     4.0
2016-01-07 12:00:00     4.0
2016-01-08 00:00:00     5.0
2016-01-08 12:00:00     5.0
2016-01-09 00:00:00     6.0
2016-01-09 12:00:00     6.0
2016-01-10 00:00:00     7.0
2016-01-10 12:00:00     7.0
2016-01-11 00:00:00     8.0
2016-01-11 12:00:00     8.0
2016-01-12 00:00:00     9.0
2016-01-12 12:00:00     9.0
2016-01-13 00:00:00    10.0
2016-01-13 12:00:00    10.0
2016-01-14 00:00:00     NaN
2016-01-14 12:00:00     NaN
2016-01-15 00:00:00     NaN
Freq: 12H, dtype: float32
```

It is useful that pandas's upsampling only repeats values within the previously valid range. Otherwise, it would be likely to interpolate over true data gaps.
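This behavior can be reproduced on a tiny series. The sketch below uses made-up data and passes the frequency as a pd.Timedelta rather than the '12H' string alias, which is my choice and not what the issue used:

```python
import numpy as np
import pandas as pd

# Illustrative daily series with one value already marked as NaN.
index = pd.date_range("2016-01-01", periods=4, freq="D")
s = pd.Series([0.0, np.nan, 2.0, 3.0], index=index)

# Upsample from daily to 12-hourly and forward-fill the new timestamps.
upsampled = s.resample(pd.Timedelta(hours=12)).ffill()

# The existing NaN on 2016-01-02 is repeated rather than overwritten with
# the value from 2016-01-01, so the gap is still visible after upsampling.
print(upsampled)
```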

As another use-case: suppose we have a temperature dataset with 3-hourly measurements, and we want to upsample it to 1-hour resolution. Occasionally, measurements are missing for a day or more at a time, which we mark with missing values (suppose the server running the model ran out of disk space). It is useful to be able to resample to a higher resolution without entirely unrealistic interpolation over data gaps.
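One way to get that result in plain pandas is to interpolate everywhere and then mask any timestamp whose neighbouring valid measurements are too far apart. This is only a sketch under stated assumptions: the helper name upsample_preserving_gaps, the max_gap parameter, and the sample temperatures are all hypothetical, and this is not how xarray or pandas implement resampling.

```python
import numpy as np
import pandas as pd


def upsample_preserving_gaps(temps, max_gap):
    """Upsample to hourly resolution, leaving NaN wherever the surrounding
    valid measurements are more than max_gap apart (hypothetical helper)."""
    valid = temps.dropna()
    hourly_index = pd.date_range(
        temps.index[0], temps.index[-1], freq=pd.Timedelta(hours=1)
    )
    # Time-weighted linear interpolation between valid measurements.
    interpolated = valid.reindex(hourly_index).interpolate(method="time")
    # For each hourly timestamp, find the nearest valid measurement time
    # at or before it and at or after it.
    stamps = pd.Series(valid.index, index=valid.index)
    prev_valid = stamps.reindex(hourly_index, method="ffill")
    next_valid = stamps.reindex(hourly_index, method="bfill")
    # Keep values only where the bracketing measurements are close enough.
    return interpolated.where((next_valid - prev_valid) <= max_gap)


# Illustrative 3-hourly temperatures with a gap between 03:00 and 12:00.
index = pd.date_range("2016-01-01", periods=8, freq=pd.Timedelta(hours=3))
temps = pd.Series([10.0, 13.0, np.nan, np.nan, 22.0, 25.0, 28.0, 31.0], index=index)

hourly = upsample_preserving_gaps(temps, max_gap=pd.Timedelta(hours=3))
print(hourly)
```

With max_gap set to the native 3-hour spacing, hours between consecutive measurements are interpolated, while the hours inside the outage stay NaN.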

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  265056503