
issues: 401392318


id: 401392318
node_id: MDU6SXNzdWU0MDEzOTIzMTg=
number: 2695
title: Resample with limit/tolerance
user: 43613877
state: closed
locked: 0
comments: 3
created_at: 2019-01-21T15:04:30Z
updated_at: 2019-01-31T17:28:09Z
closed_at: 2019-01-31T17:28:09Z
author_association: CONTRIBUTOR

Upsampling methods cannot be limited

It is very handy to be able to limit the scope of a resample method, e.g. nearest, in time series. In pandas, a limit argument can be given:

```python
import pandas as pd
import datetime as dt

dates = [dt.datetime(2018, 1, 1), dt.datetime(2018, 1, 2)]
data = [10, 20]
df = pd.DataFrame(data, index=dates)
df.resample('1H').nearest(limit=1)
```

This leads to

    2018-01-01 00:00:00    10.0
    2018-01-01 01:00:00    10.0
    2018-01-01 02:00:00     NaN
    2018-01-01 03:00:00     NaN
    2018-01-01 04:00:00     NaN
    ...
    2018-01-01 20:00:00     NaN
    2018-01-01 21:00:00     NaN
    2018-01-01 22:00:00     NaN
    2018-01-01 23:00:00    20.0
    2018-01-02 00:00:00    20.0

Currently:

    import xarray as xr
    xdf = xr.Dataset.from_dataframe(df)
    xdf.resample({'index':'1H'}).nearest(limit=1)

leads to

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: nearest() got an unexpected keyword argument 'limit'

Problem description

This would be very helpful, as one might not want to fill gaps with the nearest method indefinitely. To my understanding, the following modifications could be made, by comparison with the pandas code:

/xarray/core/resample.py

    def _upsample(self, method, limit=None, *args, **kwargs):
        ...
        elif method in ['pad', 'ffill', 'backfill', 'bfill', 'nearest']:
            kwargs = kwargs.copy()
            kwargs.update(**{self._dim: upsampled_index})
            return self._obj.reindex(method=method, tolerance=limit, *args, **kwargs)
        ...

and

    def nearest(self, limit=None):
        """Take new values from nearest original coordinate to up-sampled
        frequency coordinates.
        """
        return self._upsample('nearest', limit=limit)

So I think reindex already supports the limit through its tolerance keyword; it just hasn't been forwarded to the _upsample and nearest methods.
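One detail that may matter for the proposal: in pandas, limit counts how many values to fill, whereas a tolerance in reindex is a maximum distance between the original and the new labels. On a regular hourly grid, limit=1 corresponds to a tolerance of one hour. The following is a minimal standard-library sketch of that tolerance semantics, not the xarray or pandas implementation; `nearest_with_tolerance` is a hypothetical helper introduced only for illustration.

```python
from datetime import datetime, timedelta


def nearest_with_tolerance(src_times, src_values, target_times, tolerance):
    """Nearest-neighbour lookup that leaves gaps beyond `tolerance` unfilled.

    Hypothetical helper for illustration only; not part of xarray or pandas.
    """
    filled = []
    for t in target_times:
        # Index of the source timestamp closest to the target timestamp.
        i = min(range(len(src_times)), key=lambda k: abs(src_times[k] - t))
        # Keep the value only if it lies within the allowed time distance.
        close_enough = abs(src_times[i] - t) <= tolerance
        filled.append(src_values[i] if close_enough else None)
    return filled


src_times = [datetime(2018, 1, 1), datetime(2018, 1, 2)]
src_values = [10, 20]
# Hourly target grid from 2018-01-01 00:00 to 2018-01-02 00:00 (25 points).
hourly = [src_times[0] + timedelta(hours=h) for h in range(25)]

result = nearest_with_tolerance(src_times, src_values, hourly, timedelta(hours=1))
print(result[:3], result[-3:])  # [10, 10, None] [None, 20, 20]
```

With a one-hour tolerance, only the grid points at most one hour away from an original timestamp receive a value, which reproduces the pattern of the pandas limit=1 output above. If the grid were irregular, a count-based limit and a distance-based tolerance would no longer be interchangeable, so the name of the forwarded argument probably deserves some thought.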

Current Output

```python
import xarray as xr

xdf = xr.Dataset.from_dataframe(df)
xdf.resample({'index':'1H'}).nearest()

<xarray.Dataset>
Dimensions:  (index: 25)
Coordinates:
  * index    (index) datetime64[ns] 2018-01-01 ... 2018-01-02
Data variables:
    0        (index) int64 10 10 10 10 10 10 10 10 ... 20 20 20 20 20 20 20 20
```

However, it would be nice if the following worked:

```python
xdf.resample({'index':'1H'}).nearest(limit=1)

<xarray.Dataset>
Dimensions:  (index: 25)
Coordinates:
  * index    (index) datetime64[ns] 2018-01-01 ... 2018-01-02
Data variables:
    0        (index) float64 10.0 10.0 nan nan nan nan ... nan nan nan 20.0 20.0
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2695/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
