issues: 484098286

id: 484098286
node_id: MDU6SXNzdWU0ODQwOTgyODY=
number: 3242
title: An `asfreq` method without `resample`, and clarify or improve resample().asfreq() behavior for down-sampling
user: 463809
state: open
locked: 0
comments: 2
created_at: 2019-08-22T16:33:32Z
updated_at: 2022-04-18T16:01:07Z
author_association: CONTRIBUTOR
assignee, milestone, closed_at, active_lock_reason, draft, pull_request, performed_via_github_app, state_reason: (empty)

MCVE Code Sample

```python
>>> import numpy as np
>>> import xarray as xr
>>> import pandas as pd
>>> data = np.random.random(300)
>>> # Make a time grid that doesn't start exactly on the hour.
>>> time = pd.date_range('2019-01-01', periods=300, freq='T') + pd.Timedelta('3T')
>>> time
DatetimeIndex(['2019-01-01 00:03:00', '2019-01-01 00:04:00',
               '2019-01-01 00:05:00', '2019-01-01 00:06:00',
               '2019-01-01 00:07:00', '2019-01-01 00:08:00',
               '2019-01-01 00:09:00', '2019-01-01 00:10:00',
               '2019-01-01 00:11:00', '2019-01-01 00:12:00',
               ...
               '2019-01-01 04:53:00', '2019-01-01 04:54:00',
               '2019-01-01 04:55:00', '2019-01-01 04:56:00',
               '2019-01-01 04:57:00', '2019-01-01 04:58:00',
               '2019-01-01 04:59:00', '2019-01-01 05:00:00',
               '2019-01-01 05:01:00', '2019-01-01 05:02:00'],
              dtype='datetime64[ns]', length=300, freq='T')
>>> da = xr.DataArray(data, dims=['time'], coords={'time': time})
>>> resampled = da.resample(time='H').asfreq()
>>> resampled
<xarray.DataArray (time: 6)>
array([0.478601, 0.488425, 0.496322, 0.479256, 0.523395, 0.201718])
Coordinates:
  * time     (time) datetime64[ns] 2019-01-01 ... 2019-01-01T05:00:00
>>> # The value is actually the mean over the time window, e.g. the third value is:
>>> da.loc['2019-01-01T02:00:00':'2019-01-01T02:59:00'].mean()
<xarray.DataArray ()>
array(0.496322)
```
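The hourly-mean interpretation can be checked with pandas alone. The sketch below is not part of the original report: it assumes the same 300-minute grid starting at 00:03, uses a seeded generator purely for repeatability, and uses the `min`/`h` frequency aliases because newer pandas releases deprecate `T`/`H`. It shows that an hourly mean yields six bins labelled on the hour, the same shape as the `resample(time='H').asfreq()` result reported in the sample.

```python
import numpy as np
import pandas as pd

# Same grid as in the issue: 300 one-minute steps starting at 00:03.
time = pd.date_range("2019-01-01 00:03", periods=300, freq="min")
rng = np.random.default_rng(0)  # seed chosen only for repeatability
series = pd.Series(rng.random(300), index=time)

# Down-sample by averaging all samples that fall in each clock hour.
hourly_mean = series.resample("h").mean()

# Bins are labelled on the hour (00:00 ... 05:00), not at 00:03.
print(hourly_mean.index[0], hourly_mean.index[-1], len(hourly_mean))

# The third bin equals the mean of the samples from 02:00 through 02:59.
window_mean = series.loc["2019-01-01 02:00":"2019-01-01 02:59"].mean()
print(np.isclose(hourly_mean.iloc[2], window_mean))
```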

Expected Output

The docs say: "Return values of original object at the new up-sampling frequency; essentially a re-index with new times set to NaN."

I suppose the docs are not technically wrong: on careful reading, they do not define any behavior for down-sampling. But it is easy to (1) assume the same behavior (reindexing) for down-sampling as for up-sampling and/or (2) expect behavior similar to `df.asfreq()` in pandas.

Problem Description

I would argue for an `asfreq` method without resampling that matches the pandas behavior, which, AFAIK, is to reindex at the specified interval, starting at the first timestamp.

```
>>> df = pd.DataFrame(da, index=time)
>>> df.asfreq('H')
                            0
2019-01-01 00:03:00  0.065304
2019-01-01 01:03:00  0.325814
2019-01-01 02:03:00  0.841201
2019-01-01 03:03:00  0.610266
2019-01-01 04:03:00  0.613906
```

This can currently be achieved easily with `reindex`, so it's not a blocker.

```
>>> da.reindex(time=pd.date_range(da.time[0].values, da.time[-1].values, freq='H'))
<xarray.DataArray (time: 5)>
array([0.065304, 0.325814, 0.841201, 0.610266, 0.613906])
Coordinates:
  * time     (time) datetime64[ns] 2019-01-01T00:03:00 ... 2019-01-01T04:03:00
```
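The reindex workaround could be wrapped in a small function. The sketch below is only an illustration: `asfreq_by_reindex` is a hypothetical name, not an xarray API, and it assumes the coordinate along `dim` is a sorted datetime index. It uses the `min`/`h` aliases, since newer pandas releases deprecate `T`/`H`.

```python
import numpy as np
import pandas as pd
import xarray as xr


def asfreq_by_reindex(da, freq, dim="time"):
    """Hypothetical helper: pandas-style asfreq built on DataArray.reindex."""
    start = da[dim].values[0]
    end = da[dim].values[-1]
    # The new index starts at the first timestamp and steps by `freq`.
    new_index = pd.date_range(start, end, freq=freq)
    return da.reindex({dim: new_index})


# 300 one-minute samples starting at 00:03; the values are just 0..299.
time = pd.date_range("2019-01-01 00:03", periods=300, freq="min")
da = xr.DataArray(np.arange(300.0), dims=["time"], coords={"time": time})

hourly = asfreq_by_reindex(da, "h")
print(hourly.values)              # every 60th sample of the original data
print(hourly["time"].values[0])   # still the first original timestamp
```

Because the new index is a subset of the original timestamps here, no values are averaged and no NaNs are introduced.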

The reason I argue for `asfreq` functionality outside of resampling is that `asfreq(freq)` in pandas is purely a reindex, whereas e.g. `resample(freq).first()` gives you a different time index.
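The difference between the two time indexes can be seen in a short pandas-only sketch. It is an illustration rather than code from the report: the values are simply 0..299 so each result can be traced back to a sample position, and the `min`/`h` aliases stand in for the `T`/`H` aliases used elsewhere in this issue.

```python
import numpy as np
import pandas as pd

# 300 one-minute samples starting at 00:03; value i sits at 00:03 + i minutes.
time = pd.date_range("2019-01-01 00:03", periods=300, freq="min")
series = pd.Series(np.arange(300.0), index=time)

as_freq = series.asfreq("h")          # pure reindex from the first timestamp
first = series.resample("h").first()  # first sample of each clock-hour bin

print(as_freq.index[:2])   # labels follow the data: 00:03, 01:03
print(first.index[:2])     # labels follow the clock: 00:00, 01:00
print(len(as_freq), len(first))
```

The `asfreq` result keeps the 00:03 offset and has five entries, while `resample("h").first()` relabels to whole hours and has six, because the data spans six clock hours.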

Output of xr.show_versions()

I'm still on Python 2.7, and `xr.show_versions()` actually throws an exception there because some HDF5 library doesn't have a magic property. I don't think this detail is relevant here, though.

```
>>> xr.__version__
u'0.11.3'
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3242/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue
