
issue_comments: 328314676


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/1279#issuecomment-328314676 https://api.github.com/repos/pydata/xarray/issues/1279 328314676 MDEyOklzc3VlQ29tbWVudDMyODMxNDY3Ng== 4992424 2017-09-10T02:04:33Z 2017-09-10T02:04:33Z NONE

In light of #1489, is there a way to move forward here with rolling on dask-backed data structures?

In soliciting the atmospheric chemistry community for a few illustrative examples for gcpy, it's become apparent that indices computed from re-sampled timeseries would be killer, attention-grabbing functionality. For instance, the EPA air quality standard we use for ozone involves taking hourly data, computing 8-hour rolling means for each day of your dataset, and then picking the maximum of those means for each day ("MDA8 ozone"). Similar metrics exist for other pollutants.

With traditional xarray data-structures, it's trivial to compute this quantity (assuming we have hourly data and using the new resample API from #1272):

```python
import xarray as xr

ds = xr.open_dataset("hourly_ozone_data.nc")
mda8_o3 = (
    ds['O3']
    .rolling(time=8, min_periods=6)  # 8-hour windows, at least 6 valid hours each
    .mean()                          # rolling mean along the 'time' window
    .resample(time='D').max()        # daily maximum of the 8-hour means
)
```

There's one quirk relating to timestamping the rolling data (by default rolling labels each window with its last timestamp, whereas in my application I want to label data with the first one) which makes that chained method a bit impractical, but it only adds like one line of code and it is totally dask-friendly.
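For illustration, here is one possible way to handle the labeling quirk: a minimal sketch on synthetic data, not necessarily the extra line the comment has in mind. It assumes strictly hourly, gap-free timestamps, so that shifting the rolled values back by `window - 1` steps relabels each mean with the first hour of its window. The ramp values and the name `mda8_o3` are made up for the example.

```python
import numpy as np
import xarray as xr

# Synthetic hourly "ozone" for three days: a simple ramp 0, 1, ..., 71
# (illustrative values only, not real measurements).
time = np.arange("2017-07-01T00", "2017-07-04T00", dtype="datetime64[h]")
time = time.astype("datetime64[ns]")
o3 = xr.DataArray(np.arange(72, dtype=float),
                  coords={"time": time}, dims="time", name="O3")

window = 8

# rolling() labels each mean with the LAST timestamp of its window.
rolled = o3.rolling(time=window, min_periods=6).mean()

# Assumption: regular hourly spacing. Shifting the values back by
# (window - 1) steps moves each mean onto the FIRST timestamp of its
# window; the trailing (window - 1) slots become NaN.
rolled_first = rolled.shift(time=-(window - 1))

# Daily maximum of the start-labeled 8-hour means (NaNs are skipped).
mda8_o3 = rolled_first.resample(time="D").max()
```

Shifting the values rather than rewriting the time coordinate keeps every label inside the original time range, at the cost of losing the last `window - 1` hours of the record.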

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  208903781