issue_comments

3 rows where author_association = "NONE" and issue = 208903781 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
328724595 https://github.com/pydata/xarray/issues/1279#issuecomment-328724595 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDMyODcyNDU5NQ== darothen 4992424 2017-09-12T03:29:29Z 2017-09-12T03:29:29Z NONE

@shoyer - This output is usually provided as a sequence of daily netCDF files, each on a ~2 degree global grid with 24 timesteps per file (so shape 24 x 96 x 144). For convenience, I usually concatenate these files into yearly datasets, so they'll have a shape (8736 x 96 x 144). I haven't played too much with how to chunk the data, but it's not uncommon for me to load 20-50 of these files simultaneously (each holding a year's worth of data) and treat each year as an "ensemble member" dimension, so my data has shape (50 x 8736 x 96 x 144). Yes, keeping everything in dask array land is preferable, I suppose.
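For readers who want to picture the layout described above, here is a minimal sketch, assuming zero-filled lazy dask arrays stand in for the yearly netCDF files. The dimension names (`member`, `time`, `lat`, `lon`), the helper `lazy_year`, and the 28-day chunk size along `time` are illustrative assumptions, not choices from the original comment.

```python
import dask.array as da
import xarray as xr

# Shapes quoted in the comment: 50 years x 8736 hourly steps x 96 x 144 grid.
N_MEMBERS, N_TIME, N_LAT, N_LON = 50, 8736, 96, 144


def lazy_year():
    """Return one lazy, dask-backed placeholder for a yearly dataset (hypothetical helper)."""
    data = da.zeros(
        (N_TIME, N_LAT, N_LON),
        chunks=(24 * 28, N_LAT, N_LON),  # assumed chunking: 28 days per chunk
    )
    return xr.DataArray(data, dims=("time", "lat", "lon"), name="O3")


# Concatenating along a new dimension treats each year as one ensemble member.
# Nothing is computed here; the result is only a task graph.
ensemble = xr.concat([lazy_year() for _ in range(N_MEMBERS)], dim="member")

print(ensemble.dims, ensemble.shape)
print(f"{ensemble.nbytes / 1e9:.1f} GB if fully loaded")
```

Because the array stays lazy, the full ensemble can be described without ever fitting in memory; the chunk size is the main knob to experiment with.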

@jhamman - Wow, that worked pretty much perfectly! There's a handful of typos (you switch from "a" to "x" halfway through), and there's a lot of room for optimization by chunksize. But it just works, which is absolutely ridiculous. I just pushed a ~200 GB dataset on my cluster with ~50 cores and it screamed through the calculation.

Is there any way this could be pushed before 0.10.0? It's a killer enhancement.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Rolling window operation does not work with dask arrays 208903781
328314676 https://github.com/pydata/xarray/issues/1279#issuecomment-328314676 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDMyODMxNDY3Ng== darothen 4992424 2017-09-10T02:04:33Z 2017-09-10T02:04:33Z NONE

In light of #1489 is there a way to move forward here with rolling on dask-backed data structures?

In soliciting the atmospheric chemistry community for a few illustrative examples for gcpy, it's become apparent that indices computed from re-sampled timeseries would be killer, attention-grabbing functionality. For instance, the EPA air quality standard we use for ozone involves taking hourly data, computing 8-hour rolling means for each day of your dataset, and then picking the maximum of those means for each day ("MDA8 ozone"). Similar metrics exist for other pollutants.

With traditional xarray data-structures, it's trivial to compute this quantity (assuming we have hourly data and using the new resample API from #1272):

    import xarray as xr

    ds = xr.open_dataset("hourly_ozone_data.nc")
    mda8_o3 = (
        ds['O3']
        .rolling(time=8, min_periods=6)
        .mean('time')
        .resample(time='D').max()
    )

There's one quirk relating to how the rolling data are timestamped (by default, rolling labels each value with the last timestamp in its window, whereas in my application I want to label data with the first one), which makes that chained method a bit impractical, but fixing it only adds about one line of code and it is totally dask-friendly.
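The timestamp quirk mentioned above can be sketched as follows. This is one possible way to relabel the rolling means, not necessarily the line the author had in mind: it shifts the values back by `window - 1` steps so each mean sits at the first hour of its window. The function name `mda8` and the synthetic ramp data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr


def mda8(o3, window=8, min_periods=6):
    """Daily maximum of rolling means, labelled by the first hour of each window."""
    rolled = o3.rolling(time=window, min_periods=min_periods).mean()
    # rolling() places each mean at the LAST hour of its window; shifting the
    # values back by window - 1 steps places them at the FIRST hour instead.
    relabelled = rolled.shift(time=-(window - 1))
    return relabelled.resample(time="D").max()


# Three days of synthetic hourly values (a simple ramp) standing in for ozone.
time = pd.date_range("2017-07-01", periods=72, freq=pd.Timedelta(hours=1))
o3 = xr.DataArray(
    np.arange(72, dtype=float), coords={"time": time}, dims="time", name="O3"
)

daily = mda8(o3)
print(daily.values)
```

One consequence of shifting rather than rewriting the time coordinate is that the partial windows at the very start of the record fall off the front of the array, and the last `window - 1` hours have no value; whether that is acceptable depends on the application.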

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Rolling window operation does not work with dask arrays 208903781
301489242 https://github.com/pydata/xarray/issues/1279#issuecomment-301489242 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDMwMTQ4OTI0Mg== darothen 4992424 2017-05-15T14:18:55Z 2017-05-15T14:18:55Z NONE

Dask dataframes have recently been updated so that rolling operations work (dask/dask#2198). Does this open a pathway to enable rolling on dask arrays within xarray?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Rolling window operation does not work with dask arrays 208903781

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 12.898ms · About: xarray-datasette