
issue_comments


12 rows where issue = 208903781 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
328731021 https://github.com/pydata/xarray/issues/1279#issuecomment-328731021 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDMyODczMTAyMQ== jhamman 2443309 2017-09-12T04:13:37Z 2017-09-12T04:13:37Z MEMBER

see #1568 for PR that adds this

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Rolling window operation does not work with dask arrays 208903781
328690191 https://github.com/pydata/xarray/issues/1279#issuecomment-328690191 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDMyODY5MDE5MQ== jhamman 2443309 2017-09-11T23:48:58Z 2017-09-12T04:13:15Z MEMBER

@darothen and @shoyer -

Here's a little wrapper function that does the dask and bottleneck piece...

```Python
import numpy as np
import dask.array as da


def dask_rolling_wrapper(moving_func, a, window, min_count=None, axis=-1):
    '''Wrapper to apply bottleneck moving window funcs on dask arrays'''
    # inputs for ghost
    if axis < 0:
        axis = a.ndim + axis
    depth = {d: 0 for d in range(a.ndim)}
    depth[axis] = window - 1
    boundary = {d: np.nan for d in range(a.ndim)}
    # create ghosted arrays
    ag = da.ghost.ghost(a, depth=depth, boundary=boundary)
    # apply rolling func
    out = ag.map_blocks(moving_func, window, min_count=min_count,
                        axis=axis, dtype=a.dtype)
    # trim array
    result = da.ghost.trim_internal(out, depth)
    return result
```

I don't think this would be all that difficult to drop into our current Rolling class.
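The ghost-cell pattern in the wrapper above (pad each chunk with the trailing window - 1 values of its neighbor, apply the moving function per block, then trim) can be sketched in pure Python, with no dask or bottleneck required. Both function names here are illustrative; `move_mean` stands in for a bottleneck moving function:

```python
def move_mean(a, window):
    # Simple 1-D moving mean; positions with fewer than `window`
    # values yield None (mirroring bottleneck's NaN padding).
    out = []
    for i in range(len(a)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(a[i + 1 - window:i + 1]) / window)
    return out

def chunked_move_mean(a, window, chunk_size):
    # Ghost, map, trim: the same three steps as dask_rolling_wrapper.
    depth = window - 1
    result = []
    for start in range(0, len(a), chunk_size):
        # "ghost": extend the block backwards by `depth` values
        ghost_start = max(0, start - depth)
        block = a[ghost_start:start + chunk_size]
        # "map_blocks": apply the moving function to the padded block
        out = move_mean(block, window)
        # "trim": drop the results that belong to the ghost region
        result.extend(out[start - ghost_start:])
    return result
```

Chunked and unchunked results agree, e.g. `chunked_move_mean(list(range(10)), 3, 4) == move_mean(list(range(10)), 3)`.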

328724745 https://github.com/pydata/xarray/issues/1279#issuecomment-328724745 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDMyODcyNDc0NQ== jhamman 2443309 2017-09-12T03:30:20Z 2017-09-12T03:30:20Z MEMBER

@darothen - I'll open a PR in a few minutes. I'll fix the typos.

328724595 https://github.com/pydata/xarray/issues/1279#issuecomment-328724595 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDMyODcyNDU5NQ== darothen 4992424 2017-09-12T03:29:29Z 2017-09-12T03:29:29Z NONE

@shoyer - This output is usually provided as a sequence of daily netCDF files, each on a ~2 degree global grid with 24 timesteps per file (so shape 24 x 96 x 144). For convenience, I usually concatenate these files into yearly datasets, so they'll have a shape (8736 x 96 x 144). I haven't played too much with how to chunk the data, but it's not uncommon for me to load 20-50 of these files simultaneously (each holding a year's worth of data) and treat each year as an "ensemble member" dimension, so my data has shape (50 x 8736 x 96 x 144). Yes, keeping everything in dask array land is preferable, I suppose.

@jhamman - Wow, that worked pretty much perfectly! There are a handful of typos (you switch from "a" to "x" halfway through), and there's a lot of room for optimization by chunk size. But it just works, which is absolutely ridiculous. I just pushed a ~200 GB dataset through it on my cluster with ~50 cores and it screamed through the calculation.

Is there any way this could be pushed before 0.10.0? It's a killer enhancement.

328315251 https://github.com/pydata/xarray/issues/1279#issuecomment-328315251 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDMyODMxNTI1MQ== shoyer 1217238 2017-09-10T02:24:22Z 2017-09-10T02:24:22Z MEMBER

@darothen Can you give an example of typical shape and chunks for your data when you load it with dask?

My sense is that we would do better to keep everything in the form of (dask) arrays, rather than converting into dataframes. For the highest performance, I would make a dask array routine that combines ghosting, map_blocks and bottleneck's rolling window functions. Then it should be straightforward to drop it into rolling in place of the existing bottleneck routine.

328314676 https://github.com/pydata/xarray/issues/1279#issuecomment-328314676 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDMyODMxNDY3Ng== darothen 4992424 2017-09-10T02:04:33Z 2017-09-10T02:04:33Z NONE

In light of #1489 is there a way to move forward here with rolling on dask-backed data structures?

In soliciting the atmospheric chemistry community for a few illustrative examples for gcpy, it's become apparent that indices computed from re-sampled timeseries would be killer, attention-grabbing functionality. For instance, the EPA air quality standard we use for ozone involves taking hourly data, computing 8-hour rolling means for each day of your dataset, and then picking the maximum of those means for each day ("MDA8 ozone"). Similar metrics exist for other pollutants.

With traditional xarray data-structures, it's trivial to compute this quantity (assuming we have hourly data and using the new resample API from #1272):

```python
import xarray as xr

ds = xr.open_dataset("hourly_ozone_data.nc")
mda8_o3 = (
    ds['O3']
    .rolling(time=8, min_periods=6)
    .mean('time')
    .resample(time='D').max()
)
```

There's one quirk relating to timestamping the rolling data (by default rolling labels each window with its last timestamp, whereas in my application I want to label data with the first one) which makes that chained method a bit impractical, but it only adds about one line of code and it is totally dask-friendly.

302137119 https://github.com/pydata/xarray/issues/1279#issuecomment-302137119 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDMwMjEzNzExOQ== shoyer 1217238 2017-05-17T15:59:58Z 2017-05-17T15:59:58Z MEMBER

@darothen we would need to add xarray -> dask dataframe conversion functions, which don't currently exist. Otherwise I think we would still need to rewrite this (but of course the dataframe implementation could be a useful reference point).

301489242 https://github.com/pydata/xarray/issues/1279#issuecomment-301489242 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDMwMTQ4OTI0Mg== darothen 4992424 2017-05-15T14:18:55Z 2017-05-15T14:18:55Z NONE

Dask dataframes have recently been updated so that rolling operations work (dask/dask#2198). Does this open a pathway to enable rolling on dask arrays within xarray?

284133376 https://github.com/pydata/xarray/issues/1279#issuecomment-284133376 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDI4NDEzMzM3Ng== shoyer 1217238 2017-03-04T07:06:25Z 2017-03-04T07:06:25Z MEMBER

An idea...since we only have 1-D rolling methods in xarray, couldn't we just use map_blocks with numpy/bottleneck functions when the rolling dimension is completely contained in a dask chunk?

Yes, that would work for such cases.
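The single-chunk shortcut suggested here can be mimicked in pure Python: when the rolling dimension lies entirely inside each block, every block is an independent series and a plain map over blocks is already exact, with no ghost cells needed. Names here are illustrative, with `move_mean` standing in for a bottleneck function:

```python
def move_mean(a, window):
    # 1-D moving mean; None marks positions with an incomplete window
    return [sum(a[i + 1 - window:i + 1]) / window if i + 1 >= window else None
            for i in range(len(a))]

def rolling_over_blocks(blocks, window):
    # Each block fully contains the rolling dimension (e.g. one time
    # series per grid point), so mapping over blocks gives the same
    # answer as rolling over the whole array -- the map_blocks case.
    return [move_mean(b, window) for b in blocks]
```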

284132513 https://github.com/pydata/xarray/issues/1279#issuecomment-284132513 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDI4NDEzMjUxMw== jhamman 2443309 2017-03-04T06:45:11Z 2017-03-04T06:45:11Z MEMBER

An idea...since we only have 1-D rolling methods in xarray, couldn't we just use map_blocks with numpy/bottleneck functions when the rolling dimension is completely contained in a dask chunk?

281185199 https://github.com/pydata/xarray/issues/1279#issuecomment-281185199 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDI4MTE4NTE5OQ== shoyer 1217238 2017-02-20T21:28:37Z 2017-02-20T21:28:37Z MEMBER

Note that I was able to apply the rolling window by converting my variable to a pandas series with to_series(). I then could use pandas' own rolling window methods. I guess that when converting to a pandas series the dask array is read into memory?

Yes, this is correct -- we automatically compute dask arrays when converting to pandas, because pandas does not have any notion of lazy arrays.

Note that we currently have two versions of rolling window operations:

  1. Implemented with bottleneck. These are fast, but only work in memory. Something like ghost cells would be necessary to extend them to dask.
  2. Implemented with a nested loop written in Python. These are much slower, both because of the algorithm (O(dim_size * window_size) time instead of O(dim_size)) and because the inner loop runs in Python instead of C, but there's no fundamental reason why they shouldn't work for dask arrays basically as-is.
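The gap between the two implementations can be illustrated with a pure-Python rolling sum (function names are illustrative):

```python
def rolling_sum_naive(a, window):
    # O(dim_size * window_size): recompute every window from scratch,
    # like the nested-loop fallback described above
    return [sum(a[i + 1 - window:i + 1]) if i + 1 >= window else None
            for i in range(len(a))]

def rolling_sum_linear(a, window):
    # O(dim_size): keep a running total, adding the entering value and
    # subtracting the leaving one -- the kind of single-pass update
    # that makes the C-backed bottleneck routines fast
    out, total = [], 0
    for i, v in enumerate(a):
        total += v
        if i >= window:
            total -= a[i - window]
        out.append(total if i + 1 >= window else None)
    return out
```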
281101281 https://github.com/pydata/xarray/issues/1279#issuecomment-281101281 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDI4MTEwMTI4MQ== rabernat 1197350 2017-02-20T15:01:44Z 2017-02-20T15:01:44Z MEMBER

It seems like the most efficient way to handle this would be to use ghost cells equal to the window length.



CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 483.589ms · About: xarray-datasette