issue_comments

4 rows where issue = 208903781 and user = 1217238 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
328315251 https://github.com/pydata/xarray/issues/1279#issuecomment-328315251 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDMyODMxNTI1MQ== shoyer 1217238 2017-09-10T02:24:22Z 2017-09-10T02:24:22Z MEMBER

@darothen Can you give an example of typical shape and chunks for your data when you load it with dask?

My sense is that we would do better to keep everything in the form of (dask) arrays, rather than converting into dataframes. For the highest performance, I would write a dask array routine that combines ghosting, map_blocks and bottleneck's rolling window functions. It should then be straightforward to plug that routine into rolling in place of the existing bottleneck routine.
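To make the ghosting idea concrete, here is a minimal pure-Python sketch. It does not use dask or bottleneck: plain lists stand in for chunks, and the helper names `rolling_mean` and `rolling_mean_chunked` are hypothetical. Each chunk borrows the last `window - 1` values that precede it (the ghost cells), runs the rolling function, and then drops the outputs that belong to the borrowed values. In dask itself, overlap helpers such as `map_overlap` play this role.

```python
def rolling_mean(values, window):
    """Trailing rolling mean; positions without a full window are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i - window + 1 : i + 1]) / window)
    return out


def rolling_mean_chunked(chunks, window):
    """Apply rolling_mean chunk by chunk, using ghost cells from the left."""
    result = []
    ghost = []  # up to (window - 1) values that precede the current chunk
    keep = window - 1
    for chunk in chunks:
        padded = ghost + list(chunk)
        rolled = rolling_mean(padded, window)
        # Drop the outputs that correspond to the ghost cells themselves.
        result.extend(rolled[len(ghost):])
        # Carry the trailing values over as ghost cells for the next chunk.
        ghost = padded[-keep:] if keep > 0 else []
    return result


data = [float(v) for v in range(10)]
chunks = [data[:4], data[4:7], data[7:]]  # illustrative chunking only
assert rolling_mean_chunked(chunks, 3) == rolling_mean(data, 3)
```

A trailing window only needs ghost cells on the left; a centered window would need them on both sides of each chunk.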

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Rolling window operation does not work with dask arrays 208903781
302137119 https://github.com/pydata/xarray/issues/1279#issuecomment-302137119 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDMwMjEzNzExOQ== shoyer 1217238 2017-05-17T15:59:58Z 2017-05-17T15:59:58Z MEMBER

@darothen we would need to add xarray -> dask dataframe conversion functions, which don't currently exist. Otherwise I think we would still need to rewrite this (but of course the dataframe implementation could be a useful reference point).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Rolling window operation does not work with dask arrays 208903781
284133376 https://github.com/pydata/xarray/issues/1279#issuecomment-284133376 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDI4NDEzMzM3Ng== shoyer 1217238 2017-03-04T07:06:25Z 2017-03-04T07:06:25Z MEMBER

> An idea... since we only have 1-D rolling methods in xarray, couldn't we just use map_blocks with numpy/bottleneck functions when the rolling dimension is completely contained in a dask chunk?

Yes, that would work for such cases.
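Whether the rolling dimension is "completely contained in a dask chunk" can be read off the array's `chunks` attribute, which dask exposes as a tuple with one tuple of block sizes per axis. The sketch below works on such a tuple directly, so it runs without dask; the helper name `rolling_axis_in_one_chunk` and the example chunk sizes are hypothetical.

```python
def rolling_axis_in_one_chunk(chunks, axis):
    """Return True if the given axis is not split across several blocks.

    `chunks` follows the layout of a dask array's `.chunks` attribute:
    one tuple of block sizes per axis.
    """
    return len(chunks[axis]) == 1


# Illustrative layout: axis 0 (say, time) is a single block of 365 steps,
# while axes 1 and 2 are each split into two blocks.
example_chunks = ((365,), (90, 90), (180, 180))

assert rolling_axis_in_one_chunk(example_chunks, axis=0)
assert not rolling_axis_in_one_chunk(example_chunks, axis=1)
```

When the check passes, every block sees the full rolling axis, so a per-block function needs no data from neighbouring blocks.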

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Rolling window operation does not work with dask arrays 208903781
281185199 https://github.com/pydata/xarray/issues/1279#issuecomment-281185199 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDI4MTE4NTE5OQ== shoyer 1217238 2017-02-20T21:28:37Z 2017-02-20T21:28:37Z MEMBER

> Note that I was able to apply the rolling window by converting my variable to a pandas series with to_series(). I then could use pandas' own rolling window methods. I guess that when converting to a pandas series the dask array is read into memory?

Yes, this is correct -- we automatically compute dask arrays when converting to pandas, because pandas does not have any notion of lazy arrays.

Note that we currently have two versions of rolling window operations:

  1. Implemented with bottleneck. These are fast, but only work in memory. Something like ghost cells would be necessary to extend them to dask.
  2. Implemented with a nested loop written in Python. These are much slower, both because of the algorithm (time O(dim_size * window_size) instead of time O(dim_size)) and implementation of the inner loop in Python instead of C, but there's no fundamental reason why they shouldn't be able to work for dask arrays basically as is.
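The complexity difference between the two approaches can be illustrated with a rolling sum. This is only a sketch in pure Python, not the xarray or bottleneck code: the function names are hypothetical, and bottleneck's real routines are written in C. The first version redoes the whole window at every position, which costs O(dim_size * window_size); the second updates a running total, which costs O(dim_size).

```python
def rolling_sum_naive(values, window):
    """Nested loop: re-sum the full window at every position."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough values for a full window yet
            continue
        total = 0
        for j in range(i - window + 1, i + 1):
            total += values[j]
        out.append(total)
    return out


def rolling_sum_running(values, window):
    """Single pass: add the incoming value, subtract the outgoing one."""
    out = []
    total = 0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            total -= values[i - window]
        out.append(total if i + 1 >= window else None)
    return out


values = [3, 1, 4, 1, 5, 9, 2, 6]
assert rolling_sum_naive(values, 3) == rolling_sum_running(values, 3)
assert rolling_sum_running(values, 3) == [None, None, 8, 6, 10, 15, 16, 17]
```

The running-total version carries state from one position to the next, which is why a chunked implementation needs something like ghost cells at block boundaries, whereas the nested loop only ever reads the values inside each window.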
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Rolling window operation does not work with dask arrays 208903781

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
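
As a rough illustration of how the filter behind this page maps onto the schema, the following sketch uses Python's built-in sqlite3 module with an in-memory database. The table is reduced to the columns the query touches (an assumption made for brevity), the four comment rows are copied from the page above, and the fifth row is a made-up non-matching record added only to show the filter at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of the schema: only the columns used by the query.
conn.execute(
    """
    CREATE TABLE [issue_comments] (
       [id] INTEGER PRIMARY KEY,
       [user] INTEGER,
       [updated_at] TEXT,
       [issue] INTEGER
    )
    """
)
conn.execute("CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue])")
conn.execute("CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user])")

rows = [
    (281185199, 1217238, "2017-02-20T21:28:37Z", 208903781),
    (284133376, 1217238, "2017-03-04T07:06:25Z", 208903781),
    (302137119, 1217238, "2017-05-17T15:59:58Z", 208903781),
    (328315251, 1217238, "2017-09-10T02:24:22Z", 208903781),
    (1, 42, "2017-01-01T00:00:00Z", 208903781),  # hypothetical other user
]
conn.executemany("INSERT INTO issue_comments VALUES (?, ?, ?, ?)", rows)

# Same filter and ordering as described in the page summary.
matching_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM issue_comments "
        "WHERE issue = ? AND user = ? ORDER BY updated_at DESC",
        (208903781, 1217238),
    )
]
assert matching_ids == [328315251, 302137119, 284133376, 281185199]
```

Because the timestamps are stored as ISO 8601 text, ordering them as strings gives chronological order.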
Powered by Datasette · Queries took 166.176ms · About: xarray-datasette