issue_comments

4 rows where author_association = "MEMBER", issue = 274797981, and user = 1217238, sorted by updated_at descending

id: 406105313
html_url: https://github.com/pydata/xarray/issues/1725#issuecomment-406105313
issue_url: https://api.github.com/repos/pydata/xarray/issues/1725
node_id: MDEyOklzc3VlQ29tbWVudDQwNjEwNTMxMw==
user: shoyer (1217238)
created_at: 2018-07-18T23:28:28Z
updated_at: 2018-07-18T23:28:28Z
author_association: MEMBER

On a somewhat related note, I am now proposing extending xarray's "lazy array" functionality to include limited support for arithmetic, without necessarily using dask: https://github.com/pydata/xarray/issues/2298

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Switch our lazy array classes to use Dask instead? (274797981)

id: 345463291
html_url: https://github.com/pydata/xarray/issues/1725#issuecomment-345463291
issue_url: https://api.github.com/repos/pydata/xarray/issues/1725
node_id: MDEyOklzc3VlQ29tbWVudDM0NTQ2MzI5MQ==
user: shoyer (1217238)
created_at: 2017-11-18T19:00:59Z
updated_at: 2017-11-18T19:00:59Z
author_association: MEMBER

@rabernat actually in #1532 we switched to not displaying a preview of any lazily loaded data on disk -- even if it isn't loaded with Dask. (I was not sure about this change, but I was alone in my reservations.)

I do agree that our lazy arrays serve a useful purpose currently. I would only consider removing them if we can improve Dask so it works just as well for this use case.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Switch our lazy array classes to use Dask instead? (274797981)

id: 345345971
html_url: https://github.com/pydata/xarray/issues/1725#issuecomment-345345971
issue_url: https://api.github.com/repos/pydata/xarray/issues/1725
node_id: MDEyOklzc3VlQ29tbWVudDM0NTM0NTk3MQ==
user: shoyer (1217238)
created_at: 2017-11-17T19:39:35Z
updated_at: 2017-11-17T19:39:35Z
author_association: MEMBER

Yeah, we could solve this by making dask a requirement only if you want to load netCDF files and/or load them lazily.

Potentially, chunks=False in open_dataset could indicate that you're OK with loading everything into memory with NumPy. We would then have to choose whether the default should use Dask or NumPy.
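A quick sketch of what that might look like; chunks=False is the hypothetical flag being floated here, not an established open_dataset option, and data.nc is a placeholder filename:

import xarray as xr

# Hypothetical, per the proposal above: skip dask and read eagerly into NumPy.
ds_numpy = xr.open_dataset("data.nc", chunks=False)

# Existing behavior: a chunks mapping wraps variables in dask arrays.
ds_dask = xr.open_dataset("data.nc", chunks={"time": 100})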

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Switch our lazy array classes to use Dask instead? (274797981)

id: 345300165
html_url: https://github.com/pydata/xarray/issues/1725#issuecomment-345300165
issue_url: https://api.github.com/repos/pydata/xarray/issues/1725
node_id: MDEyOklzc3VlQ29tbWVudDM0NTMwMDE2NQ==
user: shoyer (1217238)
created_at: 2017-11-17T16:55:38Z
updated_at: 2017-11-17T16:55:38Z
author_association: MEMBER

This comment has the full context: https://github.com/pydata/xarray/issues/1372#issuecomment-293748654. To repeat myself:


You might ask why this separate lazy compute machinery exists. The answer is that dask fails to optimize element-wise operations like (scale * array)[subset] -> scale * array[subset], which is a critical optimization for lazy decoding of large datasets.
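For concreteness, a minimal sketch of the rewrite in question using dask.array (sizes made up; this illustrates graph-construction cost, not dask internals):

import dask.array as da
import numpy as np

array = da.from_array(np.arange(1_000_000), chunks=10_000)
scale = 2.0

# As written, dask records a multiply task for every chunk before slicing,
# so the graph grows with the full array even though only one chunk matters.
decoded_then_indexed = (scale * array)[:10]

# The desired rewrite indexes first, keeping the graph to a single chunk.
indexed_then_decoded = scale * array[:10]

assert (decoded_then_indexed.compute() == indexed_then_decoded.compute()).all()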

See https://github.com/dask/dask/issues/746 for discussion and links to PRs about this. jcrist had a solution that worked, but it slowed down every dask array operation by 20%, which wasn't a great trade-off.

I wonder if this is worth revisiting with a simpler, less general optimization pass that doesn't bother with broadcasting. See the subclasses of NDArrayMixin in xarray/conventions.py for examples of the sorts of functionality we need (sketched in code after this list):
  • Casting (e.g., array.astype(bool)).
  • Chained arithmetic with scalars (e.g., 0.5 + 0.5 * array).
  • Custom element-wise operations (e.g., map_blocks(convert_to_datetime64, array, dtype=np.datetime64)).
  • Custom aggregations that drop a dimension (e.g., map_blocks(characters_to_string, array, drop_axis=-1)).
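A minimal runnable sketch of those four operations with dask.array; convert_to_datetime64 and characters_to_string are hypothetical stand-ins for xarray's decoders, not functions from xarray itself:

import dask.array as da
import numpy as np

def convert_to_datetime64(block):
    # Hypothetical decoder: reinterpret integers as seconds since the epoch.
    return block.astype("datetime64[s]")

def characters_to_string(block):
    # Hypothetical decoder: join the trailing axis of single characters.
    return np.apply_along_axis("".join, -1, block)

array = da.from_array(np.arange(12).reshape(3, 4), chunks=(3, 4))

casted = array.astype(bool)        # casting
scaled = 0.5 + 0.5 * array         # chained arithmetic with scalars
dates = da.map_blocks(convert_to_datetime64, array, dtype=np.datetime64)

chars = da.from_array(np.array([list("abcd"), list("wxyz")]), chunks=(2, 4))
strings = da.map_blocks(characters_to_string, chars, drop_axis=1, dtype=str)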

If we could optimize all these operations (and ideally chain them), then we could drop all the lazy loading stuff from xarray in favor of dask, which would be a real win.


The downside of this switch is that lazy loading of data from disk would now require dask, which would be at least slightly annoying to some users. But it's probably worth the tradeoff from a maintainability perspective, and also to fix issues like #1372.

reactions:
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Switch our lazy array classes to use Dask instead? (274797981)

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
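For reference, the filter this page applies can be reproduced against the schema above using Python's sqlite3; the database filename github.db is an assumption:

import sqlite3

conn = sqlite3.connect("github.db")  # assumed path to the Datasette database
rows = conn.execute(
    """
    SELECT id, updated_at, body
    FROM issue_comments
    WHERE author_association = 'MEMBER'
      AND issue = 274797981
      AND "user" = 1217238
    ORDER BY updated_at DESC
    """
).fetchall()
print(len(rows))  # expect 4, per the heading above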