
issue_comments: 345300165


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/1725#issuecomment-345300165 https://api.github.com/repos/pydata/xarray/issues/1725 345300165 MDEyOklzc3VlQ29tbWVudDM0NTMwMDE2NQ== 1217238 2017-11-17T16:55:38Z 2017-11-17T16:55:38Z MEMBER

This comment has the full context: https://github.com/pydata/xarray/issues/1372#issuecomment-293748654. To repeat myself:


You might ask why this separate lazy compute machinery exists. The answer is that dask fails to optimize element-wise operations like (scale * array)[subset] -> scale * array[subset], which is a critical optimization for lazy decoding of large datasets.
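To make the rewrite above concrete, here is a minimal pure-Python sketch of why pushing the index inside the multiplication matters. `CountingStore`, `LazyScaled`, and `eager_scaled` are hypothetical names invented for this illustration; they are not xarray or dask APIs, and the list-backed store only stands in for an on-disk array.

```python
class CountingStore:
    """Hypothetical stand-in for an on-disk array; counts elements read."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def __getitem__(self, key):
        out = self._values[key]
        # A slice reads several elements, an integer index reads one.
        self.reads += len(out) if isinstance(key, slice) else 1
        return out


class LazyScaled:
    """Defers `scale * array` until indexing, so the subset is taken first."""

    def __init__(self, store, scale):
        self.store = store
        self.scale = scale

    def __getitem__(self, key):
        selected = self.store[key]  # array[subset]
        if isinstance(key, slice):
            return [self.scale * v for v in selected]  # scale * array[subset]
        return self.scale * selected


def eager_scaled(store, scale):
    # (scale * array): reads and scales every element before any subsetting.
    return [scale * v for v in store[:]]


store_a = CountingStore(range(1000))
store_b = CountingStore(range(1000))
subset = slice(10, 15)

eager_result = eager_scaled(store_a, 0.5)[subset]  # (scale * array)[subset]
lazy_result = LazyScaled(store_b, 0.5)[subset]     # scale * array[subset]

print(eager_result == lazy_result)   # same values either way
print(store_a.reads, store_b.reads)  # elements read by each strategy
```

Both strategies return the same values, but the eager one touches the whole store while the lazy one reads only the requested slice.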

See https://github.com/dask/dask/issues/746 for discussion and links to PRs about this. jcrist had a solution that worked, but it slowed down every dask array operation by 20%, which wasn't a great win.

I wonder if this is worth revisiting with a simpler, less general optimization pass that doesn't bother with broadcasting. See the subclasses of NDArrayMixin in xarray/conventions.py for examples of the sorts of functionality we need:

- Casting (e.g., array.astype(bool)).
- Chained arithmetic with scalars (e.g., 0.5 + 0.5 * array).
- Custom element-wise operations (e.g., map_blocks(convert_to_datetime64, array, dtype=np.datetime64))
- Custom aggregations that drop a dimension (e.g., map_blocks(characters_to_string, array, drop_axis=-1))
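As a rough illustration of the first three kinds of operation, the sketch below records element-wise functions and applies them only to the elements that indexing selects. `LazyElementwise` is a hypothetical class written for this example; it is not the NDArrayMixin machinery in xarray and not a dask optimization pass. It ignores broadcasting and does not cover aggregations that drop a dimension.

```python
class LazyElementwise:
    """Hypothetical lazy wrapper: chains element-wise functions and applies
    them only to the elements selected by indexing."""

    def __init__(self, source, funcs=()):
        self.source = source
        self.funcs = tuple(funcs)

    def map(self, func):
        # Custom element-wise operation: append it to the pending chain.
        return LazyElementwise(self.source, self.funcs + (func,))

    def astype(self, typ):
        # Casting is just another element-wise function.
        return self.map(typ)

    def __mul__(self, scalar):
        return self.map(lambda v: v * scalar)

    __rmul__ = __mul__

    def __add__(self, scalar):
        return self.map(lambda v: v + scalar)

    __radd__ = __add__

    def _apply(self, value):
        for func in self.funcs:
            value = func(value)
        return value

    def __getitem__(self, key):
        # Subset first, then run the recorded chain on what was selected.
        selected = self.source[key]
        if isinstance(key, slice):
            return [self._apply(v) for v in selected]
        return self._apply(selected)


raw = list(range(10))

decoded = 0.5 + 0.5 * LazyElementwise(raw)  # chained scalar arithmetic
flags = LazyElementwise(raw).astype(bool)   # casting

print(decoded[2:5])
print(flags[0:3])
```

Nothing is computed when `decoded` or `flags` is built; the chain runs only inside `__getitem__`, which is the behaviour an optimization pass would need to reproduce for these cases.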

If we could optimize all these operations (and ideally chain them), then we could drop all the lazy loading stuff from xarray in favor of dask, which would be a real win.


The downside of this switch is that lazy loading of data from disk would now require dask, which would be at least slightly annoying to some users. But it's probably worth the tradeoff from a maintainability perspective, and also to fix issues like #1372.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  274797981