issue_comments: 274375810


html_url: https://github.com/pydata/xarray/issues/1224#issuecomment-274375810
issue_url: https://api.github.com/repos/pydata/xarray/issues/1224
id: 274375810
node_id: MDEyOklzc3VlQ29tbWVudDI3NDM3NTgxMA==
user: 1217238
created_at: 2017-01-23T01:09:49Z
updated_at: 2017-01-23T01:09:49Z
author_association: MEMBER

Interesting -- thanks for sharing! I am interested in performance improvements, but also a little reluctant to add specialized optimizations directly into xarray.

You write that this is equivalent to `sum(a * w for a, w in zip(arrays, weights))`. How does this compare to stacking and doing the sum in xarray, e.g., `(arrays * weights).sum('stacked')`, where `arrays` and `weights` are now DataArray objects with a `'stacked'` dimension? Or maybe `arrays.dot(weights)`?
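For concreteness, here is a minimal sketch of the three variants side by side, using made-up example data (the shapes, values, and the `'stacked'` dimension name are assumptions for illustration); all three should agree numerically:

```python
import numpy as np
import xarray as xr

# Hypothetical data: three 2x2 arrays stacked along a 'stacked' dimension,
# with one scalar weight per array.
arrays = xr.DataArray(
    np.arange(12, dtype=float).reshape(3, 2, 2),
    dims=("stacked", "x", "y"),
)
weights = xr.DataArray([0.2, 0.3, 0.5], dims="stacked")

# 1. Loop-style weighted sum, as in the snippet under discussion.
loop_result = sum(a * w for a, w in zip(arrays, weights))

# 2. Broadcast multiply, then reduce over the shared dimension.
stacked_result = (arrays * weights).sum("stacked")

# 3. Dot product over the shared dimension.
dot_result = arrays.dot(weights)

assert np.allclose(loop_result, stacked_result)
assert np.allclose(stacked_result, dot_result)
```

Variants 2 and 3 express the reduction as a single array operation, which gives dask the whole computation graph at once instead of one accumulation step per array.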

Using vectorized operations feels a bit more idiomatic (though also maybe more verbose). It may also be more performant. Note that the builtin `sum` is not optimized well by dask, because it's basically equivalent to a loop:

```python
def sum(xs):
    result = 0
    for x in xs:
        result += x
    return result
```

In contrast, `dask.array.sum` builds up a tree, so it can do the sum in parallel.
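The tree idea can be sketched in plain Python (this is an illustrative toy, not dask's actual implementation): instead of accumulating left to right in O(n) sequential steps, combine elements pairwise so the reduction has O(log n) depth and the partial sums at each level could run in parallel.

```python
def tree_sum(xs):
    """Pairwise (tree-shaped) reduction: each pass halves the number of
    partial sums, so independent pairs can be combined in parallel."""
    xs = list(xs)
    if not xs:
        return 0
    while len(xs) > 1:
        paired = [xs[i] + xs[i + 1] for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:          # odd element carries over to the next level
            paired.append(xs[-1])
        xs = paired
    return xs[0]

print(tree_sum(range(10)))  # 45, same as the builtin sum
```

The builtin-`sum` loop forces each `result += x` to wait on the previous one; the tree form exposes the independence that dask exploits across chunks.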

There has also been discussion in https://github.com/pydata/xarray/issues/422 about adding a dedicated method for weighted means.
