html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1224#issuecomment-519833424,https://api.github.com/repos/pydata/xarray/issues/1224,519833424,MDEyOklzc3VlQ29tbWVudDUxOTgzMzQyNA==,6213168,2019-08-09T08:36:09Z,2019-08-09T08:36:09Z,MEMBER,Retiring this as it is way too specialized for the main xarray library.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202423683
https://github.com/pydata/xarray/issues/1224#issuecomment-388545372,https://api.github.com/repos/pydata/xarray/issues/1224,388545372,MDEyOklzc3VlQ29tbWVudDM4ODU0NTM3Mg==,6213168,2018-05-12T10:22:02Z,2018-05-12T10:22:02Z,MEMBER,"Both. One of the biggest problems is that the data of my interest is a mix of
- 1D arrays with dims=(scenario, ) and shape=(500000, ) (stressed financial instruments under a Monte Carlo stress set)
- 0D arrays with dims=() (financial instruments that are impervious to the Monte Carlo stresses and never change values)
So before you do concat(), you need to call broadcast(), which effectively means that doing the sums on your bunch of very fast 0D instruments suddenly requires repeating them on 500k points.
Even keeping the two lots separate (which is what fastwsum does), it performed considerably slower.
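To illustrate the cost, here is a minimal numpy sketch; the sizes and the names `mc_instrument` and `flat_instrument` are made up for illustration, not taken from the actual portfolio:

```python
import numpy as np

# Hypothetical instruments (names and sizes are illustrative only)
n_scenarios = 500_000
mc_instrument = np.random.rand(n_scenarios)  # dims=(scenario,)
flat_instrument = np.float64(42.0)           # dims=()

# concat-style path: broadcast_to is a zero-copy view, but stacking
# materialises the 0D value across all 500k scenario points
broadcast = np.broadcast_to(flat_instrument, (n_scenarios,))
total_via_concat = np.stack([mc_instrument, broadcast]).sum(axis=0)

# keeping the lots separate: the scalar is added elementwise without
# ever allocating a 500k-element copy of it
total_separate = mc_instrument + flat_instrument
```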
However, this was over a year ago and much before xarray.dot() and dask.einsum(), so I'll need to tinker with it again.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202423683
https://github.com/pydata/xarray/issues/1224#issuecomment-274380660,https://api.github.com/repos/pydata/xarray/issues/1224,274380660,MDEyOklzc3VlQ29tbWVudDI3NDM4MDY2MA==,1217238,2017-01-23T02:04:23Z,2017-01-23T02:04:23Z,MEMBER,"Was concat slow at graph construction or compute time?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202423683
https://github.com/pydata/xarray/issues/1224#issuecomment-274380448,https://api.github.com/repos/pydata/xarray/issues/1224,274380448,MDEyOklzc3VlQ29tbWVudDI3NDM4MDQ0OA==,6213168,2017-01-23T02:02:08Z,2017-01-23T02:02:08Z,MEMBER,"``(arrays * weights).sum('stacked')`` was my first attempt. It performed considerably worse than ``sum(a * w for a, w in zip(arrays, weights))`` - mostly because xarray.concat() is not terribly performant (I did not look deeper into it).
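For reference, the two formulations side by side in plain numpy (synthetic data, not the original benchmark):

```python
import numpy as np

# Synthetic stand-ins for the arrays and weights discussed above
rng = np.random.default_rng(0)
arrays = [rng.random(1000) for _ in range(8)]
weights = [rng.random() for _ in range(8)]

# Loop formulation: one multiply-add per array
loop_result = sum(a * w for a, w in zip(arrays, weights))

# Stacked formulation: concatenate along a new axis, then reduce
stacked = np.stack(arrays)            # shape (8, 1000)
w = np.asarray(weights)[:, None]      # shape (8, 1), broadcasts over axis 1
stacked_result = (stacked * w).sum(axis=0)
```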
I did not try dask.array.sum() - worth some playing with.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202423683
https://github.com/pydata/xarray/issues/1224#issuecomment-274375810,https://api.github.com/repos/pydata/xarray/issues/1224,274375810,MDEyOklzc3VlQ29tbWVudDI3NDM3NTgxMA==,1217238,2017-01-23T01:09:49Z,2017-01-23T01:09:49Z,MEMBER,"Interesting -- thanks for sharing! I am interested in performance improvements but also a little reluctant to add specialized optimizations directly into xarray.
You write that this is equivalent to `sum(a * w for a, w in zip(arrays, weights))`. How does this compare to stacking and doing the sum in xarray, e.g., `(arrays * weights).sum('stacked')`, where `arrays` and `weights` are now DataArray objects with a `'stacked'` dimension? Or maybe `arrays.dot(weights)`?
Using vectorized operations feels a bit more idiomatic (though also maybe more verbose). It also may be more performant. Note that the builtin `sum` is not optimized well by dask because it's basically equivalent to a loop:
```
def sum(xs):
    result = 0
    for x in xs:
        result += x
    return result
```
In contrast, `dask.array.sum` builds up a tree so it can do the sum in parallel.
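A minimal pure-Python sketch of that pairwise idea (not dask's actual implementation):

```python
# Pairwise (tree) reduction: O(log n) depth, so the partial sums can
# run in parallel, unlike the O(n) chain built by the builtin sum
def tree_sum(xs):
    xs = list(xs)
    if not xs:
        return 0
    while len(xs) > 1:
        xs = [xs[i] + xs[i + 1] if i + 1 < len(xs) else xs[i]
              for i in range(0, len(xs), 2)]
    return xs[0]

assert tree_sum(range(10)) == sum(range(10))
```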
There has also been discussion in https://github.com/pydata/xarray/issues/422 about adding a dedicated method for weighted mean.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202423683