html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/422#issuecomment-218413377,https://api.github.com/repos/pydata/xarray/issues/422,218413377,MDEyOklzc3VlQ29tbWVudDIxODQxMzM3Nw==,10194086,2016-05-11T09:51:29Z,2016-05-11T09:51:29Z,MEMBER,"Do we want
```
da.weighted(weight, dim='time').mean()
```
or
```
da.weighted(weight).mean(dim='time')
```
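For illustration, a minimal sketch of the second form, using a hypothetical `Weighted` helper over plain NumPy arrays (the class name and the `axis` argument are assumptions for the sketch, not the eventual API):
```python
import numpy as np

class Weighted:
    # hypothetical minimal helper: stores data and weights, reduces on demand
    def __init__(self, data, weights):
        self.data = np.asarray(data)
        self.weights = np.asarray(weights)

    def mean(self, axis=None):
        # weighted mean: sum(data * weights) / sum(weights)
        sum_of_weights = self.weights.sum(axis=axis)
        return (self.data * self.weights).sum(axis=axis) / sum_of_weights

result = Weighted([1.0, 2.0, 4.0], [0.2, 0.3, 0.5]).mean()  # 2.8
```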
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,84127296
https://github.com/pydata/xarray/issues/422#issuecomment-218403213,https://api.github.com/repos/pydata/xarray/issues/422,218403213,MDEyOklzc3VlQ29tbWVudDIxODQwMzIxMw==,10194086,2016-05-11T09:06:49Z,2016-05-11T09:07:24Z,MEMBER,"Sounds like a clean solution. Then we can defer the handling of NaN in the weights to `weighted` (e.g. via a `skipna_weights` argument in `weighted`). Returning `sum_of_weights` could also be a method of the class.
We may still end up implementing all required methods separately in `weighted`. For the mean we do:
```
(data * weights / sum_of_weights).sum(dim=dim)
```
i.e. we use `sum` and not `mean`. We could rewrite this to:
```
(data * weights / sum_of_weights).mean(dim=dim) * weights.count(dim=dim)
```
However, I think this cannot be generalized to a single `reduce` function; see e.g. this discussion of the weighted standard deviation: http://stackoverflow.com/questions/30383270/how-do-i-calculate-the-standard-deviation-between-weighted-measurements
Additionally, `weighted` arguably does not make sense for many operations, e.g. `min`, `max`, `count`, ...
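As a quick numerical check, the two formulations above agree when the weights are complete (plain NumPy sketch; `weights.size` plays the role of `weights.count(dim)` along a single axis):
```python
import numpy as np

data = np.array([1.0, 2.0, 4.0])
weights = np.array([0.2, 0.3, 0.5])
sum_of_weights = weights.sum()

# formulation via sum
mean_via_sum = (data * weights / sum_of_weights).sum()

# equivalent formulation via mean, scaled by the number of weights
mean_via_mean = (data * weights / sum_of_weights).mean() * weights.size

assert np.isclose(mean_via_sum, mean_via_mean)  # both are 2.8
```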
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,84127296
https://github.com/pydata/xarray/issues/422#issuecomment-140823232,https://api.github.com/repos/pydata/xarray/issues/422,140823232,MDEyOklzc3VlQ29tbWVudDE0MDgyMzIzMg==,10194086,2015-09-16T18:02:39Z,2015-09-16T18:02:39Z,MEMBER,"Thanks - that seems to be the fastest option. I wrote the functions for Dataset and DataArray:
``` python
def average_da(self, dim=None, weights=None):
    """"""
    weighted average for DataArrays

    Parameters
    ----------
    dim : str or sequence of str, optional
        Dimension(s) over which to apply average.
    weights : DataArray
        weights to apply. Shape must be broadcastable to shape of self.

    Returns
    -------
    reduced : DataArray
        New DataArray with average applied to its data and the indicated
        dimension(s) removed.
    """"""
    if weights is None:
        return self.mean(dim)
    else:
        if not isinstance(weights, xray.DataArray):
            raise ValueError(""weights must be a DataArray"")
        # if NaNs are present, we need individual weights
        if self.notnull().any():
            total_weights = weights.where(self.notnull()).sum(dim=dim)
        else:
            total_weights = weights.sum(dim)
        return (self * weights).sum(dim) / total_weights
# -----------------------------------------------------------------------------
def average_ds(self, dim=None, weights=None):
    """"""
    weighted average for Datasets

    Parameters
    ----------
    dim : str or sequence of str, optional
        Dimension(s) over which to apply average.
    weights : DataArray
        weights to apply. Shape must be broadcastable to shape of data.

    Returns
    -------
    reduced : Dataset
        New Dataset with average applied to its data and the indicated
        dimension(s) removed.
    """"""
    if weights is None:
        return self.mean(dim)
    else:
        return self.apply(average_da, dim=dim, weights=weights)
```
They can be combined into one function:
``` python
def average(data, dim=None, weights=None):
    """"""
    weighted average for xray objects

    Parameters
    ----------
    data : Dataset or DataArray
        the xray object to average over
    dim : str or sequence of str, optional
        Dimension(s) over which to apply average.
    weights : DataArray
        weights to apply. Shape must be broadcastable to shape of data.

    Returns
    -------
    reduced : Dataset or DataArray
        New xray object with average applied to its data and the indicated
        dimension(s) removed.
    """"""
    if isinstance(data, xray.Dataset):
        return average_ds(data, dim, weights)
    elif isinstance(data, xray.DataArray):
        return average_da(data, dim, weights)
    else:
        raise ValueError(""data must be an xray Dataset or DataArray"")
```
Or a monkey patch:
``` python
xray.DataArray.average = average_da
xray.Dataset.average = average_ds
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,84127296
https://github.com/pydata/xarray/issues/422#issuecomment-140794893,https://api.github.com/repos/pydata/xarray/issues/422,140794893,MDEyOklzc3VlQ29tbWVudDE0MDc5NDg5Mw==,10194086,2015-09-16T16:29:22Z,2015-09-16T16:29:32Z,MEMBER,"This has to be adjusted if there are `NaN`s in the array: `weights.sum(dim)` needs to be corrected so that it does not count weights at indices where there is a `NaN` in `self`.
Is there a better way to get the correct weights than:
```
total_weights = weights.sum(dim) * self / self
```
It should probably not be used on a Dataset, as every DataArray may have its own `NaN` structure; alternatively, the equivalent Dataset method could loop through the DataArrays.
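One alternative is to mask the weights with a boolean validity mask rather than relying on `self / self`; a plain NumPy sketch of the idea (in xray terms this corresponds to `weights.where(self.notnull()).sum(dim)`):
```python
import numpy as np

data = np.array([1.0, np.nan, 4.0])
weights = np.array([0.2, 0.3, 0.5])

# only count weights where the data is valid
valid = ~np.isnan(data)
total_weights = weights[valid].sum()  # 0.7 rather than 1.0

weighted_mean = np.nansum(data * weights) / total_weights
```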
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,84127296