html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/422#issuecomment-218413377,https://api.github.com/repos/pydata/xarray/issues/422,218413377,MDEyOklzc3VlQ29tbWVudDIxODQxMzM3Nw==,10194086,2016-05-11T09:51:29Z,2016-05-11T09:51:29Z,MEMBER,"Do we want

```
da.weighted(weight, dim='time').mean()
```

or

```
da.weighted(weight).mean(dim='time')
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,84127296
https://github.com/pydata/xarray/issues/422#issuecomment-218403213,https://api.github.com/repos/pydata/xarray/issues/422,218403213,MDEyOklzc3VlQ29tbWVudDIxODQwMzIxMw==,10194086,2016-05-11T09:06:49Z,2016-05-11T09:07:24Z,MEMBER,"Sounds like a clean solution. Then we can defer handling of NaN in the weights to `weighted` (e.g. via a `skipna_weights` argument in `weighted`). Also, returning `sum_of_weights` can be a method of the class.

We may still end up implementing all required methods separately in `weighted`. For the mean we do:

```
(data * weights / sum_of_weights).sum(dim=dim)
```

i.e. we use `sum` and not `mean`. We could rewrite this to:

```
(data * weights / sum_of_weights).mean(dim=dim) * weights.count(dim=dim)
```

However, I think this cannot be generalized to a `reduce` function. See e.g. for `std`: http://stackoverflow.com/questions/30383270/how-do-i-calculate-the-standard-deviation-between-weighted-measurements

Additionally, `weighted` does not make sense for many operations (I would say), e.g. `min`, `max`, `count`, ...
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,84127296
https://github.com/pydata/xarray/issues/422#issuecomment-140823232,https://api.github.com/repos/pydata/xarray/issues/422,140823232,MDEyOklzc3VlQ29tbWVudDE0MDgyMzIzMg==,10194086,2015-09-16T18:02:39Z,2015-09-16T18:02:39Z,MEMBER,"Thanks - that seems to be the fastest possibility. I wrote the functions for Dataset and DataArray:

``` python
def average_da(self, dim=None, weights=None):
    """"""
    weighted average for DataArrays

    Parameters
    ----------
    dim : str or sequence of str, optional
        Dimension(s) over which to apply average.
    weights : DataArray
        weights to apply. Shape must be broadcastable to shape of self.

    Returns
    -------
    reduced : DataArray
        New DataArray with average applied to its data and the
        indicated dimension(s) removed.
    """"""
    if weights is None:
        return self.mean(dim)
    else:
        if not isinstance(weights, xray.DataArray):
            raise ValueError(""weights must be a DataArray"")

        # if NaNs are present, we need individual weights
        if self.notnull().any():
            total_weights = weights.where(self.notnull()).sum(dim=dim)
        else:
            total_weights = weights.sum(dim)

        return (self * weights).sum(dim) / total_weights

# -----------------------------------------------------------------------------

def average_ds(self, dim=None, weights=None):
    """"""
    weighted average for Datasets

    Parameters
    ----------
    dim : str or sequence of str, optional
        Dimension(s) over which to apply average.
    weights : DataArray
        weights to apply. Shape must be broadcastable to shape of data.

    Returns
    -------
    reduced : Dataset
        New Dataset with average applied to its data and the
        indicated dimension(s) removed.
    """"""
    if weights is None:
        return self.mean(dim)
    else:
        return self.apply(average_da, dim=dim, weights=weights)
```

They can be combined into one function:

``` python
def average(data, dim=None, weights=None):
    """"""
    weighted average for xray objects

    Parameters
    ----------
    data : Dataset or DataArray
        the xray object to average over
    dim : str or sequence of str, optional
        Dimension(s) over which to apply average.
    weights : DataArray
        weights to apply. Shape must be broadcastable to shape of data.

    Returns
    -------
    reduced : Dataset or DataArray
        New xray object with average applied to its data and the
        indicated dimension(s) removed.
    """"""
    if isinstance(data, xray.Dataset):
        return average_ds(data, dim, weights)
    elif isinstance(data, xray.DataArray):
        return average_da(data, dim, weights)
    else:
        raise ValueError(""data must be an xray Dataset or DataArray"")
```

Or a monkey patch:

``` python
xray.DataArray.average = average_da
xray.Dataset.average = average_ds
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,84127296
https://github.com/pydata/xarray/issues/422#issuecomment-140794893,https://api.github.com/repos/pydata/xarray/issues/422,140794893,MDEyOklzc3VlQ29tbWVudDE0MDc5NDg5Mw==,10194086,2015-09-16T16:29:22Z,2015-09-16T16:29:32Z,MEMBER,"This has to be adjusted if there are `NaN` in the array: `weights.sum(dim)` needs to be corrected so that it does not count weights at indices where `self` is `NaN`. Is there a better way to get the correct weights than:

```
total_weights = weights.sum(dim) * self / self
```

It should probably not be used on a Dataset, as every DataArray may have its own `NaN` structure. Or the equivalent Dataset method should loop through the DataArrays.
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,84127296