html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/422#issuecomment-218413377,https://api.github.com/repos/pydata/xarray/issues/422,218413377,MDEyOklzc3VlQ29tbWVudDIxODQxMzM3Nw==,10194086,2016-05-11T09:51:29Z,2016-05-11T09:51:29Z,MEMBER,"Do we want
```
da.weighted(weight, dim='time').mean()
```
or
```
da.weighted(weight).mean(dim='time')
```
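For illustration, a minimal sketch of the second form, using a hypothetical `Weighted` helper over plain NumPy arrays (the class name and the `axis` argument are assumptions for the sketch, not the eventual API):
```python
import numpy as np

class Weighted:
    # hypothetical minimal helper: stores data and weights, reduces on demand
    def __init__(self, data, weights):
        self.data = np.asarray(data)
        self.weights = np.asarray(weights)

    def mean(self, axis=None):
        # weighted mean: sum(data * weights) / sum(weights)
        sum_of_weights = self.weights.sum(axis=axis)
        return (self.data * self.weights).sum(axis=axis) / sum_of_weights

result = Weighted([1.0, 2.0, 4.0], [0.2, 0.3, 0.5]).mean()  # 2.8
```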
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,84127296
https://github.com/pydata/xarray/issues/422#issuecomment-218403213,https://api.github.com/repos/pydata/xarray/issues/422,218403213,MDEyOklzc3VlQ29tbWVudDIxODQwMzIxMw==,10194086,2016-05-11T09:06:49Z,2016-05-11T09:07:24Z,MEMBER,"Sounds like a clean solution. Then we can defer the handling of NaN in the weights to `weighted` (e.g. via a `skipna_weights` argument in `weighted`). Returning `sum_of_weights` could also be a method of the class.
We may still end up implementing all required methods separately in `weighted`. For the mean we do:
```
(data * weights / sum_of_weights).sum(dim=dim)
```
i.e. we use `sum` and not `mean`. We could rewrite this to:
```
(data * weights / sum_of_weights).mean(dim=dim) * weights.count(dim=dim)
```
However, I think this cannot be generalized to a single `reduce` function; see e.g. this discussion of the weighted standard deviation: http://stackoverflow.com/questions/30383270/how-do-i-calculate-the-standard-deviation-between-weighted-measurements
Additionally, `weighted` arguably does not make sense for many operations, e.g. `min`, `max`, `count`, ...
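As a quick numerical check, the two formulations above agree when the weights are complete (plain NumPy sketch; `weights.size` plays the role of `weights.count(dim)` along a single axis):
```python
import numpy as np

data = np.array([1.0, 2.0, 4.0])
weights = np.array([0.2, 0.3, 0.5])
sum_of_weights = weights.sum()

# formulation via sum
mean_via_sum = (data * weights / sum_of_weights).sum()

# equivalent formulation via mean, scaled by the number of weights
mean_via_mean = (data * weights / sum_of_weights).mean() * weights.size

assert np.isclose(mean_via_sum, mean_via_mean)  # both are 2.8
```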
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,84127296
https://github.com/pydata/xarray/issues/422#issuecomment-140823232,https://api.github.com/repos/pydata/xarray/issues/422,140823232,MDEyOklzc3VlQ29tbWVudDE0MDgyMzIzMg==,10194086,2015-09-16T18:02:39Z,2015-09-16T18:02:39Z,MEMBER,"Thanks - that seems to be the fastest option. I wrote the functions for Dataset and DataArray:
``` python
def average_da(self, dim=None, weights=None):
    """"""
    weighted average for DataArrays

    Parameters
    ----------
    dim : str or sequence of str, optional
        Dimension(s) over which to apply average.
    weights : DataArray
        weights to apply. Shape must be broadcastable to shape of self.

    Returns
    -------
    reduced : DataArray
        New DataArray with average applied to its data and the indicated
        dimension(s) removed.
    """"""
    if weights is None:
        return self.mean(dim)
    else:
        if not isinstance(weights, xray.DataArray):
            raise ValueError(""weights must be a DataArray"")
        # if NaNs are present, we need individual weights
        if self.notnull().any():
            total_weights = weights.where(self.notnull()).sum(dim=dim)
        else:
            total_weights = weights.sum(dim)
        return (self * weights).sum(dim) / total_weights
# -----------------------------------------------------------------------------
def average_ds(self, dim=None, weights=None):
    """"""
    weighted average for Datasets

    Parameters
    ----------
    dim : str or sequence of str, optional
        Dimension(s) over which to apply average.
    weights : DataArray
        weights to apply. Shape must be broadcastable to shape of data.

    Returns
    -------
    reduced : Dataset
        New Dataset with average applied to its data and the indicated
        dimension(s) removed.
    """"""
    if weights is None:
        return self.mean(dim)
    else:
        return self.apply(average_da, dim=dim, weights=weights)
```
They can be combined into one function:
``` python
def average(data, dim=None, weights=None):
    """"""
    weighted average for xray objects

    Parameters
    ----------
    data : Dataset or DataArray
        the xray object to average over
    dim : str or sequence of str, optional
        Dimension(s) over which to apply average.
    weights : DataArray
        weights to apply. Shape must be broadcastable to shape of data.

    Returns
    -------
    reduced : Dataset or DataArray
        New xray object with average applied to its data and the indicated
        dimension(s) removed.
    """"""
    if isinstance(data, xray.Dataset):
        return average_ds(data, dim, weights)
    elif isinstance(data, xray.DataArray):
        return average_da(data, dim, weights)
    else:
        raise ValueError(""data must be an xray Dataset or DataArray"")
```
Or a monkey patch:
``` python
xray.DataArray.average = average_da
xray.Dataset.average = average_ds
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,84127296
https://github.com/pydata/xarray/issues/422#issuecomment-140794893,https://api.github.com/repos/pydata/xarray/issues/422,140794893,MDEyOklzc3VlQ29tbWVudDE0MDc5NDg5Mw==,10194086,2015-09-16T16:29:22Z,2015-09-16T16:29:32Z,MEMBER,"This has to be adjusted if there are `NaN`s in the array: `weights.sum(dim)` needs to be corrected so that it does not count weights at indices where there is a `NaN` in `self`.
Is there a better way to get the correct weights than:
```
total_weights = weights.sum(dim) * self / self
```
It should probably not be used on a Dataset, as every DataArray may have its own `NaN` structure; alternatively, the equivalent Dataset method could loop through the DataArrays.
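One alternative is to mask the weights with a boolean validity mask rather than relying on `self / self`; a plain NumPy sketch of the idea (in xray terms this corresponds to `weights.where(self.notnull()).sum(dim)`):
```python
import numpy as np

data = np.array([1.0, np.nan, 4.0])
weights = np.array([0.2, 0.3, 0.5])

# only count weights where the data is valid
valid = ~np.isnan(data)
total_weights = weights[valid].sum()  # 0.7 rather than 1.0

weighted_mean = np.nansum(data * weights) / total_weights
```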
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,84127296