html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1115#issuecomment-549511089,https://api.github.com/repos/pydata/xarray/issues/1115,549511089,MDEyOklzc3VlQ29tbWVudDU0OTUxMTA4OQ==,6334793,2019-11-04T19:31:46Z,2019-11-04T19:31:46Z,NONE,"Guys, sorry for dropping the ball on this one. I made some changes to the PR based on the feedback I got, but I couldn't figure out the tests. Would anyone like to take this over?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/pull/2652#issuecomment-452782113,https://api.github.com/repos/pydata/xarray/issues/2652,452782113,MDEyOklzc3VlQ29tbWVudDQ1Mjc4MjExMw==,6334793,2019-01-09T17:32:12Z,2019-01-09T17:32:12Z,NONE,"> I also think making this a function is probably a good idea, even though it's different from pandas.
> 
> One question: how should these functions align their arguments? Recall that xarray does an `inner` join for arithmetic (though there's an option to control this), and an `outer` join in most other cases. It's not entirely obvious to me what the right choice is here (or if it really even matters).

I always assumed an `inner` join is the way to go. I had initially implemented just `align`, but later changed to `broadcast`, since `align` doesn't add dimensions/labels to the output if they are missing in one of the inputs, while `broadcast` does. Without this, `where(valid_values)` doesn't work if one input is 1-D and the other is N-D.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,396102183
https://github.com/pydata/xarray/issues/1115#issuecomment-451602947,https://api.github.com/repos/pydata/xarray/issues/1115,451602947,MDEyOklzc3VlQ29tbWVudDQ1MTYwMjk0Nw==,6334793,2019-01-04T23:48:54Z,2019-01-04T23:48:54Z,NONE,"PR done!
Changed np.sum() to dataarray.sum()","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/pull/2652#issuecomment-451602256,https://api.github.com/repos/pydata/xarray/issues/2652,451602256,MDEyOklzc3VlQ29tbWVudDQ1MTYwMjI1Ng==,6334793,2019-01-04T23:44:10Z,2019-01-04T23:44:10Z,NONE,Made the code PEP8 compatible. Apologies for not doing so earlier. ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,396102183
https://github.com/pydata/xarray/issues/1115#issuecomment-451052107,https://api.github.com/repos/pydata/xarray/issues/1115,451052107,MDEyOklzc3VlQ29tbWVudDQ1MTA1MjEwNw==,6334793,2019-01-03T04:10:35Z,2019-01-03T04:14:54Z,NONE,"Okay. Here's what I have come up with. I have tested it on two 1-D dataarrays, two N-D dataarrays, and a 1-D dataarray paired with an N-D one, with the inputs misaligned and containing missing values in every case.

Before going forward, 
1. What do you think of it? Any improvements?
2. Steps 1 and 2 (broadcasting and ignoring common missing values) are identical in both cov() and corr(). Is there a better way to reduce the duplication while still retaining both functions as standalone?

```python
def cov(self, other, dim=None):
    """"""Compute covariance between two DataArray objects along a shared dimension.

    Parameters
    ----------
    other : DataArray
        The other array with which the covariance will be computed
    dim : str, optional
        The dimension along which the covariance will be computed

    Returns
    -------
    covariance : DataArray
    """"""
    # 1. Broadcast the two arrays
    self, other = xr.broadcast(self, other)

    # 2. Ignore the nans
    valid_values = self.notnull() & other.notnull()
    self = self.where(valid_values, drop=True)
    other = other.where(valid_values, drop=True)
    valid_count = valid_values.sum(dim)

    # 3. Demean along the given dim
    demeaned_self = self - self.mean(dim=dim)
    demeaned_other = other - other.mean(dim=dim)

    # 4. Compute covariance along the given dim, skipping any remaining nans
    cov = (demeaned_self * demeaned_other).sum(dim=dim) / valid_count

    return cov

def corr(self, other, dim=None):
    """"""Compute correlation between two DataArray objects along a shared dimension.

    Parameters
    ----------
    other : DataArray
        The other array with which the correlation will be computed
    dim : str, optional
        The dimension along which the correlation will be computed

    Returns
    -------
    correlation : DataArray
    """"""
    # 1. Broadcast the two arrays
    self, other = xr.broadcast(self, other)

    # 2. Ignore the nans
    valid_values = self.notnull() & other.notnull()
    self = self.where(valid_values, drop=True)
    other = other.where(valid_values, drop=True)

    # 3. Compute correlation from cov() and the standard deviations
    self_std = self.std(dim=dim)
    other_std = other.std(dim=dim)

    return cov(self, other, dim=dim) / (self_std * other_std)

```
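
On question 2 above, one possible way to keep both functions standalone while removing the duplicated steps 1-2 is a small private helper. Just a sketch; the helper name `_broadcast_and_mask` is made up for illustration:

```python
# Sketch: factor the shared broadcast-and-mask steps out of cov()/corr().
# The helper name _broadcast_and_mask is hypothetical.
import numpy as np
import xarray as xr

def _broadcast_and_mask(da_a, da_b):
    # Step 1: broadcast the two arrays; step 2: keep only jointly valid values
    da_a, da_b = xr.broadcast(da_a, da_b)
    valid_values = da_a.notnull() & da_b.notnull()
    return da_a.where(valid_values), da_b.where(valid_values), valid_values

def cov(da_a, da_b, dim=None):
    da_a, da_b, valid_values = _broadcast_and_mask(da_a, da_b)
    valid_count = valid_values.sum(dim)
    demeaned = (da_a - da_a.mean(dim)) * (da_b - da_b.mean(dim))
    return demeaned.sum(dim) / valid_count

def corr(da_a, da_b, dim=None):
    da_a, da_b, _ = _broadcast_and_mask(da_a, da_b)
    return cov(da_a, da_b, dim) / (da_a.std(dim) * da_b.std(dim))
```

This keeps each public function's body short while the NaN handling lives in one place.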

For testing:
```python
    # self: load demo data and trim its size
    ds = xr.tutorial.load_dataset('air_temperature')
    air = ds.air[:18, ...]
    # other: select misaligned data, and smooth it to dampen the correlation with self
    air_smooth = ds.air[2:20, ...].rolling(time=3, center=True).mean()

    # A handy function to select an example grid point
    def select_pts(da):
        return da.sel(lat=45, lon=250)

    # Test 1: misaligned 1-D dataarrays with missing values
    ts1 = select_pts(air.copy())
    ts2 = select_pts(air_smooth.copy())

    def pd_corr(ts1, ts2):
        """"""Ensure the ts are aligned and missing values ignored""""""
        # ts1, ts2 = xr.align(ts1, ts2)
        valid_values = ts1.notnull() & ts2.notnull()

        ts1 = ts1.where(valid_values, drop=True)
        ts2 = ts2.where(valid_values, drop=True)

        return ts1.to_series().corr(ts2.to_series())

    expected = pd_corr(ts1, ts2)
    actual = corr(ts1, ts2)
    np.allclose(expected, actual)

    # Test 2: misaligned N-D dataarrays with missing values
    actual_ND = corr(air, air_smooth, dim='time')
    actual = select_pts(actual_ND)
    np.allclose(expected, actual)

    # Test 3: one 1-D dataarray and one N-D dataarray, misaligned and with missing values
    actual_ND = corr(air_smooth, ts1, dim='time')
    actual = select_pts(actual_ND)
    np.allclose(actual, expected)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-445390271,https://api.github.com/repos/pydata/xarray/issues/1115,445390271,MDEyOklzc3VlQ29tbWVudDQ0NTM5MDI3MQ==,6334793,2018-12-07T22:53:06Z,2018-12-07T22:53:06Z,NONE,"Okay. I am writing the simultaneous correlation and covariance functions in dataarray.py instead of dataset.py, following the pd.Series.corr(self, other, dim) style.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-442994118,https://api.github.com/repos/pydata/xarray/issues/1115,442994118,MDEyOklzc3VlQ29tbWVudDQ0Mjk5NDExOA==,6334793,2018-11-29T21:09:55Z,2018-11-29T21:09:55Z,NONE,"Sorry for the radio silence; I will work on this next week. Thanks @max-sixty for the updates and @rabernat for reaching out; I will let you know if I need help.

Should we keep it simple following @max-sixty, or should I also add functionality to handle lagged correlations?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-419501548,https://api.github.com/repos/pydata/xarray/issues/1115,419501548,MDEyOklzc3VlQ29tbWVudDQxOTUwMTU0OA==,6334793,2018-09-07T16:55:13Z,2018-09-07T16:55:13Z,NONE,"@max-sixty thanks!  

Then I will start by testing @shoyer's suggestion and `mvstats` for the basic implementation.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-418406658,https://api.github.com/repos/pydata/xarray/issues/1115,418406658,MDEyOklzc3VlQ29tbWVudDQxODQwNjY1OA==,6334793,2018-09-04T15:15:35Z,2018-09-04T15:15:35Z,NONE,"Some time back I wrote a [package](https://github.com/hrishikeshac/mvstats) based on xarray for this. I would be happy to help implement it in xarray as well, but I am new to contributing to such a large-scale project and it looks a bit intimidating!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/2009#issuecomment-375726695,https://api.github.com/repos/pydata/xarray/issues/2009,375726695,MDEyOklzc3VlQ29tbWVudDM3NTcyNjY5NQ==,6334793,2018-03-23T16:40:06Z,2018-03-23T16:43:45Z,NONE,"@mathause Thanks! Your solution worked brilliantly when used with contourf(). Here's how the code looks after implementing it. 

```python
import cartopy.crs as ccrs
import xarray as xr
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

projection = False

ts = xr.tutorial.load_dataset('air_temperature').air[0, ...]
ncols, nrows = 2, 2
fig = plt.figure()
# ny, nx = ts.shape
# dx, dy = nx / ny, 1
# figsize = plt.figaspect(float(dy * ncols) / float(dx * nrows))
# fig = plt.figure(figsize=figsize)
gs = gridspec.GridSpec(nrows, ncols, wspace=0, hspace=0)

def set_map_layout(axes, width=17.0):
    """"""
    set figure height, given width

    Needs to be called after all plotting is done.
    Source: @mathause https://github.com/mathause/mplotutils/blob/v0.1.0/mplotutils/mpl_utils.py#L47-L100

    Parameters
    ----------
    axes : ndarray of (Geo)Axes
        Array with all axes of the figure.
    width : float
        Width of the full figure in cm. Default 17

    .. note:: Currently only works if all the axes have the same aspect ratio.
    """"""

    if isinstance(axes, plt.Axes):
        ax = axes
    else:
        # assumes the first of the axes is representative for all
        ax = axes.flat[0]

    # read figure data
    f = ax.get_figure()

    bottom = f.subplotpars.bottom
    top = f.subplotpars.top
    left = f.subplotpars.left
    right = f.subplotpars.right
    hspace = f.subplotpars.hspace
    wspace = f.subplotpars.wspace

    # data ratio is the aspect
    aspect = ax.get_data_ratio()
    # get geometry tells how many subplots there are
    nrow, ncol, __ = ax.get_geometry()

    # width of one plot, taking into account
    # left * wf, (1-right) * wf, ncol * wp, (1-ncol) * wp * wspace
    wp = (width - width * (left + (1-right))) / (ncol + (ncol-1) * wspace)

    # height of one plot
    hp = wp * aspect

    # height of figure
    height = (hp * (nrow + ((nrow - 1) * hspace))) / (1. - (bottom + (1 - top)))

    f.set_figwidth(width / 2.54)
    f.set_figheight(height / 2.54)
    
for i in range(4):
    if projection:
        ax = plt.subplot(gs[i], projection=ccrs.PlateCarree())
        ax.coastlines()
        ts.plot.contourf(ax=ax, add_colorbar=False, add_labels=False, levels=11,
                transform=ccrs.PlateCarree())
    else:
        ax = plt.subplot(gs[i])
        ts.plot.contourf(ax=ax, add_colorbar=False, levels=11, add_labels=False)

    ax.set_aspect('auto', adjustable='box-forced')
    if (i == 0) or (i == 1):
        ax.set_title('title')
    if (i == 0) or (i == 2):
        ax.set_ylabel('ylabel')
    ax.set_xticks([])
    ax.set_yticks([])
    set_map_layout(ax)  # plt.tight_layout()
# fig.subplots_adjust()
```
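
As a quick sanity check of the height formula in set_map_layout(), the core arithmetic can be run standalone. The subplot parameters below are matplotlib's default rc values, and the aspect value is purely illustrative:

```python
# Standalone sketch of the width -> height arithmetic from set_map_layout().
# Subplot parameters are matplotlib's defaults; aspect is illustrative only.
width = 17.0                      # full figure width in cm
left, right = 0.125, 0.9          # default subplot margins
bottom, top = 0.11, 0.88
wspace, hspace = 0.2, 0.2
nrow, ncol = 2, 2
aspect = 0.5                      # data ratio (height/width) of one axes

# width of a single axes, accounting for margins and wspace
wp = (width - width * (left + (1 - right))) / (ncol + (ncol - 1) * wspace)
# height of a single axes with the given aspect
hp = wp * aspect
# total figure height so every axes keeps that aspect
height = (hp * (nrow + (nrow - 1) * hspace)) / (1.0 - (bottom + (1 - top)))
print(round(height, 2))   # ~8.56 cm for these inputs
```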
without projection:
![demo_without_projection](https://user-images.githubusercontent.com/6334793/37841835-cf2dfa86-2e7d-11e8-9584-3bf19c21f570.png)

With projection:
![demo_with_projection](https://user-images.githubusercontent.com/6334793/37841855-d8eaf24a-2e7d-11e8-84b4-ba323f45232e.png)
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,307903558
https://github.com/pydata/xarray/issues/1115#issuecomment-331686038,https://api.github.com/repos/pydata/xarray/issues/1115,331686038,MDEyOklzc3VlQ29tbWVudDMzMTY4NjAzOA==,6334793,2017-09-24T04:14:00Z,2017-09-24T04:14:00Z,NONE,"FYI @shoyer @fmaussion, I had to revisit the problem and ended up writing a function that computes vectorized cross-correlation, covariance, and regression (along with p-values and standard errors) for xr.DataArrays. Essentially, I tried to mimic scipy.stats.linregress(), but for multi-dimensional data, and included the ability to compute lagged relationships. Here's the function and its demonstration; please feel free to incorporate it into xarray if deemed useful: https://hrishichandanpurkar.blogspot.com/2017/09/vectorized-functions-for-correlation.html
","{""total_count"": 5, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,188996339