html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1115#issuecomment-549511089,https://api.github.com/repos/pydata/xarray/issues/1115,549511089,MDEyOklzc3VlQ29tbWVudDU0OTUxMTA4OQ==,6334793,2019-11-04T19:31:46Z,2019-11-04T19:31:46Z,NONE,"Guys, sorry for dropping the ball on this one. I made some changes to the PR based on the feedback I got, but I couldn't figure out the tests. Would anyone like to take this over? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-451602947,https://api.github.com/repos/pydata/xarray/issues/1115,451602947,MDEyOklzc3VlQ29tbWVudDQ1MTYwMjk0Nw==,6334793,2019-01-04T23:48:54Z,2019-01-04T23:48:54Z,NONE,"PR done! Changed np.sum() to dataarray.sum().","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-451052107,https://api.github.com/repos/pydata/xarray/issues/1115,451052107,MDEyOklzc3VlQ29tbWVudDQ1MTA1MjEwNw==,6334793,2019-01-03T04:10:35Z,2019-01-03T04:14:54Z,NONE,"Okay, here's what I have come up with. I have tested it against two 1-D dataarrays, two N-D dataarrays, and one 1-D with one N-D dataarray; in every case the arrays are misaligned and contain missing values. Before going forward:

1. What do you think of it? Any improvements?
2. Steps 1 and 2 (broadcasting and ignoring common missing values) are identical in both cov() and corr(). Is there a better way to reduce the duplication while still retaining both functions as standalone?

```
def cov(self, other, dim=None):
    """"""Compute covariance between two DataArray objects along a shared dimension.

    Parameters
    ----------
    other : DataArray
        The other array with which the covariance will be computed
    dim : str, optional
        The dimension along which the covariance will be computed

    Returns
    -------
    covariance : DataArray
    """"""
    # 1. Broadcast the two arrays
    self, other = xr.broadcast(self, other)

    # 2. Ignore the nans
    valid_values = self.notnull() & other.notnull()
    self = self.where(valid_values, drop=True)
    other = other.where(valid_values, drop=True)
    valid_count = valid_values.sum(dim)

    # 3. Demean along the given dim
    demeaned_self = self - self.mean(dim=dim)
    demeaned_other = other - other.mean(dim=dim)

    # 4. Compute covariance along the given dim
    if dim:
        axis = self.get_axis_num(dim=dim)
    else:
        axis = None
    cov = np.sum(demeaned_self * demeaned_other, axis=axis) / valid_count
    return cov


def corr(self, other, dim=None):
    """"""Compute correlation between two DataArray objects along a shared dimension.

    Parameters
    ----------
    other : DataArray
        The other array with which the correlation will be computed
    dim : str, optional
        The dimension along which the correlation will be computed

    Returns
    -------
    correlation : DataArray
    """"""
    # 1. Broadcast the two arrays
    self, other = xr.broadcast(self, other)

    # 2. Ignore the nans
    valid_values = self.notnull() & other.notnull()
    self = self.where(valid_values, drop=True)
    other = other.where(valid_values, drop=True)

    # 3. Compute correlation based on standard deviations and cov()
    self_std = self.std(dim=dim)
    other_std = other.std(dim=dim)
    return cov(self, other, dim=dim) / (self_std * other_std)
```

For testing:

```
# self: load demo data and trim its size
ds = xr.tutorial.load_dataset('air_temperature')
air = ds.air[:18, ...]

# other: select misaligned data, and smooth it to dampen the correlation with self
air_smooth = ds.air[2:20, ...].rolling(time=3, center=True).mean()

# A handy function to select an example grid point
def select_pts(da):
    return da.sel(lat=45, lon=250)

# Test #1: Misaligned 1-D dataarrays with missing values
ts1 = select_pts(air.copy())
ts2 = select_pts(air_smooth.copy())

def pd_corr(ts1, ts2):
    """"""Ensure the ts are aligned and missing values ignored""""""
    valid_values = ts1.notnull() & ts2.notnull()
    ts1 = ts1.where(valid_values, drop=True)
    ts2 = ts2.where(valid_values, drop=True)
    return ts1.to_series().corr(ts2.to_series())

expected = pd_corr(ts1, ts2)
actual = corr(ts1, ts2)
np.allclose(expected, actual)

# Test #2: Misaligned N-D dataarrays with missing values
actual_ND = corr(air, air_smooth, dim='time')
actual = select_pts(actual_ND)
np.allclose(expected, actual)

# Test #3: One 1-D dataarray and one N-D dataarray, misaligned and with missing values
actual_ND = corr(air_smooth, ts1, dim='time')
actual = select_pts(actual_ND)
np.allclose(actual, expected)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-445390271,https://api.github.com/repos/pydata/xarray/issues/1115,445390271,MDEyOklzc3VlQ29tbWVudDQ0NTM5MDI3MQ==,6334793,2018-12-07T22:53:06Z,2018-12-07T22:53:06Z,NONE,"Okay. I am writing the simultaneous correlation and covariance functions in dataarray.py instead of dataset.py, following the pd.Series.corr(self, other, dim) style. ","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-442994118,https://api.github.com/repos/pydata/xarray/issues/1115,442994118,MDEyOklzc3VlQ29tbWVudDQ0Mjk5NDExOA==,6334793,2018-11-29T21:09:55Z,2018-11-29T21:09:55Z,NONE,"Sorry for the radio silence; I will work on this next week.
Thanks @max-sixty for the updates and @rabernat for reaching out; I will let you know if I need help. Should we keep it simple, following @max-sixty's suggestion, or should I also add the functionality to handle lagged correlations? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-419501548,https://api.github.com/repos/pydata/xarray/issues/1115,419501548,MDEyOklzc3VlQ29tbWVudDQxOTUwMTU0OA==,6334793,2018-09-07T16:55:13Z,2018-09-07T16:55:13Z,NONE,"@max-sixty thanks! Then I will start with testing @shoyer's suggestion and `mvstats` for the basic implementation. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-418406658,https://api.github.com/repos/pydata/xarray/issues/1115,418406658,MDEyOklzc3VlQ29tbWVudDQxODQwNjY1OA==,6334793,2018-09-04T15:15:35Z,2018-09-04T15:15:35Z,NONE,"Some time back I wrote a [package](https://github.com/hrishikeshac/mvstats) based on xarray regarding this. I would be happy to be involved in implementing it in xarray as well, but I am new to contributing to such a large-scale project and it looks a bit intimidating!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-331686038,https://api.github.com/repos/pydata/xarray/issues/1115,331686038,MDEyOklzc3VlQ29tbWVudDMzMTY4NjAzOA==,6334793,2017-09-24T04:14:00Z,2017-09-24T04:14:00Z,NONE,"FYI @shoyer @fmaussion, I had to revisit the problem and ended up writing a function to compute vectorized cross-correlation, covariance, and regression calculations (along with p-value and standard error) for xr.DataArrays.
Essentially, I tried to mimic scipy.stats.linregress() but for multi-dimensional data, and included the ability to compute lagged relationships. Here's the function and its demonstration; please feel free to incorporate it in xarray if deemed useful: https://hrishichandanpurkar.blogspot.com/2017/09/vectorized-functions-for-correlation.html ","{""total_count"": 5, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,188996339
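The cov()/corr() draft discussed in the 2019-01-03 comment follows a mask-common-NaNs, demean, normalize pipeline (xarray itself later gained top-level `xr.cov` and `xr.corr`). As a minimal standalone sketch of the same steps with plain NumPy, assuming already-aligned arrays (the function names `nan_cov`/`nan_corr` are illustrative, not xarray API):

```python
import numpy as np

def nan_cov(a, b, axis=None):
    """Covariance of a and b, ignoring positions where either is NaN.

    Mirrors the draft's steps: mask common missing values, demean
    along the axis, then divide the summed product by the valid count.
    """
    valid = ~np.isnan(a) & ~np.isnan(b)
    a = np.where(valid, a, np.nan)
    b = np.where(valid, b, np.nan)
    count = valid.sum(axis=axis)
    da = a - np.nanmean(a, axis=axis, keepdims=True)
    db = b - np.nanmean(b, axis=axis, keepdims=True)
    return np.nansum(da * db, axis=axis) / count

def nan_corr(a, b, axis=None):
    """Pearson correlation built on nan_cov (population std, ddof=0)."""
    valid = ~np.isnan(a) & ~np.isnan(b)
    a = np.where(valid, a, np.nan)
    b = np.where(valid, b, np.nan)
    return nan_cov(a, b, axis=axis) / (np.nanstd(a, axis=axis) * np.nanstd(b, axis=axis))
```

For example, `nan_corr(np.array([1., 2., 3., 4., np.nan]), np.array([2., 4., 6., 8., 10.]))` ignores the last pair and returns 1.0, since the remaining values are perfectly linearly related.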