html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1115#issuecomment-555726775,https://api.github.com/repos/pydata/xarray/issues/1115,555726775,MDEyOklzc3VlQ29tbWVudDU1NTcyNjc3NQ==,45787861,2019-11-19T21:36:42Z,2019-11-19T21:36:42Z,NONE,">
>
> @r-beer would be great to finish this off! I think this would be a popular feature. You could take @hrishikeshac 's code (which is close!) and make the final changes.
OK, so that means getting #2652 to pass, right?
I checked out the corresponding branch from @hrishikeshac and ran the tests locally.
See the corresponding discussion in #2652.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-555376229,https://api.github.com/repos/pydata/xarray/issues/1115,555376229,MDEyOklzc3VlQ29tbWVudDU1NTM3NjIyOQ==,45787861,2019-11-19T07:44:23Z,2019-11-19T07:45:26Z,NONE,"I am also highly interested in this function and in contributing to xarray in general!
If I understand correctly, https://github.com/pydata/xarray/pull/2350 and https://github.com/pydata/xarray/pull/2652 do not resolve this issue, do they?
How can I help you finish these PRs?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-549511089,https://api.github.com/repos/pydata/xarray/issues/1115,549511089,MDEyOklzc3VlQ29tbWVudDU0OTUxMTA4OQ==,6334793,2019-11-04T19:31:46Z,2019-11-04T19:31:46Z,NONE,"Guys sorry for dropping the ball on this one. I made some changes to the PR based on the feedback I got, but I couldn't figure out the tests. Would anyone like to take this over? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-545986180,https://api.github.com/repos/pydata/xarray/issues/1115,545986180,MDEyOklzc3VlQ29tbWVudDU0NTk4NjE4MA==,2497349,2019-10-24T15:59:35Z,2019-10-24T15:59:35Z,NONE,I see that this PR never made it through and there is a somewhat similar PR finished here: https://github.com/pydata/xarray/pull/2350 though it doesn't do exactly what was proposed in this PR. Is there a suggested approach for performing cross-correlation on multiple DataArray objects?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-451602947,https://api.github.com/repos/pydata/xarray/issues/1115,451602947,MDEyOklzc3VlQ29tbWVudDQ1MTYwMjk0Nw==,6334793,2019-01-04T23:48:54Z,2019-01-04T23:48:54Z,NONE,"PR done!
Changed np.sum() to dataarray.sum()","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-451052107,https://api.github.com/repos/pydata/xarray/issues/1115,451052107,MDEyOklzc3VlQ29tbWVudDQ1MTA1MjEwNw==,6334793,2019-01-03T04:10:35Z,2019-01-03T04:14:54Z,NONE,"Okay. Here's what I have come up with. I have tested it against three cases: two 1-D DataArrays, two N-D DataArrays, and one 1-D DataArray with one N-D DataArray. In every case the arrays are misaligned and have missing values.
Before going forward,
1. What do you think of it? Any improvements?
2. Steps 1 and 2 (broadcasting and ignoring common missing values) are identical in both cov() and corr(). Is there a better way to reduce the duplication while still retaining both functions as standalone?
```
import xarray as xr


def cov(self, other, dim=None):
    """"""Compute covariance between two DataArray objects along a shared dimension.

    Parameters
    ----------
    other : DataArray
        The other array with which the covariance will be computed
    dim : str, optional
        The dimension along which the covariance will be computed

    Returns
    -------
    covariance : DataArray
    """"""
    # 1. Broadcast the two arrays
    self, other = xr.broadcast(self, other)
    # 2. Ignore the nans
    valid_values = self.notnull() & other.notnull()
    self = self.where(valid_values, drop=True)
    other = other.where(valid_values, drop=True)
    valid_count = valid_values.sum(dim)
    # 3. Subtract the mean along the given dim
    demeaned_self = self - self.mean(dim=dim)
    demeaned_other = other - other.mean(dim=dim)
    # 4. Compute covariance along the given dim
    #    (DataArray.sum keeps the labels, so no axis lookup is needed)
    cov = (demeaned_self * demeaned_other).sum(dim=dim) / valid_count
    return cov


def corr(self, other, dim=None):
    """"""Compute correlation between two DataArray objects along a shared dimension.

    Parameters
    ----------
    other : DataArray
        The other array with which the correlation will be computed
    dim : str, optional
        The dimension along which the correlation will be computed

    Returns
    -------
    correlation : DataArray
    """"""
    # 1. Broadcast the two arrays
    self, other = xr.broadcast(self, other)
    # 2. Ignore the nans
    valid_values = self.notnull() & other.notnull()
    self = self.where(valid_values, drop=True)
    other = other.where(valid_values, drop=True)
    # 3. Compute correlation based on standard deviations and cov()
    self_std = self.std(dim=dim)
    other_std = other.std(dim=dim)
    return cov(self, other, dim=dim) / (self_std * other_std)
```
For testing:
```
import numpy as np
import xarray as xr

# self: Load demo data and trim its size
ds = xr.tutorial.load_dataset('air_temperature')
air = ds.air[:18, ...]
# other: select misaligned data, and smooth it to dampen the correlation with self.
air_smooth = ds.air[2:20, ...].rolling(time=3, center=True).mean()

# A handy function to select an example grid point
def select_pts(da):
    return da.sel(lat=45, lon=250)

# Test #1: Misaligned 1-D dataarrays with missing values
ts1 = select_pts(air.copy())
ts2 = select_pts(air_smooth.copy())

def pd_corr(ts1, ts2):
    """"""Ensure the ts are aligned and missing values ignored""""""
    # ts1,ts2 = xr.align(ts1,ts2)
    valid_values = ts1.notnull() & ts2.notnull()
    ts1 = ts1.where(valid_values, drop=True)
    ts2 = ts2.where(valid_values, drop=True)
    return ts1.to_series().corr(ts2.to_series())

expected = pd_corr(ts1, ts2)
actual = corr(ts1, ts2)
np.allclose(expected, actual)

# Test #2: Misaligned N-D dataarrays with missing values
actual_ND = corr(air, air_smooth, dim='time')
actual = select_pts(actual_ND)
np.allclose(expected, actual)

# Test #3: One 1-D dataarray and another N-D dataarray; misaligned and having missing values
actual_ND = corr(air_smooth, ts1, dim='time')
actual = select_pts(actual_ND)
np.allclose(actual, expected)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-445390271,https://api.github.com/repos/pydata/xarray/issues/1115,445390271,MDEyOklzc3VlQ29tbWVudDQ0NTM5MDI3MQ==,6334793,2018-12-07T22:53:06Z,2018-12-07T22:53:06Z,NONE,"Okay. I am writing the simultaneous correlation and covariance functions in dataarray.py instead of dataset.py, following the pd.Series.corr(self, other, dim) style.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-442994118,https://api.github.com/repos/pydata/xarray/issues/1115,442994118,MDEyOklzc3VlQ29tbWVudDQ0Mjk5NDExOA==,6334793,2018-11-29T21:09:55Z,2018-11-29T21:09:55Z,NONE,"Sorry for the radio silence- I will work on this next week. Thanks @max-sixty for the updates, @rabernat for reaching out, will let you know if I need help.
Should we keep it simple following @max-sixty , or should I also add the functionality to handle lagged correlations? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-419501548,https://api.github.com/repos/pydata/xarray/issues/1115,419501548,MDEyOklzc3VlQ29tbWVudDQxOTUwMTU0OA==,6334793,2018-09-07T16:55:13Z,2018-09-07T16:55:13Z,NONE,"@max-sixty thanks!
Then I will start with testing @shoyer 's suggestion and `mvstats` for the basic implementation. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-418406658,https://api.github.com/repos/pydata/xarray/issues/1115,418406658,MDEyOklzc3VlQ29tbWVudDQxODQwNjY1OA==,6334793,2018-09-04T15:15:35Z,2018-09-04T15:15:35Z,NONE,"Sometime back I wrote a [package](https://github.com/hrishikeshac/mvstats) based on xarray regarding this. I would be happy to be involved in implementing it in xarray as well, but I am new to contributing to such a large-scale project and it looks a bit intimidating!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-349670336,https://api.github.com/repos/pydata/xarray/issues/1115,349670336,MDEyOklzc3VlQ29tbWVudDM0OTY3MDMzNg==,5929935,2017-12-06T15:17:40Z,2017-12-06T15:17:40Z,NONE,"@hrishikeshac I was just looking for a function that does a regression between two datasets (x, y, time), so thanks for your function! However, I'm still wondering whether there is a much faster C (or Cython) implementation for this kind of thing?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-331686038,https://api.github.com/repos/pydata/xarray/issues/1115,331686038,MDEyOklzc3VlQ29tbWVudDMzMTY4NjAzOA==,6334793,2017-09-24T04:14:00Z,2017-09-24T04:14:00Z,NONE,"FYI @shoyer @fmaussion , I had to revisit the problem and ended up writing a function to compute vectorized cross-correlation, covariance, regression calculations (along with p-value and standard error) for xr.DataArrays. Essentially, I tried to mimic scipy.stats.linregress() but for multi-dimensional data, and included the ability to compute lagged relationships. Here's the function and its demonstration; please feel free to incorporate it in xarray if deemed useful: https://hrishichandanpurkar.blogspot.com/2017/09/vectorized-functions-for-correlation.html
","{""total_count"": 5, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,188996339
https://github.com/pydata/xarray/issues/1115#issuecomment-260379241,https://api.github.com/repos/pydata/xarray/issues/1115,260379241,MDEyOklzc3VlQ29tbWVudDI2MDM3OTI0MQ==,19403647,2016-11-14T16:10:55Z,2016-11-14T16:10:55Z,NONE,"I agree with @rabernat in the sense that it could be part of another package (e.g., signal processing). This would also allow the computation of statistical tests to assess the significance of the correlation (which is useful, since correlation is often misinterpreted without statistical tests).
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,188996339