html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/4089#issuecomment-634165154,https://api.github.com/repos/pydata/xarray/issues/4089,634165154,MDEyOklzc3VlQ29tbWVudDYzNDE2NTE1NA==,56925856,2020-05-26T17:26:34Z,2020-05-26T17:26:34Z,CONTRIBUTOR,"@kefirbandi I didn't want to step on your toes, but I'm happy to put in a PR to fix the typo. :) ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213
https://github.com/pydata/xarray/pull/4089#issuecomment-633592183,https://api.github.com/repos/pydata/xarray/issues/4089,633592183,MDEyOklzc3VlQ29tbWVudDYzMzU5MjE4Mw==,56925856,2020-05-25T14:13:46Z,2020-05-25T14:13:46Z,CONTRIBUTOR,"> If you insist ;)
>
> ```
> da_a -= da_a.mean(dim=dim)
> ```
>
> is indeed marginally faster. As they are already aligned, we don't have to worry about this.
Sweet! On second thought, I might leave it for now... the sun is too nice today. I can always have it as a future PR or something. :)","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213
https://github.com/pydata/xarray/pull/4089#issuecomment-633582528,https://api.github.com/repos/pydata/xarray/issues/4089,633582528,MDEyOklzc3VlQ29tbWVudDYzMzU4MjUyOA==,56925856,2020-05-25T13:50:08Z,2020-05-25T13:50:08Z,CONTRIBUTOR,"One more thing, actually: is there an argument for not defining `da_a_std` and `demeaned_da_a`, and instead performing the operations in place? Defining these variables makes the code more readable, but in https://github.com/pydata/xarray/pull/3550#discussion_r355157809 and https://github.com/pydata/xarray/pull/3550#discussion_r355157888 the reviewer suggests this is inefficient?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213
https://github.com/pydata/xarray/pull/4089#issuecomment-633456698,https://api.github.com/repos/pydata/xarray/issues/4089,633456698,MDEyOklzc3VlQ29tbWVudDYzMzQ1NjY5OA==,56925856,2020-05-25T08:44:36Z,2020-05-25T10:55:29Z,CONTRIBUTOR,"> Could you also add a test for the `TypeError`?
>
> ```python
> with raises_regex(TypeError, ""Only xr.DataArray is supported""):
>     xr.corr(xr.Dataset(), xr.Dataset())
> ```
Sorry, where do you mean? Isn't this already there in `corr()`?
```python3
if any(not isinstance(arr, (Variable, DataArray)) for arr in [da_a, da_b]):
    raise TypeError(
        ""Only xr.DataArray and xr.Variable are supported.""
        ""Given {}."".format([type(arr) for arr in [da_a, da_b]])
    )
```
**EDIT:** Scratch that, I get what you mean :)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213
https://github.com/pydata/xarray/pull/4089#issuecomment-633441816,https://api.github.com/repos/pydata/xarray/issues/4089,633441816,MDEyOklzc3VlQ29tbWVudDYzMzQ0MTgxNg==,56925856,2020-05-25T08:11:05Z,2020-05-25T08:12:48Z,CONTRIBUTOR,"Cheers! I've got a day off today, so I'll do another pass through the changes and see if there's any low-hanging fruit I can improve (in addition to `np.random`, the `_cov_corr` internal methods, and maybe `apply_ufunc()`) :)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213
https://github.com/pydata/xarray/pull/4089#issuecomment-633286352,https://api.github.com/repos/pydata/xarray/issues/4089,633286352,MDEyOklzc3VlQ29tbWVudDYzMzI4NjM1Mg==,56925856,2020-05-24T19:49:30Z,2020-05-24T19:56:24Z,CONTRIBUTOR,"One problem I came across here is that `pandas` automatically ignores `np.nan` values in any `corr` or `cov` calculation. This is hard-coded into the package and sadly there's no `skipna=False` option, so what I've done in the tests is to use the `numpy` implementation that pandas is built on (see, for example, [here](https://github.com/pandas-dev/pandas/blob/cb35d8a938c9222d903482d2f66c62fece5a7aae/pandas/core/nanops.py#L1325)).
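For reference, here is a rough sketch of the kind of NaN-aware numpy baseline I mean (the helper name is made up):
```python
import numpy as np

def nan_pearson(x, y):
    # drop positions where either input is NaN, mirroring what pandas does internally
    valid = ~np.isnan(x) & ~np.isnan(y)
    x, y = x[valid], y[valid]
    xm, ym = x - x.mean(), y - y.mean()
    return (xm * ym).sum() / np.sqrt((xm ** 2).sum() * (ym ** 2).sum())
```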
Current tests implemented are (in pseudocode...):
- [x] `assert_allclose(xr.cov(a, b) / (a.std() * b.std()), xr.corr(a, b))` (with matching `ddof`s; see the sketch after this list)
- [x] `assert_allclose(xr.cov(a,a)*(N-1), ((a - a.mean())**2).sum())`
- [x] For the example in my previous comment, I now have a loop over all values of `(a, x)` to reconstruct the covariance/correlation matrix, and check it with an `assert_allclose(...)`.
- [x] Add more test arrays, with/without `np.nan`s -- **done**
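For example, a concrete sketch of the first identity (note that `xr.cov` defaults to `ddof=1` while `.std()` defaults to `ddof=0`, so the `ddof`s have to be matched explicitly):
```python
import numpy as np
import xarray as xr
from xarray.testing import assert_allclose

a = xr.DataArray(np.random.random(100), dims=""time"")
b = xr.DataArray(np.random.random(100), dims=""time"")
# use ddof=0 in xr.cov so it matches the default ddof=0 of .std()
assert_allclose(xr.corr(a, b), xr.cov(a, b, ddof=0) / (a.std() * b.std()))
```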
@keewis I tried reading the Hypothesis docs and got a bit overwhelmed, so I've stuck with example-based tests for now.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213
https://github.com/pydata/xarray/pull/4089#issuecomment-633213547,https://api.github.com/repos/pydata/xarray/issues/4089,633213547,MDEyOklzc3VlQ29tbWVudDYzMzIxMzU0Nw==,56925856,2020-05-24T10:59:43Z,2020-05-24T11:00:53Z,CONTRIBUTOR,"The current problem is that we can't use pandas to fully test `xr.cov()` or `xr.corr()`, because once you convert the `DataArray`s to a `Series` or a `DataFrame` for testing, you can't easily index them with a `dim` parameter. See @r-beer's comment here: https://github.com/pydata/xarray/pull/3550#issuecomment-557895005.
As such, I think it might just make sense to test a few low-dimensional cases? E.g.
```python3
>>> da_a = xr.DataArray(
...     np.random.random((3, 21, 4)),
...     coords={""time"": pd.date_range(""2000-01-01"", freq=""1D"", periods=21)},
...     dims=(""a"", ""time"", ""x""),
... )
>>> da_b = xr.DataArray(
...     np.random.random((3, 21, 4)),
...     coords={""time"": pd.date_range(""2000-01-01"", freq=""1D"", periods=21)},
...     dims=(""a"", ""time"", ""x""),
... )
>>> xr.cov(da_a, da_b, 'time')
array([[-0.01824046,  0.00373796, -0.00601642, -0.00108818],
       [ 0.00686132, -0.02680119, -0.00639433, -0.00868691],
       [-0.00889806,  0.02622817, -0.01022208, -0.00101257]])
Dimensions without coordinates: a, x
>>> xr.cov(da_a, da_b, 'time').sel(a=0, x=0)
array(-0.01824046)
>>> da_a.sel(a=0, x=0).to_series().cov(da_b.sel(a=0, x=0).to_series())
-0.018240458880158048
```
So, while it's easy to check that a few individual points from `xr.cov()` agree with the pandas implementation, checking all of the points for this example would require a loop over `(a, x)` (sketched below). Do people have thoughts about this?
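For concreteness, a sketch of that loop, reusing `da_a` and `da_b` from the snippet above:
```python
actual = xr.cov(da_a, da_b, 'time')
for a in range(3):
    for x in range(4):
        # compare each point of the xarray result against the pandas value
        expected = da_a.sel(a=a, x=x).to_series().cov(da_b.sel(a=a, x=x).to_series())
        assert np.allclose(actual.sel(a=a, x=x), expected)
```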
I think it would also make sense to have some test cases where we don't use pandas at all, and instead specify the expected output manually?
```python3
>>> da_a = xr.DataArray([[1, 2], [1, np.nan]], dims=[""x"", ""time""])
>>> expected = xr.DataArray([1, np.nan], dims=""x"")
>>> actual = xr.corr(da_a, da_a, dim='time')
>>> assert_allclose(actual, expected)
```
Does this seem like a good way forward? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213