html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/4089#issuecomment-634165154,https://api.github.com/repos/pydata/xarray/issues/4089,634165154,MDEyOklzc3VlQ29tbWVudDYzNDE2NTE1NA==,56925856,2020-05-26T17:26:34Z,2020-05-26T17:26:34Z,CONTRIBUTOR,"@kefirbandi I didn't want to step on your toes, but I'm happy to put in a PR to fix the typo. :)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213
https://github.com/pydata/xarray/pull/4089#issuecomment-633592183,https://api.github.com/repos/pydata/xarray/issues/4089,633592183,MDEyOklzc3VlQ29tbWVudDYzMzU5MjE4Mw==,56925856,2020-05-25T14:13:46Z,2020-05-25T14:13:46Z,CONTRIBUTOR,"> If you insist ;)
>
> ```
> da_a -= da_a.mean(dim=dim)
> ```
>
> is indeed marginally faster. As they are already aligned, we don't have to worry about this.

Sweet! On second thought, I might leave it for now...the sun is too nice today. Can always have it as a future PR or something. :)","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213
https://github.com/pydata/xarray/pull/4089#issuecomment-633582528,https://api.github.com/repos/pydata/xarray/issues/4089,633582528,MDEyOklzc3VlQ29tbWVudDYzMzU4MjUyOA==,56925856,2020-05-25T13:50:08Z,2020-05-25T13:50:08Z,CONTRIBUTOR,"One more thing, actually: is there an argument for not defining `da_a_std` and `demeaned_da_a`, and just performing the operations in place? Defining these variables makes the code more readable, but in https://github.com/pydata/xarray/pull/3550#discussion_r355157809 and https://github.com/pydata/xarray/pull/3550#discussion_r355157888 the reviewer suggests this is inefficient?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213
https://github.com/pydata/xarray/pull/4089#issuecomment-633456698,https://api.github.com/repos/pydata/xarray/issues/4089,633456698,MDEyOklzc3VlQ29tbWVudDYzMzQ1NjY5OA==,56925856,2020-05-25T08:44:36Z,2020-05-25T10:55:29Z,CONTRIBUTOR,"> Could you also add a test for the `TypeError`?
>
> ```python
> with raises_regex(TypeError, ""Only xr.DataArray is supported""):
>     xr.corr(xr.Dataset(), xr.Dataset())
> ```

Where do you mean, sorry? Isn't this already there in `corr()`?

```python3
if any(not isinstance(arr, (Variable, DataArray)) for arr in [da_a, da_b]):
    raise TypeError(
        ""Only xr.DataArray and xr.Variable are supported.""
        ""Given {}."".format([type(arr) for arr in [da_a, da_b]])
    )
```

**EDIT:** Scratch that, I get what you mean :)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213
https://github.com/pydata/xarray/pull/4089#issuecomment-633441816,https://api.github.com/repos/pydata/xarray/issues/4089,633441816,MDEyOklzc3VlQ29tbWVudDYzMzQ0MTgxNg==,56925856,2020-05-25T08:11:05Z,2020-05-25T08:12:48Z,CONTRIBUTOR,"Cheers!
I've got a day off today, so I'll do another pass through the changes and see if there's any low-hanging fruit I can improve (in addition to `np.random`, the `_cov_corr` internal methods, and maybe `apply_ufunc()`) :)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213
https://github.com/pydata/xarray/pull/4089#issuecomment-633286352,https://api.github.com/repos/pydata/xarray/issues/4089,633286352,MDEyOklzc3VlQ29tbWVudDYzMzI4NjM1Mg==,56925856,2020-05-24T19:49:30Z,2020-05-24T19:56:24Z,CONTRIBUTOR,"One problem I came across here is that `pandas` automatically ignores `np.nan` values in any `corr` or `cov` calculation. This is hard-coded into the package and sadly there's no `skipna=False` option, so what I've done in the tests is to use the `numpy` implementation which pandas is built on (see, for example, [here](https://github.com/pandas-dev/pandas/blob/cb35d8a938c9222d903482d2f66c62fece5a7aae/pandas/core/nanops.py#L1325)).

Current tests implemented are (in pseudocode...):

- [x] `assert_allclose(xr.cov(a, b) / (a.std() * b.std()), xr.corr(a, b))`
- [x] `assert_allclose(xr.cov(a, a) * (N - 1), ((a - a.mean()) ** 2).sum())`
- [x] For the example in my previous comment, I now have a loop over all values of `(a, x)` to reconstruct the covariance/correlation matrix, and check it with an `assert_allclose(...)`.
- [x] Add more test arrays, with/without `np.nan`s -- **done**

@keewis I tried reading the Hypothesis docs and got a bit overwhelmed, so I've stuck with example-based tests for now.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213
https://github.com/pydata/xarray/pull/4089#issuecomment-633213547,https://api.github.com/repos/pydata/xarray/issues/4089,633213547,MDEyOklzc3VlQ29tbWVudDYzMzIxMzU0Nw==,56925856,2020-05-24T10:59:43Z,2020-05-24T11:00:53Z,CONTRIBUTOR,"The current problem is that we can't use pandas to fully test `xr.cov()` or `xr.corr()`, because once you convert the `DataArray`s to a `Series` or a `DataFrame` for testing, you can't easily index them with a `dim` parameter. See @r-beer's comment here: https://github.com/pydata/xarray/pull/3550#issuecomment-557895005.

As such, I think it maybe just makes sense to test a few low-dimensional cases? E.g.:

```python3
>>> da_a = xr.DataArray(
        np.random.random((3, 21, 4)),
        coords={""time"": pd.date_range(""2000-01-01"", freq=""1D"", periods=21)},
        dims=(""a"", ""time"", ""x""),
    )
>>> da_b = xr.DataArray(
        np.random.random((3, 21, 4)),
        coords={""time"": pd.date_range(""2000-01-01"", freq=""1D"", periods=21)},
        dims=(""a"", ""time"", ""x""),
    )
>>> xr.cov(da_a, da_b, 'time')
array([[-0.01824046,  0.00373796, -0.00601642, -0.00108818],
       [ 0.00686132, -0.02680119, -0.00639433, -0.00868691],
       [-0.00889806,  0.02622817, -0.01022208, -0.00101257]])
Dimensions without coordinates: a, x
>>> xr.cov(da_a, da_b, 'time').sel(a=0, x=0)
array(-0.01824046)
>>> da_a.sel(a=0, x=0).to_series().cov(da_b.sel(a=0, x=0).to_series())
-0.018240458880158048
```

So, while it's easy to check that a few individual points from `xr.cov()` agree with the pandas implementation, it would require a loop over `(a, x)` in order to check all of the points for this example. Do people have thoughts about this?

I think it would also make sense to have some test cases where we don't use pandas at all, but where we specify the output manually?
```python3
>>> da_a = xr.DataArray([[1, 2], [1, np.nan]], dims=[""x"", ""time""])
>>> expected = xr.DataArray([1, np.nan], dims=[""x""])
>>> actual = xr.corr(da_a, da_a, dim='time')
>>> assert_allclose(actual, expected)
```

Does this seem like a good way forward?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213