html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/pull/4089#issuecomment-634200431,https://api.github.com/repos/pydata/xarray/issues/4089,634200431,MDEyOklzc3VlQ29tbWVudDYzNDIwMDQzMQ==,1277781,2020-05-26T18:31:31Z,2020-05-26T18:31:31Z,CONTRIBUTOR,@AndrewWilliams3142 I see. Thanks.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-634165154,https://api.github.com/repos/pydata/xarray/issues/4089,634165154,MDEyOklzc3VlQ29tbWVudDYzNDE2NTE1NA==,56925856,2020-05-26T17:26:34Z,2020-05-26T17:26:34Z,CONTRIBUTOR,"@kefirbandi I didn't want to step on your toes, but I'm happy to put in a PR to fix the typo. :) ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-634157768,https://api.github.com/repos/pydata/xarray/issues/4089,634157768,MDEyOklzc3VlQ29tbWVudDYzNDE1Nzc2OA==,1277781,2020-05-26T17:12:41Z,2020-05-26T17:12:41Z,CONTRIBUTOR,"Well, actually I was thinking, that correcting it for someone who is working on the code on a daily basis is ~30 seconds. For me, I think, it would be quite a bit of overhead for a single character...","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633922774,https://api.github.com/repos/pydata/xarray/issues/4089,633922774,MDEyOklzc3VlQ29tbWVudDYzMzkyMjc3NA==,14808389,2020-05-26T09:43:29Z,2020-05-26T09:43:29Z,MEMBER,thanks. Do you want to put in a PR fixing that?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633921230,https://api.github.com/repos/pydata/xarray/issues/4089,633921230,MDEyOklzc3VlQ29tbWVudDYzMzkyMTIzMA==,1277781,2020-05-26T09:40:12Z,2020-05-26T09:40:12Z,CONTRIBUTOR,"Just a small comment: in the docs (http://xarray.pydata.org/en/latest/generated/xarray.cov.html#xarray.cov) there is a typo: da_a is declared twice, the second should really be da_b.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633651803,https://api.github.com/repos/pydata/xarray/issues/4089,633651803,MDEyOklzc3VlQ29tbWVudDYzMzY1MTgwMw==,5635139,2020-05-25T16:55:26Z,2020-05-25T16:55:26Z,MEMBER,"Awesome @AndrewWilliams3142 ! Very excited we have this. Thanks for the review @mathause Hitting merge; any other feedback is welcome and we can iterate. ","{""total_count"": 3, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 2, ""rocket"": 1, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633592183,https://api.github.com/repos/pydata/xarray/issues/4089,633592183,MDEyOklzc3VlQ29tbWVudDYzMzU5MjE4Mw==,56925856,2020-05-25T14:13:46Z,2020-05-25T14:13:46Z,CONTRIBUTOR,"> If you insist ;) > > ``` > da_a -= da_a.mean(dim=dim) > ``` > > is indeed marginally faster. As they are already aligned, we don't have to worry about this. Sweet! 
On second thought, I might leave it for now...the sun is too nice today. Can always have it as a future PR or something. :)","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633590099,https://api.github.com/repos/pydata/xarray/issues/4089,633590099,MDEyOklzc3VlQ29tbWVudDYzMzU5MDA5OQ==,10194086,2020-05-25T14:08:35Z,2020-05-25T14:08:35Z,MEMBER,"If you insist ;) ``` da_a -= da_a.mean(dim=dim) ``` is indeed marginally faster. As they are already aligned, we don't have to worry about this.","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 1, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633146890,https://api.github.com/repos/pydata/xarray/issues/4089,633146890,MDEyOklzc3VlQ29tbWVudDYzMzE0Njg5MA==,24736507,2020-05-23T22:09:12Z,2020-05-25T13:55:30Z,NONE,"Hello @AndrewWilliams3142! Thanks for updating this PR. We checked the lines you've touched for [PEP 8](https://www.python.org/dev/peps/pep-0008) issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers: ##### Comment last updated at 2020-05-25 13:55:29 UTC","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633582528,https://api.github.com/repos/pydata/xarray/issues/4089,633582528,MDEyOklzc3VlQ29tbWVudDYzMzU4MjUyOA==,56925856,2020-05-25T13:50:08Z,2020-05-25T13:50:08Z,CONTRIBUTOR,"One more thing actually, is there an argument for not defining `da_a_std` and `demeaned_da_a` and just performing the operations in place? Defining these variables makes the code more readable but in https://github.com/pydata/xarray/pull/3550#discussion_r355157809 and https://github.com/pydata/xarray/pull/3550#discussion_r355157888 the reviewer suggests this is inefficient?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633456698,https://api.github.com/repos/pydata/xarray/issues/4089,633456698,MDEyOklzc3VlQ29tbWVudDYzMzQ1NjY5OA==,56925856,2020-05-25T08:44:36Z,2020-05-25T10:55:29Z,CONTRIBUTOR,"> Could you also add a test for the `TypeError`? > > ```python > with raises_regex(TypeError, ""Only xr.DataArray is supported""): > xr.corr(xr.Dataset(), xr.Dataset()) > ``` Where do you mean sorry? Isn't this already there in corr()? ```python3 if any(not isinstance(arr, (Variable, DataArray)) for arr in [da_a, da_b]): raise TypeError( ""Only xr.DataArray and xr.Variable are supported."" ""Given {}."".format([type(arr) for arr in [da_a, da_b]]) ) ``` **EDIT:** Scratch that, I get what you mean :)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633449839,https://api.github.com/repos/pydata/xarray/issues/4089,633449839,MDEyOklzc3VlQ29tbWVudDYzMzQ0OTgzOQ==,10194086,2020-05-25T08:29:22Z,2020-05-25T08:29:22Z,MEMBER,"Could you also add a test for the `TypeError`? 
```python with raises_regex(TypeError, ""Only xr.DataArray is supported""): xr.corr(xr.Dataset(), xr.Dataset()) ``` ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633336577,https://api.github.com/repos/pydata/xarray/issues/4089,633336577,MDEyOklzc3VlQ29tbWVudDYzMzMzNjU3Nw==,12229877,2020-05-25T01:39:49Z,2020-05-25T08:29:16Z,CONTRIBUTOR,"> Also, I don't think there is a hypothesis.extra.xarray module, yet. Any comments on that, @Zac-HD? Not yet, though I'm interested in collaborations to write one! Our existing Numpy + Pandas integrations, along with the `st.builds()` and `@st.composite` strategies, make it reasonably straightforward-if-tedious to compose the objects you want in exquisite detail... the real challenge is to design a clean high-level API which lets you control only the parts you care about. Plausibly we have to let go of that last part to make this work, and just point people at the low-level tools if high-level isn't working. [This third-party module](https://github.com/uber/hypothesis-gufunc/blob/master/hypothesis_gufunc/extra/xr.py) is unidiomatic in that it doesn't have the API design above, but I believe it works. rdturnermtl has a history of great features; we eventually got idiomatic high-performance gufunc shapes upstream and I'm confident we'll get Xarray support eventually too... and sooner if there are people who can help design it :smile: Just \@-mention me again if this comes up, or I won't be notified.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633441816,https://api.github.com/repos/pydata/xarray/issues/4089,633441816,MDEyOklzc3VlQ29tbWVudDYzMzQ0MTgxNg==,56925856,2020-05-25T08:11:05Z,2020-05-25T08:12:48Z,CONTRIBUTOR,"Cheers ! I've got a day off today so I'll do another pass through the changes and see if there's any low-hanging fruit I can improve (in addition to `np.random`, `_cov_corr` internal methods and maybe `apply_ufunc()` ) :) ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633393615,https://api.github.com/repos/pydata/xarray/issues/4089,633393615,MDEyOklzc3VlQ29tbWVudDYzMzM5MzYxNQ==,5635139,2020-05-25T06:03:24Z,2020-05-25T06:03:24Z,MEMBER,"This looks great! I think it's fine to have a simple implementation like this, even if it's not perfectly efficient. Well done for getting something working @AndrewWilliams3142 . For the future: let's move away from using `np.random` in docstrings, since they're not reproducible, and will never pass a doctest. Anything remaining before merging?","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633310925,https://api.github.com/repos/pydata/xarray/issues/4089,633310925,MDEyOklzc3VlQ29tbWVudDYzMzMxMDkyNQ==,14808389,2020-05-24T22:38:54Z,2020-05-24T22:38:54Z,MEMBER,"no worries about `hypothesis`, that's something we can add in a new PR. Also, I don't think there is a `hypothesis.extra.xarray` module, yet. 
Any comments on that, @Zac-HD?","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633299554,https://api.github.com/repos/pydata/xarray/issues/4089,633299554,MDEyOklzc3VlQ29tbWVudDYzMzI5OTU1NA==,10194086,2020-05-24T21:04:03Z,2020-05-24T21:04:03Z,MEMBER,"Currently `corr` needs to sanitize the inputs twice, which will be inefficient. One way around this is to define an internal method which can do both, depending on a `method` keyword (no need to write extra tests for this IMHO): ```python def corr(da_a, da_b, dim=None, ddof=0): return _cov_corr(da_a, da_b, dim=None, ddof=0, method=""corr"") def cov(da_a, da_b, dim=None, ddof=0): return _cov_corr(da_a, da_b, dim=None, ddof=0, method=""cov"") def _cov_corr(da_a, da_b, dim=None, ddof=0, method=None): # compute cov if method = ""cov"": return cov # compute corr return corr ``` Maybe you could use `xr.apply_ufunc` instead of looping in the tests (might be overkill).","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633286352,https://api.github.com/repos/pydata/xarray/issues/4089,633286352,MDEyOklzc3VlQ29tbWVudDYzMzI4NjM1Mg==,56925856,2020-05-24T19:49:30Z,2020-05-24T19:56:24Z,CONTRIBUTOR,"One problem I came across here is that `pandas` automatically ignores 'np.nan' values in any `corr` or `cov` calculation. This is hard-coded into the package and there's no `skipna=False` option sadly, so what I've done in the tests is to use the `numpy` implementation which pandas is built on (see, for example [here](https://github.com/pandas-dev/pandas/blob/cb35d8a938c9222d903482d2f66c62fece5a7aae/pandas/core/nanops.py#L1325)). Current tests implemented are (in pseudocode...): - [x] `assert_allclose(xr.cov(a, b) / (a.std() * b.std()), xr.corr(a, b))` - [x] `assert_allclose(xr.cov(a,a)*(N-1), ((a - a.mean())**2).sum())` - [x] For the example in my previous comment, I now have a loop over all values of `(a,x)` to reconstruct the covariance / correlation matrix, and check it with an `assert_allclose(...)`. - [x] Add more test arrays, with/without `np.nans` -- **done** @keewis I tried reading the Hypothesis docs and got a bit overwhelmed, so I've stuck with example-based tests for now.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633216248,https://api.github.com/repos/pydata/xarray/issues/4089,633216248,MDEyOklzc3VlQ29tbWVudDYzMzIxNjI0OA==,14808389,2020-05-24T11:22:42Z,2020-05-24T12:09:51Z,MEMBER,"If you want to test individual values without reimplementing the function in the tests (which is what I suspect comparing with the result of `np.cov` would require), that might be the only way. If not, you could also check properties of covariance / correlation matrices, e.g. that `assert_allclose(xr.cov(a, b) / (a.std() * b.std()), xr.corr(a, b))` (I'm not sure if I remember that formula correctly) or that the diagonal of the auto-covariance matrix is the same as the variance of the array (with a 1D vector, not sure about more dimensions). 
If you decide to test using properties, you could also extend our small collection of tests using `hypothesis` (see #1846).","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633213547,https://api.github.com/repos/pydata/xarray/issues/4089,633213547,MDEyOklzc3VlQ29tbWVudDYzMzIxMzU0Nw==,56925856,2020-05-24T10:59:43Z,2020-05-24T11:00:53Z,CONTRIBUTOR,"The current problem is that we can't use Pandas to fully test `xr.cov()` or `xr.corr()` because once you convert the `DataArrays` to a `series` or a `dataframe` for testing, you can't easily index them with a `dim` parameter. See @r-beer 's comment here https://github.com/pydata/xarray/pull/3550#issuecomment-557895005. As such, I think it maybe just makes sense to test a few low-dimensional cases? Eg ```python3 >>> da_a = xr.DataArray( np.random.random((3, 21, 4)), coords={""time"": pd.date_range(""2000-01-01"", freq=""1D"", periods=21)}, dims=(""a"", ""time"", ""x""), ) >>> da_b = xr.DataArray( np.random.random((3, 21, 4)), coords={""time"": pd.date_range(""2000-01-01"", freq=""1D"", periods=21)}, dims=(""a"", ""time"", ""x""), ) >>> xr.cov(da_a, da_b, 'time') array([[-0.01824046, 0.00373796, -0.00601642, -0.00108818], [ 0.00686132, -0.02680119, -0.00639433, -0.00868691], [-0.00889806, 0.02622817, -0.01022208, -0.00101257]]) Dimensions without coordinates: a, x >>> xr.cov(da_a, da_b, 'time').sel(a=0,x=0) array(-0.01824046) >>> da_a.sel(a=0,x=0).to_series().cov(da_b.sel(a=0,x=0).to_series()) -0.018240458880158048 ``` So, while it's easy to check that a few individual points from `xr.cov()` agree with the pandas implementation, it would require a loop over `(a,x)` in order to check all of the points for this example. Do people have thoughts about this? I think it would also make sense to have some test cases where we don't use Pandas at all, but we specify the output manually? ```python3 >>> da_a = xr.DataArray([[1, 2], [1, np.nan]], dims=[""x"", ""time""]) >>> expected = [1, np.nan] >>> actual = xr.corr(da_a, da_a, dim='time') >>> assert_allclose(actual, expected) ``` Does this seem like a good way forward? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213
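
As a concrete illustration of the testing approach discussed in the last comment of this thread, here is a minimal, hedged sketch: it reconstructs the covariance one `(a, x)` slice at a time with pandas and compares against `xr.cov`, then checks the cov/corr consistency property suggested earlier by keewis. This is not the code merged in the PR; the seeded `default_rng`, the loop bounds, and the `ddof=0` pairing with `DataArray.std()` are editorial choices, and the second check assumes no missing values.

```python
import numpy as np
import pandas as pd
import xarray as xr
from xarray.testing import assert_allclose

rng = np.random.default_rng(0)  # seeded, so the example is reproducible
times = pd.date_range("2000-01-01", freq="1D", periods=21)
da_a = xr.DataArray(
    rng.random((3, 21, 4)), coords={"time": times}, dims=("a", "time", "x")
)
da_b = xr.DataArray(
    rng.random((3, 21, 4)), coords={"time": times}, dims=("a", "time", "x")
)

# Reconstruct the covariance one (a, x) slice at a time with pandas,
# as proposed above, and compare it with xr.cov (both default to ddof=1).
expected = np.empty((3, 4))
for i in range(3):
    for j in range(4):
        expected[i, j] = (
            da_a.isel(a=i, x=j).to_series().cov(da_b.isel(a=i, x=j).to_series())
        )
assert_allclose(xr.cov(da_a, da_b, dim="time"), xr.DataArray(expected, dims=("a", "x")))

# Consistency between cov and corr: the ratio equals the correlation as long
# as the same ddof is used on both sides (DataArray.std defaults to ddof=0).
assert_allclose(
    xr.cov(da_a, da_b, dim="time", ddof=0) / (da_a.std("time") * da_b.std("time")),
    xr.corr(da_a, da_b, dim="time"),
)
```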
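
The thread also touches on property-based testing with `hypothesis`. Since there is no `hypothesis.extra.xarray` module (as Zac-HD notes above), a strategy would have to be composed from the numpy extra with `@st.composite`, roughly along the lines below. The `dataarrays` strategy and the auto-covariance/variance property are illustrative assumptions only and are not part of this PR.

```python
import numpy as np
import xarray as xr
import hypothesis.strategies as st
import hypothesis.extra.numpy as npst
from hypothesis import given


@st.composite
def dataarrays(draw, dims=("a", "time", "x"), max_side=5):
    # Draw a side length for each dimension, then an array of finite floats.
    shape = tuple(draw(st.integers(min_value=2, max_value=max_side)) for _ in dims)
    data = draw(
        npst.arrays(
            dtype=np.float64,
            shape=shape,
            elements=st.floats(-1e3, 1e3, allow_nan=False, allow_infinity=False),
        )
    )
    return xr.DataArray(data, dims=dims)


@given(dataarrays())
def test_autocovariance_equals_variance(da):
    # keewis's property above: the auto-covariance along a dimension equals
    # the variance of the array (same ddof on both sides, here ddof=1).
    xr.testing.assert_allclose(xr.cov(da, da, dim="time"), da.var("time", ddof=1))
```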