issue_comments


7 rows where author_association = "CONTRIBUTOR", issue = 623751213 and user = 56925856 sorted by updated_at descending

Columns: id · html_url · issue_url · node_id · user · created_at · updated_at (sort key, descending) · author_association · body · reactions · performed_via_github_app · issue
Comment 634165154 · AndrewILWilliams (56925856) · CONTRIBUTOR · created 2020-05-26T17:26:34Z · updated 2020-05-26T17:26:34Z · https://github.com/pydata/xarray/pull/4089#issuecomment-634165154

@kefirbandi I didn't want to step on your toes, but I'm happy to put in a PR to fix the typo. :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Issue: xr.cov() and xr.corr() (623751213)
Comment 633592183 · AndrewILWilliams (56925856) · CONTRIBUTOR · created 2020-05-25T14:13:46Z · updated 2020-05-25T14:13:46Z · https://github.com/pydata/xarray/pull/4089#issuecomment-633592183

If you insist ;)

```python
da_a -= da_a.mean(dim=dim)
```

is indeed marginally faster. As they are already aligned, we don't have to worry about this.

Sweet! On second thought, I might leave it for now...the sun is too nice today. Can always have it as a future PR or something. :)

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Comment 633582528 · AndrewILWilliams (56925856) · CONTRIBUTOR · created 2020-05-25T13:50:08Z · updated 2020-05-25T13:50:08Z · https://github.com/pydata/xarray/pull/4089#issuecomment-633582528

One more thing, actually: is there an argument for not defining `da_a_std` and `demeaned_da_a`, and just performing the operations in place? Defining these variables makes the code more readable, but in https://github.com/pydata/xarray/pull/3550#discussion_r355157809 and https://github.com/pydata/xarray/pull/3550#discussion_r355157888 the reviewer suggests this is inefficient.
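For intuition, the trade-off in question can be sketched with plain NumPy. This is a hypothetical illustration, not xarray's actual `_cov_corr` internals; the function names and the `axis` handling are my own:

```python
import numpy as np

def cov_named(a, b, axis=-1):
    # Readable style: intermediates get descriptive names. The extra
    # bindings keep the demeaned arrays alive until the function returns.
    demeaned_a = a - a.mean(axis=axis, keepdims=True)
    demeaned_b = b - b.mean(axis=axis, keepdims=True)
    return (demeaned_a * demeaned_b).sum(axis=axis) / (a.shape[axis] - 1)

def cov_rebound(a, b, axis=-1):
    # "In-place" style from the linked review comments: rebinding the
    # parameters lets the original temporaries be freed as soon as possible.
    a = a - a.mean(axis=axis, keepdims=True)
    b = b - b.mean(axis=axis, keepdims=True)
    return (a * b).sum(axis=axis) / (a.shape[axis] - 1)
```

Both styles compute identical results; the difference is about memory lifetime of intermediates, not the arithmetic itself.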

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Comment 633456698 · AndrewILWilliams (56925856) · CONTRIBUTOR · created 2020-05-25T08:44:36Z · updated 2020-05-25T10:55:29Z · https://github.com/pydata/xarray/pull/4089#issuecomment-633456698

Could you also add a test for the TypeError?

```python
with raises_regex(TypeError, "Only xr.DataArray is supported"):
    xr.corr(xr.Dataset(), xr.Dataset())
```

Where do you mean sorry? Isn't this already there in corr()?

```python3
if any(not isinstance(arr, (Variable, DataArray)) for arr in [da_a, da_b]):
    raise TypeError(
        "Only xr.DataArray and xr.Variable are supported."
        "Given {}.".format([type(arr) for arr in [da_a, da_b]])
    )
```

EDIT: Scratch that, I get what you mean :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Comment 633441816 · AndrewILWilliams (56925856) · CONTRIBUTOR · created 2020-05-25T08:11:05Z · updated 2020-05-25T08:12:48Z · https://github.com/pydata/xarray/pull/4089#issuecomment-633441816

Cheers! I've got a day off today, so I'll do another pass through the changes and see if there's any low-hanging fruit I can improve (in addition to `np.random`, the `_cov_corr` internal methods, and maybe `apply_ufunc()`) :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Comment 633286352 · AndrewILWilliams (56925856) · CONTRIBUTOR · created 2020-05-24T19:49:30Z · updated 2020-05-24T19:56:24Z · https://github.com/pydata/xarray/pull/4089#issuecomment-633286352

One problem I came across here is that pandas automatically ignores `np.nan` values in any corr or cov calculation. This is hard-coded into the package and there's no `skipna=False` option, sadly, so what I've done in the tests is to use the numpy implementation that pandas is built on (see, for example, here).
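The pandas behaviour described here is easy to demonstrate on toy data (hypothetical values, not from the PR's test suite): pandas drops the NaN pair on its own, and the same answer falls out of the textbook formula applied to the pairwise-complete entries.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, 2.0, np.nan, 4.0])
s2 = pd.Series([2.0, 1.0, 3.0, 0.5])

# pandas silently drops the pair containing NaN -- there is no
# skipna=False switch to turn this off.
pandas_cov = s1.cov(s2)

# Reproducing it by hand: restrict to pairwise-complete entries,
# then apply the sample (ddof=1) covariance formula.
mask = s1.notna() & s2.notna()
a, b = s1[mask].to_numpy(), s2[mask].to_numpy()
manual_cov = ((a - a.mean()) * (b - b.mean())).sum() / (len(a) - 1)
```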

Current tests implemented are (in pseudocode...):

- [x] `assert_allclose(xr.cov(a, b) / (a.std() * b.std()), xr.corr(a, b))`
- [x] `assert_allclose(xr.cov(a, a) * (N - 1), ((a - a.mean())**2).sum())`
- [x] For the example in my previous comment, I now have a loop over all values of `(a, x)` to reconstruct the covariance / correlation matrix, and check it with an `assert_allclose(...)`.
- [x] Add more test arrays, with/without `np.nan`s -- done
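The first two identities in the checklist can be verified directly with NumPy; this is a stand-in sketch using `np.cov`/`np.corrcoef` on 1-D arrays instead of `xr.cov`/`xr.corr` on DataArrays:

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.random(30)
b = rng.random(30)
N = a.size

cov_ab = np.cov(a, b)[0, 1]        # sample covariance (ddof=1)
corr_ab = np.corrcoef(a, b)[0, 1]  # Pearson correlation

# identity 1: corr = cov / (std_a * std_b), with matching ddof=1 stds
assert np.isclose(cov_ab / (a.std(ddof=1) * b.std(ddof=1)), corr_ab)

# identity 2: cov(a, a) * (N - 1) equals the sum of squared deviations
assert np.isclose(np.cov(a, a)[0, 1] * (N - 1), ((a - a.mean()) ** 2).sum())
```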

@keewis I tried reading the Hypothesis docs and got a bit overwhelmed, so I've stuck with example-based tests for now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Comment 633213547 · AndrewILWilliams (56925856) · CONTRIBUTOR · created 2020-05-24T10:59:43Z · updated 2020-05-24T11:00:53Z · https://github.com/pydata/xarray/pull/4089#issuecomment-633213547

The current problem is that we can't use pandas to fully test `xr.cov()` or `xr.corr()`, because once you convert the DataArrays to a Series or DataFrame for testing, you can't easily index them with a `dim` parameter. See @r-beer's comment here: https://github.com/pydata/xarray/pull/3550#issuecomment-557895005.

As such, I think it may just make sense to test a few low-dimensional cases, e.g.

```python3
da_a = xr.DataArray(
    np.random.random((3, 21, 4)),
    coords={"time": pd.date_range("2000-01-01", freq="1D", periods=21)},
    dims=("a", "time", "x"),
)

da_b = xr.DataArray(
    np.random.random((3, 21, 4)),
    coords={"time": pd.date_range("2000-01-01", freq="1D", periods=21)},
    dims=("a", "time", "x"),
)

xr.cov(da_a, da_b, 'time')
# <xarray.DataArray (a: 3, x: 4)>
# array([[-0.01824046,  0.00373796, -0.00601642, -0.00108818],
#        [ 0.00686132, -0.02680119, -0.00639433, -0.00868691],
#        [-0.00889806,  0.02622817, -0.01022208, -0.00101257]])
# Dimensions without coordinates: a, x

xr.cov(da_a, da_b, 'time').sel(a=0, x=0)
# <xarray.DataArray ()>
# array(-0.01824046)

da_a.sel(a=0, x=0).to_series().cov(da_b.sel(a=0, x=0).to_series())
# -0.018240458880158048
```

So, while it's easy to check that a few individual points from xr.cov() agree with the pandas implementation, it would require a loop over (a,x) in order to check all of the points for this example. Do people have thoughts about this?
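The loop-over-`(a, x)` check can be sketched without xarray at all, treating axis 1 of a plain array as the `time` dimension. The `cov_along_time` helper below is hypothetical, standing in for `xr.cov(..., dim='time')`:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
arr_a = rng.random((3, 21, 4))
arr_b = rng.random((3, 21, 4))

def cov_along_time(x, y):
    # sample covariance along axis 1 (the "time" axis), ddof=1
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    return (xd * yd).sum(axis=1) / (x.shape[1] - 1)

result = cov_along_time(arr_a, arr_b)  # shape (3, 4), one value per (a, x)

# exhaustive check: every (a, x) slice against the pandas implementation
for i in range(result.shape[0]):
    for j in range(result.shape[1]):
        expected = pd.Series(arr_a[i, :, j]).cov(pd.Series(arr_b[i, :, j]))
        assert np.isclose(result[i, j], expected)
```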

I think it would also make sense to have some test cases where we don't use Pandas at all, but we specify the output manually?

```python3
da_a = xr.DataArray([[1, 2], [1, np.nan]], dims=["x", "time"])
expected = [1, np.nan]
actual = xr.corr(da_a, da_a, dim='time')
assert_allclose(actual, expected)
```

Does this seem like a good way forward?
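The hand-specified expectation in that example can itself be sanity-checked against pandas row by row; this is a sketch taking each row's self-correlation along `time`:

```python
import numpy as np
import pandas as pd

# same toy input as the proposed test: two rows over "x", columns over "time"
rows = [[1, 2], [1, np.nan]]
actual = [pd.Series(r).corr(pd.Series(r)) for r in rows]

# The first row correlates perfectly with itself; the second has only one
# non-NaN point left after pandas drops the NaN pair, so its correlation
# is undefined and comes back as NaN.
```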

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
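The filter in the page header ("7 rows where author_association = ..., issue = ..., user = ...") corresponds roughly to the query below. This sketch rebuilds the schema in an in-memory SQLite database (foreign-key clauses dropped, since the referenced `users` and `issues` tables aren't shown here) and runs a simplified form of that query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
""")

# one sample row, using values from the first comment above
conn.execute(
    "INSERT INTO issue_comments (id, user, issue, author_association, updated_at)"
    " VALUES (?, ?, ?, ?, ?)",
    (634165154, 56925856, 623751213, "CONTRIBUTOR", "2020-05-26T17:26:34Z"),
)

# the page's filter, sorted by updated_at descending
rows = conn.execute(
    "SELECT id FROM issue_comments"
    " WHERE author_association = 'CONTRIBUTOR' AND issue = ? AND user = ?"
    " ORDER BY updated_at DESC",
    (623751213, 56925856),
).fetchall()
```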
Powered by Datasette · Queries took 1522.908ms · About: xarray-datasette