github: issue_comments: 4 rows where author_association = "CONTRIBUTOR", issue = 904153867 and user = 56925856 sorted by updated

4 rows where author_association = "CONTRIBUTOR", issue = 904153867 and user = 56925856 sorted by updated_at descending

Search:

descending

id	html_url	issue_url	node_id	user	created_at	updated_at ▲	author_association	body	reactions	issue
850843957	https://github.com/pydata/xarray/pull/5390#issuecomment-850843957	https://api.github.com/repos/pydata/xarray/issues/5390	MDEyOklzc3VlQ29tbWVudDg1MDg0Mzk1Nw==	AndrewILWilliams 56925856	2021-05-29T14:37:48Z	2021-05-31T10:27:06Z	CONTRIBUTOR	@willirath this is cool, but I think it doesn't explain why the tests fail. Currently `da_a.mean()` and the `da_b.mean()` calls do know about each other's missing data! That's what we're doing in these lines. @dcherian, I think I've got it to work, but you need to account for the length(s) of the dimension you're calculating the correlation over. (i.e. `(da-da.mean('time')).sum('time')` is not the same as `da.sum('time') - da.mean('time')` because you should actually do `da.sum('time') - da.mean('time')length_of_time_dim`) This latest commit does this, but I'm not sure whether the added complication is worth it yet? Thoughts welcome. ```python3 def _mean(da): return (da.sum(dim=dim, skipna=True, min_count=1) / (valid_count)) dim_length = da_a.notnull().sum(dim=dim, skipna=True) def _mean_detrended_term(da): return (dim_length da / (valid_count)) cov = _mean(da_a * da_b) - _mean_detrended_term(da_a.mean(dim=dim) * da_b.mean(dim=dim)) ```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Improvements to lazy behaviour of `xr.cov()` and `xr.corr()` 904153867
850690985	https://github.com/pydata/xarray/pull/5390#issuecomment-850690985	https://api.github.com/repos/pydata/xarray/issues/5390	MDEyOklzc3VlQ29tbWVudDg1MDY5MDk4NQ==	AndrewILWilliams 56925856	2021-05-28T21:43:52Z	2021-05-28T21:44:12Z	CONTRIBUTOR	is it just `cov = _mean(da_a * da_b) - da_a.mean(dim=dim) * da_b.mean(dim=dim)` I think you'd still have to normalize the second term by `1 / (valid_count)`. However, I just tried both of these approaches and neither pass the test suite, so we may need to do more thinking...	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Improvements to lazy behaviour of `xr.cov()` and `xr.corr()` 904153867
850556738	https://github.com/pydata/xarray/pull/5390#issuecomment-850556738	https://api.github.com/repos/pydata/xarray/issues/5390	MDEyOklzc3VlQ29tbWVudDg1MDU1NjczOA==	AndrewILWilliams 56925856	2021-05-28T17:12:52Z	2021-05-28T17:14:08Z	CONTRIBUTOR	@willirath this is great stuff, thanks again! So generally it looks like the graph is more efficient when doing operations of the form: `python3 (X * Y).mean('time') - (X.mean('time') * Y.mean('time'))` than doing `python3 ((X - X.mean('time')) * (Y-Y.mean('time'))).mean('time')` or like what I've implemented (see screenshot)? ```python3 intermediate = (X * Y) - (X.mean('time') * Y.mean('time')) intermediate.mean('time') ``` If so, it seems like the most efficient(?) way to do the computation in _cov_corr() is to combine it all into one line? I can't think of how to do this though...	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Improvements to lazy behaviour of `xr.cov()` and `xr.corr()` 904153867
850276619	https://github.com/pydata/xarray/pull/5390#issuecomment-850276619	https://api.github.com/repos/pydata/xarray/issues/5390	MDEyOklzc3VlQ29tbWVudDg1MDI3NjYxOQ==	AndrewILWilliams 56925856	2021-05-28T09:15:30Z	2021-05-28T09:17:48Z	CONTRIBUTOR	@willirath , thanks for your example notebook! I'm still trying to get my head around this a bit though. Say you have `da_a` and `da_b` defined as: ```python3 da_a = xr.DataArray( np.array([[1, 2, 3, 4], [1, 0.1, 0.2, 0.3], [2, 3.2, 0.6, 1.8]]), dims=("space", "time"), coords=[ ("space", ["IA", "IL", "IN"]), ("time", pd.date_range("2000-01-01", freq="1D", periods=4)), ], ).chunk() da_b = xr.DataArray( np.array([[0.2, 0.4, 0.6, 2], [15, 10, 5, 1], [1, 3.2, np.nan, 1.8]]), dims=("space", "time"), coords=[ ("space", ["IA", "IL", "IN"]), ("time", pd.date_range("2000-01-01", freq="1D", periods=4)), ], ).chunk() ``` The original computation in `_cov_corr` has a graph something like: Whereas my alteration now has a graph more like this: Am I correct in thinking that this is a 'better' computational graph? Because the original chunks are not passed onto later points in the computation?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Improvements to lazy behaviour of `xr.cov()` and `xr.corr()` 904153867

Advanced export

JSON shape: default, array, newline-delimited, object

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);