issue_comments
10 rows where issue = 904153867 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
850843957 | https://github.com/pydata/xarray/pull/5390#issuecomment-850843957 | https://api.github.com/repos/pydata/xarray/issues/5390 | MDEyOklzc3VlQ29tbWVudDg1MDg0Mzk1Nw== | AndrewILWilliams 56925856 | 2021-05-29T14:37:48Z | 2021-05-31T10:27:06Z | CONTRIBUTOR | @willirath this is cool, but I think it doesn't explain why the tests fail. @dcherian, I think I've got it to work, but you need to account for the length(s) of the dimension you're calculating the correlation over. This latest commit does this, but I'm not sure whether the added complication is worth it yet. Thoughts welcome.

```python3
def _mean(da):
    return da.sum(dim=dim, skipna=True, min_count=1) / valid_count

dim_length = da_a.notnull().sum(dim=dim, skipna=True)

def _mean_detrended_term(da):
    return dim_length * da / valid_count

cov = _mean(da_a * da_b) - _mean_detrended_term(da_a.mean(dim=dim) * da_b.mean(dim=dim))
```

|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Improvements to lazy behaviour of `xr.cov()` and `xr.corr()` 904153867 | |
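[Editor's note: the expand-versus-detrend algebra discussed in the comment above can be sketched without xarray or dask. This is a NumPy-only illustration; `cov_detrended` and `cov_expanded` are hypothetical names, not xarray internals. It shows that when NaNs are masked pairwise, the expanded single-pass form agrees with the detrended form.]

```python3
import numpy as np

# Detrended form: mean((x - mean x) * (y - mean y)), on pairwise-valid data.
def cov_detrended(x, y):
    valid = ~np.isnan(x) & ~np.isnan(y)
    x, y = x[valid], y[valid]
    return ((x - x.mean()) * (y - y.mean())).mean()

# Expanded single-pass form: mean(x*y) - mean(x)*mean(y), same masking.
def cov_expanded(x, y):
    valid = ~np.isnan(x) & ~np.isnan(y)
    x, y = x[valid], y[valid]
    return (x * y).mean() - x.mean() * y.mean()

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = rng.normal(size=100)
x[3] = np.nan  # this position is dropped from both arrays

print(np.isclose(cov_detrended(x, y), cov_expanded(x, y)))  # True
```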
849923230 | https://github.com/pydata/xarray/pull/5390#issuecomment-849923230 | https://api.github.com/repos/pydata/xarray/issues/5390 | MDEyOklzc3VlQ29tbWVudDg0OTkyMzIzMA== | pep8speaks 24736507 | 2021-05-27T20:34:28Z | 2021-05-29T14:35:16Z | NONE | Hello @AndrewWilliams3142! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers: Comment last updated at 2021-05-29 14:35:16 UTC |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Improvements to lazy behaviour of `xr.cov()` and `xr.corr()` 904153867 | |
850820173 | https://github.com/pydata/xarray/pull/5390#issuecomment-850820173 | https://api.github.com/repos/pydata/xarray/issues/5390 | MDEyOklzc3VlQ29tbWVudDg1MDgyMDE3Mw== | willirath 5700886 | 2021-05-29T11:51:50Z | 2021-05-29T11:51:59Z | CONTRIBUTOR | I think the problem with
is that the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Improvements to lazy behaviour of `xr.cov()` and `xr.corr()` 904153867 | |
850819741 | https://github.com/pydata/xarray/pull/5390#issuecomment-850819741 | https://api.github.com/repos/pydata/xarray/issues/5390 | MDEyOklzc3VlQ29tbWVudDg1MDgxOTc0MQ== | willirath 5700886 | 2021-05-29T11:48:02Z | 2021-05-29T11:48:02Z | CONTRIBUTOR | Shouldn't the following do?
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Improvements to lazy behaviour of `xr.cov()` and `xr.corr()` 904153867 | |
850690985 | https://github.com/pydata/xarray/pull/5390#issuecomment-850690985 | https://api.github.com/repos/pydata/xarray/issues/5390 | MDEyOklzc3VlQ29tbWVudDg1MDY5MDk4NQ== | AndrewILWilliams 56925856 | 2021-05-28T21:43:52Z | 2021-05-28T21:44:12Z | CONTRIBUTOR |
I think you'd still have to normalize the second term by |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Improvements to lazy behaviour of `xr.cov()` and `xr.corr()` 904153867 | |
850650732 | https://github.com/pydata/xarray/pull/5390#issuecomment-850650732 | https://api.github.com/repos/pydata/xarray/issues/5390 | MDEyOklzc3VlQ29tbWVudDg1MDY1MDczMg== | dcherian 2448579 | 2021-05-28T20:19:56Z | 2021-05-28T20:20:52Z | MEMBER |
This second term looks very weird to me, it should be a no-op
is it just
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Improvements to lazy behaviour of `xr.cov()` and `xr.corr()` 904153867 | |
850556738 | https://github.com/pydata/xarray/pull/5390#issuecomment-850556738 | https://api.github.com/repos/pydata/xarray/issues/5390 | MDEyOklzc3VlQ29tbWVudDg1MDU1NjczOA== | AndrewILWilliams 56925856 | 2021-05-28T17:12:52Z | 2021-05-28T17:14:08Z | CONTRIBUTOR | @willirath this is great stuff, thanks again! So generally it looks like the graph is more efficient when doing operations of the form:
than doing
or like what I've implemented (see screenshot)?

```python3
intermediate = (X * Y) - (X.mean('time') * Y.mean('time'))
intermediate.mean('time')
```

If so, it seems like the most efficient(?) way to do the computation in _cov_corr() is to combine it all into one line? I can't think of how to do this though... |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
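[Editor's note: the reason the "combined into one line" aggregating form is attractive can be shown without dask. In the expanded form, each chunk only ever contributes partial sums, so no chunk needs to be revisited after a global mean is known. A hypothetical NumPy sketch, with `np.array_split` standing in for dask chunks:]

```python3
import numpy as np

# Each "chunk" contributes only scalar partial sums; the final covariance
# is assembled from those, so input chunks could be released immediately.
def chunked_cov(x, y, n_chunks=4):
    sx = sy = sxy = n = 0.0
    for xc, yc in zip(np.array_split(x, n_chunks), np.array_split(y, n_chunks)):
        sx += xc.sum()
        sy += yc.sum()
        sxy += (xc * yc).sum()
        n += xc.size
    return sxy / n - (sx / n) * (sy / n)

rng = np.random.default_rng(1)
x, y = rng.normal(size=1000), rng.normal(size=1000)

print(np.isclose(chunked_cov(x, y), np.cov(x, y, ddof=0)[0, 1]))  # True
```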
Improvements to lazy behaviour of `xr.cov()` and `xr.corr()` 904153867 | |
850542572 | https://github.com/pydata/xarray/pull/5390#issuecomment-850542572 | https://api.github.com/repos/pydata/xarray/issues/5390 | MDEyOklzc3VlQ29tbWVudDg1MDU0MjU3Mg== | willirath 5700886 | 2021-05-28T16:45:55Z | 2021-05-28T16:45:55Z | CONTRIBUTOR | @AndrewWilliams3142 @dcherian Looks like I broke the first Gist. :( Your example above does not quite get there, because the

Here's a Gist that explains the idea for the correlations: https://nbviewer.jupyter.org/gist/willirath/c5c5274f31c98e8452548e8571158803 With

```python
X = xr.DataArray(
    darr.random.normal(size=array_size, chunks=chunk_size),
    dims=("t", "y", "x"),
    name="X",
)
Y = xr.DataArray(
    darr.random.normal(size=array_size, chunks=chunk_size),
    dims=("t", "y", "x"),
    name="Y",
)
```

Dask won't release any of the tasks defining

The "good" / aggregating way of calculating the correlation
|
{ "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 1 } |
Improvements to lazy behaviour of `xr.cov()` and `xr.corr()` 904153867 | |
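[Editor's note: the "good" / aggregating correlation the Gist describes reduces everything to moments that each chunk can contribute to independently. A NumPy-only stand-in for that idea (illustrative names, not the Gist's code), checked against `np.corrcoef`; the ddof normalization cancels in a correlation, so the two agree exactly:]

```python3
import numpy as np

# Correlation assembled purely from first and second moments:
# cov = E[XY] - E[X]E[Y], var = E[X^2] - E[X]^2.
def corr_from_moments(x, y):
    ex, ey = x.mean(), y.mean()
    exy, exx, eyy = (x * y).mean(), (x * x).mean(), (y * y).mean()
    cov = exy - ex * ey
    return cov / np.sqrt((exx - ex**2) * (eyy - ey**2))

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(size=500)

print(np.isclose(corr_from_moments(x, y), np.corrcoef(x, y)[0, 1]))  # True
```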
850535282 | https://github.com/pydata/xarray/pull/5390#issuecomment-850535282 | https://api.github.com/repos/pydata/xarray/issues/5390 | MDEyOklzc3VlQ29tbWVudDg1MDUzNTI4Mg== | dcherian 2448579 | 2021-05-28T16:31:36Z | 2021-05-28T16:31:36Z | MEMBER | @AndrewWilliams3142 I think that's right. You can confirm these ideas by profiling a test problem: https://docs.dask.org/en/latest/diagnostics-local.html#example It does seem like with the new version dask will hold on to |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Improvements to lazy behaviour of `xr.cov()` and `xr.corr()` 904153867 | |
850276619 | https://github.com/pydata/xarray/pull/5390#issuecomment-850276619 | https://api.github.com/repos/pydata/xarray/issues/5390 | MDEyOklzc3VlQ29tbWVudDg1MDI3NjYxOQ== | AndrewILWilliams 56925856 | 2021-05-28T09:15:30Z | 2021-05-28T09:17:48Z | CONTRIBUTOR | @willirath, thanks for your example notebook! I'm still trying to get my head around this a bit though. Say you have

```python3
da_a = xr.DataArray(
    np.array([[1, 2, 3, 4], [1, 0.1, 0.2, 0.3], [2, 3.2, 0.6, 1.8]]),
    dims=("space", "time"),
    coords=[
        ("space", ["IA", "IL", "IN"]),
        ("time", pd.date_range("2000-01-01", freq="1D", periods=4)),
    ],
).chunk()

da_b = xr.DataArray(
    np.array([[0.2, 0.4, 0.6, 2], [15, 10, 5, 1], [1, 3.2, np.nan, 1.8]]),
    dims=("space", "time"),
    coords=[
        ("space", ["IA", "IL", "IN"]),
        ("time", pd.date_range("2000-01-01", freq="1D", periods=4)),
    ],
).chunk()
```

The original computation in

Whereas my alteration now has a graph more like this:
Am I correct in thinking that this is a 'better' computational graph? Because the original chunks are not passed onto later points in the computation? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Improvements to lazy behaviour of `xr.cov()` and `xr.corr()` 904153867 |
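[Editor's note: the NaN in `da_b` above is exactly why the "length of the dimension" correction from earlier in this thread is needed. A NumPy-only illustration using one row of the assumed data: naively expanding with independent `nanmean`s uses per-array counts, while the covariance must use the count of pairwise-valid positions, so the naive expansion is biased.]

```python3
import numpy as np

# One (space,) slice of the example data above; y has a NaN.
x = np.array([0.2, 0.4, 0.6, 2.0])
y = np.array([1.0, 3.2, np.nan, 1.8])

# Correct: restrict everything to positions where BOTH inputs are valid.
valid = ~np.isnan(x) & ~np.isnan(y)
pairwise = ((x[valid] - x[valid].mean()) * (y[valid] - y[valid].mean())).mean()

# Naive expansion: nanmean(x) averages over 4 values, the product over 3.
naive = np.nanmean(x * y) - np.nanmean(x) * np.nanmean(y)

print(np.isclose(pairwise, naive))  # False: the counts disagree
```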