issues: 681325776

| field | value |
|---|---|
| id | 681325776 |
| node_id | MDU6SXNzdWU2ODEzMjU3NzY= |
| number | 4349 |
| title | NaN in cov & cov? |
| user | 5635139 |
| state | closed |
| locked | 0 |
| comments | 1 |
| created_at | 2020-08-18T20:36:40Z |
| updated_at | 2020-08-30T11:36:57Z |
| closed_at | 2020-08-30T11:36:57Z |
| author_association | MEMBER |

**Is your feature request related to a problem? Please describe.**
Could `cov` & `corr` ignore missing values?

**Describe the solution you'd like**
Currently any NaN in a dimension over which `cov` / `corr` is calculated gives a NaN result:

```python
In [1]: import xarray as xr
   ...: import numpy as np
   ...: da = xr.DataArray([[1, 2], [1, np.nan]], dims=["x", "time"])
   ...: da
Out[1]:
<xarray.DataArray (x: 2, time: 2)>
array([[ 1.,  2.],
       [ 1., nan]])
Dimensions without coordinates: x, time

In [2]: xr.cov(da,da)
Out[2]:
<xarray.DataArray ()>
array(nan)
```
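For comparison, here is a minimal NumPy sketch of what a NaN-ignoring covariance could look like. It assumes pairwise deletion (any position where either input is missing is dropped from both) and follows a mask, demean, sum, divide-by-valid-count-minus-`ddof` sequence. `nan_cov` is a hypothetical helper, not part of xarray, and `axis` stands in for xarray's `dim` (`axis=None` reduces over every element, like `dim=None`).

```python
import warnings

import numpy as np


def nan_cov(a, b, axis=None, ddof=0):
    """Covariance of ``a`` and ``b`` that ignores missing values.

    Hypothetical helper for illustration (not xarray's API). Positions
    where either input is NaN are dropped from both (pairwise deletion);
    ``axis=None`` reduces over every element, like ``dim=None``.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # 1. Mask both inputs wherever either one is missing.
    valid = ~np.isnan(a) & ~np.isnan(b)
    a = np.where(valid, a, np.nan)
    b = np.where(valid, b, np.nan)
    count = valid.sum(axis=axis) - ddof

    # 2. Demean using only the valid positions.
    with warnings.catch_warnings():
        # nanmean warns on all-NaN slices; their result is discarded in step 4.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        demeaned_a = a - np.nanmean(a, axis=axis, keepdims=True)
        demeaned_b = b - np.nanmean(b, axis=axis, keepdims=True)

    # 3. Sum the products, skipping the masked positions.
    total = np.nansum(demeaned_a * demeaned_b, axis=axis)

    # 4. Normalise; slices with no degrees of freedom left become NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(count > 0, total / count, np.nan)


da = np.array([[1, 2], [1, np.nan]])      # dims: ("x", "time")
print(nan_cov(da, da))                    # approximately 0.2222 (= 2/9)
print(nan_cov(da, da, axis=1))            # 0.25 for x=0, 0.0 for x=1
print(nan_cov(da, da, axis=1, ddof=1))    # 0.5 for x=0, nan for x=1
```

On the issue's example, reducing over everything gives the variance of the three valid values (1, 2, 1) instead of NaN. Along `time`, the second row keeps its single valid value and yields 0; with `ddof=1` that row has no degrees of freedom left, so the sketch reports NaN rather than dividing by zero.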

That's explained here as:

```python
    # 4. Compute covariance along the given dim
    # N.B. `skipna=False` is required or there is a bug when computing
    # auto-covariance. E.g. Try xr.cov(da,da) for
    # da = xr.DataArray([[1, 2], [1, np.nan]], dims=["x", "time"])
    cov = (demeaned_da_a * demeaned_da_b).sum(dim=dim, skipna=False) / (valid_count)
```

Without having thought about it for too long, I'm not sure I understand this, and couldn't find any discussion in the PR. Adding this diff seems to fail tests around NaN values but no others:

```diff
diff --git a/xarray/core/computation.py b/xarray/core/computation.py
index 1f2a8a8e..1fc95fe1 100644
--- a/xarray/core/computation.py
+++ b/xarray/core/computation.py
@@ -1256,7 +1256,8 @@ def _cov_corr(da_a, da_b, dim=None, ddof=0, method=None):
     # N.B. `skipna=False` is required or there is a bug when computing
     # auto-covariance. E.g. Try xr.cov(da,da) for
     # da = xr.DataArray([[1, 2], [1, np.nan]], dims=["x", "time"])
-    cov = (demeaned_da_a * demeaned_da_b).sum(dim=dim, skipna=False) / (valid_count)
+    cov = (demeaned_da_a * demeaned_da_b).sum(dim=dim, skipna=True, min_count=1) / (valid_count)
+    # cov = (demeaned_da_a * demeaned_da_b).sum(dim=dim, skipna=False) / (valid_count)
 
     if method == "cov":
         return cov
```
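The diff changes the reduction from `skipna=False` to `skipna=True, min_count=1`. The sketch below mimics those NaN semantics in plain NumPy: with `skipna=False` a single NaN propagates, with `skipna=True` NaNs are skipped (so an all-NaN slice sums to 0), and `min_count=1` turns a slice with no valid values back into NaN. `masked_sum` is a hypothetical helper for illustration, not xarray's implementation.

```python
import numpy as np


def masked_sum(values, skipna, min_count=0):
    """Sum with the NaN semantics of a ``skipna``/``min_count`` reduction.

    Hypothetical helper for illustration, not xarray's implementation.
    """
    values = np.asarray(values, dtype=float)
    if not skipna:
        # skipna=False: a single NaN poisons the whole result.
        return float(np.sum(values))
    valid_count = int(np.count_nonzero(~np.isnan(values)))
    if valid_count < min_count:
        # Too few valid values: report "missing" instead of 0.
        return float("nan")
    # skipna=True: NaNs are skipped, so an all-NaN slice sums to 0.
    return float(np.nansum(values))


# Demeaned products for the issue's example with dim=None: one slot is NaN.
products = [1 / 9, 4 / 9, 1 / 9, np.nan]
print(masked_sum(products, skipna=False))              # nan
print(masked_sum(products, skipna=True, min_count=1))  # approximately 0.6667 (= 2/3)

# A slice with no valid values at all.
print(masked_sum([np.nan, np.nan], skipna=True))               # 0.0
print(masked_sum([np.nan, np.nan], skipna=True, min_count=1))  # nan
```

Dividing the 2/3 by the three valid values gives 2/9, the covariance one would expect if missing values were ignored, whereas the `skipna=False` path returns NaN for the whole reduction.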

Does anyone know off-hand the logic here?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4349/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: 13221727 · type: issue
