issue_comments: 417816234


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/1115#issuecomment-417816234 https://api.github.com/repos/pydata/xarray/issues/1115 417816234 MDEyOklzc3VlQ29tbWVudDQxNzgxNjIzNA== 1217238 2018-08-31T23:55:06Z 2018-08-31T23:55:06Z MEMBER

I tend to view the second case as a generalization of the first case. I would also hesitate to implement the n x m array -> m x m correlation matrix version because xarray doesn't handle repeated dimensions well.

I think the basic implementation of this looks quite similar to what I wrote here for calculating the Pearson correlation as a NumPy gufunc: http://xarray.pydata.org/en/stable/dask.html#automatic-parallelization
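For reference, a gufunc-style Pearson correlation over the last axis can be sketched roughly as follows. This is a reconstruction from memory of the approach in the linked docs, not a verbatim copy; the sample arrays `a` and `b` are made up for illustration.

```python
import numpy as np


def covariance_gufunc(x, y):
    # Mean product of anomalies along the last axis (population covariance).
    x_anom = x - x.mean(axis=-1, keepdims=True)
    y_anom = y - y.mean(axis=-1, keepdims=True)
    return (x_anom * y_anom).mean(axis=-1)


def pearson_correlation_gufunc(x, y):
    # np.std defaults to ddof=0, which matches the normalization above.
    return covariance_gufunc(x, y) / (x.std(axis=-1) * y.std(axis=-1))


rng = np.random.default_rng(0)
a = rng.normal(size=(3, 100))
b = rng.normal(size=(3, 100))
r = pearson_correlation_gufunc(a, b)  # one coefficient per row
```

Because the reduction always runs over the last axis, a wrapper such as `xarray.apply_ufunc` has to move the core dimension there first; that is the part a `dim` argument would generalize.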

The main difference is that we might naturally want to support summing over multiple dimensions at once via the `dim` argument, e.g., something like:

```python
# untested!
import xarray


def covariance(x, y, dim=None):
    # xarray.dot sums over dim, so divide by the sample count to get a mean
    return xarray.dot(x - x.mean(dim), y - y.mean(dim), dim=dim) / x.count(dim)


def correlation(x, y, dim=None):
    # dim should default to the intersection of x.dims and y.dims
    return covariance(x, y, dim) / (x.std(dim) * y.std(dim))
```

If you want to achieve the equivalent of `np.corrcoef` on an array with dimensions `('n', 'm')` with this, you just write something like `correlation(x, x.rename({'m': 'm2'}), dim='n')`.
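A minimal sketch of that rename trick, checked against NumPy. The helper `correlation_sketch` is hypothetical and uses a broadcast multiply followed by `.mean(dim)` rather than `xarray.dot`, which costs more memory but keeps the example independent of the `dot` keyword spelling; the random data is made up.

```python
import numpy as np
import xarray as xr


def correlation_sketch(x, y, dim):
    # Hypothetical helper: population (ddof=0) Pearson correlation over `dim`.
    x_anom = x - x.mean(dim)
    y_anom = y - y.mean(dim)
    cov = (x_anom * y_anom).mean(dim)
    return cov / (x.std(dim) * y.std(dim))


rng = np.random.default_rng(0)
x = xr.DataArray(rng.normal(size=(50, 4)), dims=("n", "m"))

# Renaming 'm' on the second operand avoids a repeated dimension:
# the result has dims ('m', 'm2') instead of ('m', 'm').
result = correlation_sketch(x, x.rename({"m": "m2"}), dim="n")

# NumPy reference: columns are the variables, rows the observations.
expected = np.corrcoef(x.values, rowvar=False)
matches = np.allclose(result.transpose("m", "m2").values, expected)
```

Here `matches` should come out true, since the sample-size factor cancels in the correlation coefficient regardless of `ddof`.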

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  188996339