issue_comments

31 rows where issue = 188996339 sorted by updated_at descending

user 10

  • max-sixty 10
  • hrishikeshac 8
  • rabernat 3
  • shoyer 3
  • r-beer 2
  • dcherian 1
  • patrickcgray 1
  • sebhahn 1
  • fmaussion 1
  • serazing 1

author_association 2

  • MEMBER 18
  • NONE 13

issue 1

  • Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data · 31 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
589154086 https://github.com/pydata/xarray/issues/1115#issuecomment-589154086 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDU4OTE1NDA4Ng== max-sixty 5635139 2020-02-20T16:00:56Z 2020-02-20T16:00:56Z MEMBER

@r-beer I checked back on this and realized I didn't reply to your question: yes re completing #2652, if you're up for giving this a push

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
555726775 https://github.com/pydata/xarray/issues/1115#issuecomment-555726775 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDU1NTcyNjc3NQ== r-beer 45787861 2019-11-19T21:36:42Z 2019-11-19T21:36:42Z NONE

@r-beer would be great to finish this off! I think this would be a popular feature. You could take @hrishikeshac 's code (which is close!) and make the final changes.

OK, that means to make #2652 pass, right?

I downloaded the respective branch from @hrishikeshac, and ran the tests locally.

See respective discussion in #2652.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
555564450 https://github.com/pydata/xarray/issues/1115#issuecomment-555564450 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDU1NTU2NDQ1MA== max-sixty 5635139 2019-11-19T15:39:17Z 2019-11-19T15:39:17Z MEMBER

@r-beer would be great to finish this off! I think this would be a popular feature. You could take @hrishikeshac 's code (which is close!) and make the final changes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
555376229 https://github.com/pydata/xarray/issues/1115#issuecomment-555376229 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDU1NTM3NjIyOQ== r-beer 45787861 2019-11-19T07:44:23Z 2019-11-19T07:45:26Z NONE

I am also highly interested in this function and in contributing to xarray in general!

If I understand correctly, https://github.com/pydata/xarray/pull/2350 and https://github.com/pydata/xarray/pull/2652 do not resolve this issue, do they?

How can I help you finish these PRs?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
549511089 https://github.com/pydata/xarray/issues/1115#issuecomment-549511089 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDU0OTUxMTA4OQ== hrishikeshac 6334793 2019-11-04T19:31:46Z 2019-11-04T19:31:46Z NONE

Guys sorry for dropping the ball on this one. I made some changes to the PR based on the feedback I got, but I couldn't figure out the tests. Would anyone like to take this over?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
546176175 https://github.com/pydata/xarray/issues/1115#issuecomment-546176175 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDU0NjE3NjE3NQ== max-sixty 5635139 2019-10-25T02:38:40Z 2019-10-25T02:38:40Z MEMBER

Would be great to get this in, if anyone wants to have a go. A small, focused, PR would be a good start.

In the meantime you can use one of the solutions above...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
545986180 https://github.com/pydata/xarray/issues/1115#issuecomment-545986180 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDU0NTk4NjE4MA== patrickcgray 2497349 2019-10-24T15:59:35Z 2019-10-24T15:59:35Z NONE

I see that this PR never made it through, and there is a somewhat similar PR finished here: https://github.com/pydata/xarray/pull/2350, though it doesn't do exactly what was proposed in this issue. Is there a suggested approach for performing cross-correlation on multiple DataArrays?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
451602947 https://github.com/pydata/xarray/issues/1115#issuecomment-451602947 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQ1MTYwMjk0Nw== hrishikeshac 6334793 2019-01-04T23:48:54Z 2019-01-04T23:48:54Z NONE

PR done! Changed np.sum() to dataarray.sum()

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
451589152 https://github.com/pydata/xarray/issues/1115#issuecomment-451589152 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQ1MTU4OTE1Mg== max-sixty 5635139 2019-01-04T22:35:15Z 2019-01-04T22:35:15Z MEMBER

@hrishikeshac that looks great! Well done for getting an MVP running.

Do you want to do a PR from this? Should be v close from here.

Others can comment from there. I'd suggest we get something close to this in and iterate from there. How abstract do we want the dimensions to be? (Currently we can only pass one dimension in, which is fine, but potentially we could enable multiple.)

One nit - no need to use np.sum - that may cause issues with dask arrays - .sum will work fine

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
451052107 https://github.com/pydata/xarray/issues/1115#issuecomment-451052107 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQ1MTA1MjEwNw== hrishikeshac 6334793 2019-01-03T04:10:35Z 2019-01-03T04:14:54Z NONE

Okay. Here's what I have come up with. I have tested it against two 1-D DataArrays, two N-D DataArrays, and one 1-D with one N-D DataArray, in all cases misaligned and containing missing values.

Before going forward:

1. What do you think of it? Any improvements?
2. Steps 1 and 2 (broadcasting and ignoring common missing values) are identical in both cov() and corr(). Is there a better way to reduce the duplication while still retaining both functions as standalone?

```python
def cov(self, other, dim=None):
    """Compute covariance between two DataArray objects along a shared dimension.

    Parameters
    ----------
    other: DataArray
        The other array with which the covariance will be computed
    dim: The dimension along which the covariance will be computed

    Returns
    -------
    covariance: DataArray
    """
    # 1. Broadcast the two arrays
    self, other = xr.broadcast(self, other)

    # 2. Ignore the nans
    valid_values = self.notnull() & other.notnull()
    self = self.where(valid_values, drop=True)
    other = other.where(valid_values, drop=True)
    valid_count = valid_values.sum(dim)

    # 3. Compute mean and standard deviation along the given dim
    demeaned_self = self - self.mean(dim=dim)
    demeaned_other = other - other.mean(dim=dim)

    # 4. Compute covariance along the given dim
    if dim:
        axis = self.get_axis_num(dim=dim)
    else:
        axis = None
    cov = np.sum(demeaned_self * demeaned_other, axis=axis) / valid_count

    return cov


def corr(self, other, dim=None):
    """Compute correlation between two DataArray objects along a shared dimension.

    Parameters
    ----------
    other: DataArray
        The other array with which the correlation will be computed
    dim: The dimension along which the correlation will be computed

    Returns
    -------
    correlation: DataArray
    """
    # 1. Broadcast the two arrays
    self, other = xr.broadcast(self, other)

    # 2. Ignore the nans
    valid_values = self.notnull() & other.notnull()
    self = self.where(valid_values, drop=True)
    other = other.where(valid_values, drop=True)

    # 3. Compute correlation based on standard deviations and cov()
    self_std = self.std(dim=dim)
    other_std = other.std(dim=dim)

    return cov(self, other, dim=dim) / (self_std * other_std)
```

For testing:

```python
# self: Load demo data and trim its size
ds = xr.tutorial.load_dataset('air_temperature')
air = ds.air[:18, ...]
# other: select misaligned data, and smooth it to dampen the correlation with self.
air_smooth = ds.air[2:20, ...].rolling(time=3, center=True).mean(dim='time')

# A handy function to select an example grid point
def select_pts(da):
    return da.sel(lat=45, lon=250)

#Test #1: Misaligned 1-D dataarrays with missing values
ts1 = select_pts(air.copy())
ts2 = select_pts(air_smooth.copy())

def pd_corr(ts1,ts2):
    """Ensure the ts are aligned and missing values ignored"""
    # ts1,ts2 = xr.align(ts1,ts2)
    valid_values = ts1.notnull() & ts2.notnull()

    ts1  = ts1.where(valid_values, drop = True)
    ts2  = ts2.where(valid_values, drop = True)

    return ts1.to_series().corr(ts2.to_series())

expected = pd_corr(ts1, ts2)
actual   = corr(ts1,ts2)
np.allclose(expected, actual)

#Test #2: Misaligned N-D dataarrays with missing values
actual_ND = corr(air,air_smooth, dim = 'time')
actual = select_pts(actual_ND)
np.allclose(expected, actual)

# Test #3: One 1-D dataarray and another N-D dataarray; misaligned and having missing values
actual_ND = corr(air_smooth,ts1, dim = 'time')
actual    = select_pts(actual_ND)
np.allclose(actual, expected)

```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
445390271 https://github.com/pydata/xarray/issues/1115#issuecomment-445390271 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQ0NTM5MDI3MQ== hrishikeshac 6334793 2018-12-07T22:53:06Z 2018-12-07T22:53:06Z NONE

Okay. I am writing the simultaneous correlation and covariance functions in dataarray.py instead of dataset.py, following the pd.Series.corr(self, other, dim) style.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
445388428 https://github.com/pydata/xarray/issues/1115#issuecomment-445388428 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQ0NTM4ODQyOA== max-sixty 5635139 2018-12-07T22:42:57Z 2018-12-07T22:42:57Z MEMBER

Yes re useful, but I'm not sure whether they should be in the same method. They're also fairly easy for a user to construct (call correlation on a .shift copy of the array).

And increments are easy to build on! I'm the worst offender, but don't let completeness get in the way of incremental improvement

(OK, I'll go and finish the fill_value branch...)
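The .shift pattern suggested above can be sketched in plain NumPy (a hypothetical helper, not code from the thread): correlate one series against shifted copies of the other and scan the lags.

```python
import numpy as np

# Hypothetical sketch: lagged correlation by trimming aligned slices,
# the same idea as correlating against a .shift copy of the array.
def lagged_corr(x, y, lag):
    if lag > 0:
        x, y = x[lag:], y[:-lag]
    elif lag < 0:
        x, y = x[:lag], y[-lag:]
    return np.corrcoef(x, y)[0, 1]

rng = np.random.default_rng(4)
y = rng.standard_normal(300)
x = np.roll(y, 5)  # x is y delayed by 5 steps (wrap-around is trimmed away)
best = max(range(-10, 11), key=lambda k: lagged_corr(x, y, k))
assert best == 5
```

Scanning a small window of lags like this is the usual way to locate the delay at which two series are most strongly related.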

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
445386281 https://github.com/pydata/xarray/issues/1115#issuecomment-445386281 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQ0NTM4NjI4MQ== dcherian 2448579 2018-12-07T22:32:50Z 2018-12-07T22:32:50Z MEMBER

I think lagged correlations would be a useful feature.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
442994118 https://github.com/pydata/xarray/issues/1115#issuecomment-442994118 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQ0Mjk5NDExOA== hrishikeshac 6334793 2018-11-29T21:09:55Z 2018-11-29T21:09:55Z NONE

Sorry for the radio silence- I will work on this next week. Thanks @max-sixty for the updates, @rabernat for reaching out, will let you know if I need help.

Should we keep it simple following @max-sixty , or should I also add the functionality to handle lagged correlations?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
442858136 https://github.com/pydata/xarray/issues/1115#issuecomment-442858136 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQ0Mjg1ODEzNg== rabernat 1197350 2018-11-29T14:43:13Z 2018-11-29T14:43:13Z MEMBER

Hey @hrishikeshac -- any progress on this? Need any help / advice from xarray devs?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
438370603 https://github.com/pydata/xarray/issues/1115#issuecomment-438370603 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQzODM3MDYwMw== max-sixty 5635139 2018-11-13T17:51:56Z 2018-11-13T17:51:56Z MEMBER

And one that handles NaNs:

```python
# untested!

def covariance(x, y, dim=None):
    valid_values = x.notnull() & y.notnull()
    valid_count = valid_values.sum(dim)

    demeaned_x = (x - x.mean(dim)).fillna(0)
    demeaned_y = (y - y.mean(dim)).fillna(0)

    return xr.dot(demeaned_x, demeaned_y, dims=dim) / valid_count


def correlation(x, y, dim=None):
    # dim should default to the intersection of x.dims and y.dims
    return covariance(x, y, dim) / (x.std(dim) * y.std(dim))
```
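For readers without xarray at hand, the same masking idea can be mirrored in plain NumPy (a hypothetical sketch, not code from the thread): demean with nanmean, zero-fill the NaNs so they contribute nothing to the sum, and divide by the count of pairwise-valid values.

```python
import numpy as np

def covariance(x, y):
    # count of positions where both series are valid
    valid_count = (~np.isnan(x) & ~np.isnan(y)).sum()
    dx = np.nan_to_num(x - np.nanmean(x))  # like .fillna(0) after demeaning
    dy = np.nan_to_num(y - np.nanmean(y))
    return (dx * dy).sum() / valid_count

def correlation(x, y):
    return covariance(x, y) / (np.nanstd(x) * np.nanstd(y))

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = 0.5 * x + rng.standard_normal(200)
# with no NaNs this reduces to the ordinary population covariance
assert np.allclose(covariance(x, y), np.cov(x, y, ddof=0)[0, 1])
assert np.allclose(correlation(x, y), np.corrcoef(x, y)[0, 1])
x[::7] = np.nan  # now knock some values out; the result stays finite
assert np.isfinite(correlation(x, y))
```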

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
436784481 https://github.com/pydata/xarray/issues/1115#issuecomment-436784481 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQzNjc4NDQ4MQ== max-sixty 5635139 2018-11-07T21:31:12Z 2018-11-07T21:31:18Z MEMBER

For posterity, I made a small adjustment to @shoyer 's draft:

```python
# untested!

def covariance(x, y, dim=None):
    # need to ensure the dim lengths are the same - i.e. no auto-aligning
    # could use count-1 for sample
    return xr.dot(x - x.mean(dim), y - y.mean(dim), dims=dim) / x.count(dim)


def correlation(x, y, dim=None):
    # dim should default to the intersection of x.dims and y.dims
    return covariance(x, y, dim) / (x.std(dim) * y.std(dim))
```

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
419519217 https://github.com/pydata/xarray/issues/1115#issuecomment-419519217 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQxOTUxOTIxNw== max-sixty 5635139 2018-09-07T17:59:55Z 2018-09-07T17:59:55Z MEMBER

Great! Ping me / the issues with any questions at all!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
419501548 https://github.com/pydata/xarray/issues/1115#issuecomment-419501548 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQxOTUwMTU0OA== hrishikeshac 6334793 2018-09-07T16:55:13Z 2018-09-07T16:55:13Z NONE

@max-sixty thanks!

Then I will start with testing @shoyer 's suggestion and mvstats for the basic implementation.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
418530212 https://github.com/pydata/xarray/issues/1115#issuecomment-418530212 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQxODUzMDIxMg== max-sixty 5635139 2018-09-04T21:52:22Z 2018-09-04T21:52:22Z MEMBER

@hrishikeshac if you'd like to contribute, we can help you along - xarray is a v welcoming project!

And from mvstats it looks like you're already up to speed

Let us know

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
418406658 https://github.com/pydata/xarray/issues/1115#issuecomment-418406658 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQxODQwNjY1OA== hrishikeshac 6334793 2018-09-04T15:15:35Z 2018-09-04T15:15:35Z NONE

Sometime back I wrote a package based on xarray regarding this. I would be happy to be involved in implementing it in xarray as well, but I am new to contributing to such a large-scale project and it looks a bit intimidating!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
417816234 https://github.com/pydata/xarray/issues/1115#issuecomment-417816234 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQxNzgxNjIzNA== shoyer 1217238 2018-08-31T23:55:06Z 2018-08-31T23:55:06Z MEMBER

I tend to view the second case as a generalization of the first case. I would also hesitate to implement the n x m array -> m x m correlation matrix version because xarray doesn't handle repeated dimensions well.

I think the basic implementation of this looks quite similar to what I wrote here for calculating the Pearson correlation as a NumPy gufunc: http://xarray.pydata.org/en/stable/dask.html#automatic-parallelization

The main difference is that we might naturally want to support summing over multiple dimensions at once via the dim argument, e.g., something like:

```python
# untested!

def covariance(x, y, dim=None):
    return xarray.dot(x - x.mean(dim), y - y.mean(dim), dim=dim)


def correlation(x, y, dim=None):
    # dim should default to the intersection of x.dims and y.dims
    return covariance(x, y, dim) / (x.std(dim) * y.std(dim))
```

If you want to achieve the equivalent of np.corrcoef on an array with dimensions ('n', 'm') with this, you just write something like correlation(x, x.rename({'m': 'm2'}), dim='n').
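The n x m to m x m case described here can be sketched in plain NumPy (a hypothetical illustration, not code from the thread): one demeaned matrix product gives the full correlation matrix, which is what the renamed-dimension dot computes entry by entry.

```python
import numpy as np

# Hypothetical sketch: full correlation matrix via a single demeaned
# matrix product, equivalent to dotting x against a renamed copy of itself.
def correlation_matrix(x):
    """x: (n, m) array of n samples for m variables. Returns (m, m)."""
    d = x - x.mean(axis=0)          # demean each column
    cov = d.T @ d / x.shape[0]      # population covariance
    std = x.std(axis=0)
    return cov / np.outer(std, std)

rng = np.random.default_rng(1)
x = rng.standard_normal((100, 3))
assert np.allclose(correlation_matrix(x), np.corrcoef(x, rowvar=False))
```

The ddof choice cancels out of the correlation, which is why the population covariance here still matches np.corrcoef.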

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
417802624 https://github.com/pydata/xarray/issues/1115#issuecomment-417802624 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQxNzgwMjYyNA== max-sixty 5635139 2018-08-31T22:14:19Z 2018-08-31T22:14:19Z MEMBER

I'm up for adding .corr to xarray

What do we want this to look like? It's a bit different from most xarray functions, which either return the same shape or reduce one dimension.

  • The basic case here would take an n x m array and return an m x m correlation matrix. We could easily wrap https://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html
  • Another case would be to take two similarly sized arrays (with the option of broadcasting) and return an array with one dimension reduced. For example, 200 x 10 and 200, returning a 10 array.
  • I need to think about how those extrapolate to multiple dimensions

Should I start with the first case and then we can expand as needed?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
349670336 https://github.com/pydata/xarray/issues/1115#issuecomment-349670336 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDM0OTY3MDMzNg== sebhahn 5929935 2017-12-06T15:17:40Z 2017-12-06T15:17:40Z NONE

@hrishikeshac I was just looking for a function doing a regression between two datasets (x, y, time), so thanks for your function! However, I'm still wondering whether there is a much faster C (or Cython) implementation for this kind of thing?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
331686038 https://github.com/pydata/xarray/issues/1115#issuecomment-331686038 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDMzMTY4NjAzOA== hrishikeshac 6334793 2017-09-24T04:14:00Z 2017-09-24T04:14:00Z NONE

FYI @shoyer @fmaussion , I had to revisit the problem and ended up writing a function to compute vectorized cross-correlation, covariance, regression calculations (along with p-value and standard error) for xr.DataArrays. Essentially, I tried to mimic scipy.stats.linregress() but for multi-dimensional data, and included the ability to compute lagged relationships. Here's the function and its demonstration; please feel free to incorporate it in xarray if deemed useful: https://hrishichandanpurkar.blogspot.com/2017/09/vectorized-functions-for-correlation.html

{
    "total_count": 5,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
266525792 https://github.com/pydata/xarray/issues/1115#issuecomment-266525792 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDI2NjUyNTc5Mg== fmaussion 10050469 2016-12-12T19:23:48Z 2016-12-12T19:25:29Z MEMBER

I'll chime in here to ask a usage question: what is the recommended way to compute correlation maps with xarray? I.e. I have a dataarray of dims (time, lat, lon) and I'd like to correlate every single grid point with a timeseries of dim (time) to get a correlation map of dim (lat, lon). My current strategy is a wonderfully unpythonic double loop over lons and lats, and I wonder if there's better way?
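One loop-free approach (a hypothetical plain-NumPy sketch, not an answer given in the thread) is to broadcast the demeaned time series against the demeaned field and reduce over the time axis:

```python
import numpy as np

# Hypothetical sketch: correlate every (lat, lon) grid point with one
# time series without any Python loops.
def correlation_map(field, series):
    """field: (time, lat, lon); series: (time,). Returns a (lat, lon) map."""
    f = field - field.mean(axis=0)             # demean along time
    s = series - series.mean()
    cov = (f * s[:, None, None]).sum(axis=0)   # un-normalized covariance
    return cov / (np.sqrt((f ** 2).sum(axis=0)) * np.sqrt((s ** 2).sum()))

rng = np.random.default_rng(0)
field = rng.standard_normal((50, 4, 5))
series = rng.standard_normal(50)
cmap = correlation_map(field, series)

# Spot-check one grid point against np.corrcoef
expected = np.corrcoef(field[:, 2, 3], series)[0, 1]
assert np.allclose(cmap[2, 3], expected)
```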

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
260387059 https://github.com/pydata/xarray/issues/1115#issuecomment-260387059 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDI2MDM4NzA1OQ== rabernat 1197350 2016-11-14T16:37:05Z 2016-11-14T16:37:05Z MEMBER

To be clear, I am not say that this does not belong in xarray.

I'm saying that we lack clear general guidelines for how to determine whether a particular function belongs in xarray. The criterion of a "pretty fundamental operation for working with data" is a good starting point. I would add:

  • used across a wide range of scientific disciplines
  • clear, unambiguous / uncontroversial definition
  • numpy implementation already exists

corr meets all of these criteria. Many others (e.g. interpolation, convolution, curve fitting) do as well. Expanding xarray beyond the numpy ufuncs opens the door to supporting these things. I'm just saying it should be conscious, deliberate decision, given the limits on developer time.

Many of these things will be pretty trivial once .apply() is here. So perhaps it's not a big deal.

{
    "total_count": 7,
    "+1": 7,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
260382091 https://github.com/pydata/xarray/issues/1115#issuecomment-260382091 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDI2MDM4MjA5MQ== shoyer 1217238 2016-11-14T16:20:14Z 2016-11-14T16:20:14Z MEMBER

That said, correlation coefficients are a pretty fundamental operation for working with data. I could see implementing a basic corr in xarray and referring to a separate signal processing package for more options in the docstring.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
260379241 https://github.com/pydata/xarray/issues/1115#issuecomment-260379241 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDI2MDM3OTI0MQ== serazing 19403647 2016-11-14T16:10:55Z 2016-11-14T16:10:55Z NONE

I agree with @rabernat in the sense that it could be part of another package (e.g., signal processing). This would also allow the computation of statistical test to assess the significance of the correlation (which is useful since correlation may often be misinterpreted without statistical tests).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
260333267 https://github.com/pydata/xarray/issues/1115#issuecomment-260333267 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDI2MDMzMzI2Nw== rabernat 1197350 2016-11-14T13:24:10Z 2016-11-14T13:24:10Z MEMBER

I agree this would be very useful. But it is also feature creep. There is an extremely wide range of such functions that could hypothetically be put into the xarray package. (all of scipy.signal for example) At some point the community should decide what is the intended scope of xarray itself vs. packages built on top of xarray.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
260219462 https://github.com/pydata/xarray/issues/1115#issuecomment-260219462 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDI2MDIxOTQ2Mg== shoyer 1217238 2016-11-13T22:57:02Z 2016-11-13T22:57:02Z MEMBER

The first step here is to find a library that implements the desired functionality on pure NumPy arrays, ideally in a vectorized fashion. Then it should be pretty straightforward to wrap in xarray.
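As an illustration of the kind of vectorized pure-NumPy kernel meant here (a hypothetical sketch, not code from the thread): a Pearson correlation that broadcasts over leading dimensions and reduces the last axis, the shape of function that is straightforward to wrap afterwards.

```python
import numpy as np

# Hypothetical gufunc-style kernel: Pearson correlation along the last
# axis, vectorized over any leading batch dimensions.
def pearson(x, y):
    x = x - x.mean(axis=-1, keepdims=True)
    y = y - y.mean(axis=-1, keepdims=True)
    num = (x * y).sum(axis=-1)
    return num / np.sqrt((x ** 2).sum(axis=-1) * (y ** 2).sum(axis=-1))

rng = np.random.default_rng(3)
a = rng.standard_normal((4, 5, 100))  # batch of series along the last axis
b = rng.standard_normal((4, 5, 100))
r = pearson(a, b)
assert r.shape == (4, 5)
assert np.allclose(r[0, 0], np.corrcoef(a[0, 0], b[0, 0])[0, 1])
```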

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 16.419ms · About: xarray-datasette