issue_comments

13 rows where author_association = "NONE" and issue = 188996339 sorted by updated_at descending

user 5

  • hrishikeshac 8
  • r-beer 2
  • patrickcgray 1
  • sebhahn 1
  • serazing 1

issue 1

  • Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data · 13 ✖

author_association 1

  • NONE · 13 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
555726775 https://github.com/pydata/xarray/issues/1115#issuecomment-555726775 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDU1NTcyNjc3NQ== r-beer 45787861 2019-11-19T21:36:42Z 2019-11-19T21:36:42Z NONE

@r-beer would be great to finish this off! I think this would be a popular feature. You could take @hrishikeshac 's code (which is close!) and make the final changes.

OK, that means to make #2652 pass, right?

I downloaded the respective branch from @hrishikeshac, and ran the tests locally.

See respective discussion in #2652.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
555376229 https://github.com/pydata/xarray/issues/1115#issuecomment-555376229 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDU1NTM3NjIyOQ== r-beer 45787861 2019-11-19T07:44:23Z 2019-11-19T07:45:26Z NONE

I am also highly interested in this function and in contributing to xarray in general!

If I understand correctly, https://github.com/pydata/xarray/pull/2350 and https://github.com/pydata/xarray/pull/2652 do not resolve this issue, do they?

How can I help finish these PRs?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
549511089 https://github.com/pydata/xarray/issues/1115#issuecomment-549511089 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDU0OTUxMTA4OQ== hrishikeshac 6334793 2019-11-04T19:31:46Z 2019-11-04T19:31:46Z NONE

Guys, sorry for dropping the ball on this one. I made some changes to the PR based on the feedback I got, but I couldn't figure out the tests. Would anyone like to take this over?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
545986180 https://github.com/pydata/xarray/issues/1115#issuecomment-545986180 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDU0NTk4NjE4MA== patrickcgray 2497349 2019-10-24T15:59:35Z 2019-10-24T15:59:35Z NONE

I see that this PR never made it through, and there is a somewhat similar PR finished here: https://github.com/pydata/xarray/pull/2350, though it doesn't do exactly what was proposed in this PR. Is there a suggested approach for performing cross-correlation on multiple DataArrays?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
451602947 https://github.com/pydata/xarray/issues/1115#issuecomment-451602947 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQ1MTYwMjk0Nw== hrishikeshac 6334793 2019-01-04T23:48:54Z 2019-01-04T23:48:54Z NONE

PR done! Changed np.sum() to dataarray.sum()
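
For readers following along, here is a minimal sketch of what that kind of change looks like. The toy array and variable names are made up for illustration and are not taken from the PR. Reducing with `.sum(dim=...)` names the dimension directly, so the `get_axis_num` lookup is no longer needed:

```python
import numpy as np
import xarray as xr

# Hypothetical toy data, not from the PR.
da = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=("time", "x"))

# Positional style: look up the axis number, then reduce the raw values.
axis = da.get_axis_num("time")
by_axis = da.values.sum(axis=axis)

# Label-based style: reduce over the named dimension; the result is
# still a DataArray and keeps the remaining dimension "x".
by_dim = da.sum(dim="time")
```

Both reductions give the same numbers; the label-based form also skips NaNs by default for floating-point data.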

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
451052107 https://github.com/pydata/xarray/issues/1115#issuecomment-451052107 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQ1MTA1MjEwNw== hrishikeshac 6334793 2019-01-03T04:10:35Z 2019-01-03T04:14:54Z NONE

Okay, here's what I have come up with. I have tested it against three cases: two 1-D DataArrays, two N-D DataArrays, and one 1-D DataArray with one N-D DataArray. In every case the arrays are misaligned and have missing values.

Before going forward:

1. What do you think of it? Any improvements?
2. Steps 1 and 2 (broadcasting and ignoring common missing values) are identical in both cov() and corr(). Is there a better way to reduce the duplication while still retaining both functions as standalone?

```
import numpy as np
import xarray as xr


def cov(self, other, dim=None):
    """Compute covariance between two DataArray objects along a shared dimension.

    Parameters
    ----------
    other: DataArray
        The other array with which the covariance will be computed
    dim: The dimension along which the covariance will be computed

    Returns
    -------
    covariance: DataArray
    """
    # 1. Broadcast the two arrays
    self, other = xr.broadcast(self, other)

    # 2. Ignore points where either array is NaN
    valid_values = self.notnull() & other.notnull()
    self = self.where(valid_values, drop=True)
    other = other.where(valid_values, drop=True)
    valid_count = valid_values.sum(dim)

    # 3. Remove the mean along the given dim
    demeaned_self = self - self.mean(dim=dim)
    demeaned_other = other - other.mean(dim=dim)

    # 4. Compute covariance along the given dim
    #    (normalised by the number of valid pairs, i.e. ddof=0)
    if dim:
        axis = self.get_axis_num(dim=dim)
    else:
        axis = None
    cov = np.sum(demeaned_self * demeaned_other, axis=axis) / valid_count

    return cov


def corr(self, other, dim=None):
    """Compute correlation between two DataArray objects along a shared dimension.

    Parameters
    ----------
    other: DataArray
        The other array with which the correlation will be computed
    dim: The dimension along which the correlation will be computed

    Returns
    -------
    correlation: DataArray
    """
    # 1. Broadcast the two arrays
    self, other = xr.broadcast(self, other)

    # 2. Ignore points where either array is NaN
    valid_values = self.notnull() & other.notnull()
    self = self.where(valid_values, drop=True)
    other = other.where(valid_values, drop=True)

    # 3. Compute correlation based on standard deviations and cov()
    self_std = self.std(dim=dim)
    other_std = other.std(dim=dim)

    return cov(self, other, dim=dim) / (self_std * other_std)
```

For testing:

```
import numpy as np
import xarray as xr

# corr() is the function defined in the block above.

# self: load demo data and trim its size
ds = xr.tutorial.load_dataset('air_temperature')
air = ds.air[:18, ...]

# other: select misaligned data, and smooth it to dampen the correlation with self
air_smooth = ds.air[2:20, ...].rolling(time=3, center=True).mean()


# A handy function to select an example grid point
def select_pts(da):
    return da.sel(lat=45, lon=250)


# Test #1: Misaligned 1-D dataarrays with missing values
ts1 = select_pts(air.copy())
ts2 = select_pts(air_smooth.copy())


def pd_corr(ts1, ts2):
    """Ensure the ts are aligned and missing values ignored"""
    # ts1, ts2 = xr.align(ts1, ts2)
    valid_values = ts1.notnull() & ts2.notnull()

    ts1 = ts1.where(valid_values, drop=True)
    ts2 = ts2.where(valid_values, drop=True)

    return ts1.to_series().corr(ts2.to_series())


expected = pd_corr(ts1, ts2)
actual = corr(ts1, ts2)
assert np.allclose(expected, actual)

# Test #2: Misaligned N-D dataarrays with missing values
actual_ND = corr(air, air_smooth, dim='time')
actual = select_pts(actual_ND)
assert np.allclose(expected, actual)

# Test #3: One 1-D dataarray and another N-D dataarray; misaligned and having missing values
actual_ND = corr(air_smooth, ts1, dim='time')
actual = select_pts(actual_ND)
assert np.allclose(actual, expected)
```
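
On the duplication question, one possible arrangement is to move the shared broadcast-and-mask steps into a private function that both `cov` and `corr` call. This is only a sketch: `_align_and_mask` is a hypothetical helper name, not part of xarray or the PR, and the functions are written as plain functions of two arrays rather than as DataArray methods.

```python
import numpy as np
import xarray as xr


def _align_and_mask(a, b):
    """Hypothetical helper: broadcast both arrays and set to NaN every
    point that is missing in either of them."""
    a, b = xr.broadcast(a, b)
    valid = a.notnull() & b.notnull()
    return a.where(valid), b.where(valid), valid


def cov(a, b, dim=None):
    """Covariance along dim, normalised by the number of valid pairs."""
    a, b, valid = _align_and_mask(a, b)
    n_valid = valid.sum(dim)
    # .mean() and .sum() skip NaN by default for floating-point data.
    return ((a - a.mean(dim)) * (b - b.mean(dim))).sum(dim) / n_valid


def corr(a, b, dim=None):
    """Pearson correlation along dim, built on cov()."""
    a, b, _ = _align_and_mask(a, b)
    # cov() masks again internally; that is redundant but harmless,
    # because masking already-masked arrays changes nothing.
    return cov(a, b, dim) / (a.std(dim) * b.std(dim))
```

Both functions stay usable on their own, and the masking logic lives in one place. The price is that `corr` masks twice; an internal variant of `cov` that accepts pre-masked inputs would avoid that.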

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
445390271 https://github.com/pydata/xarray/issues/1115#issuecomment-445390271 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQ0NTM5MDI3MQ== hrishikeshac 6334793 2018-12-07T22:53:06Z 2018-12-07T22:53:06Z NONE

Okay. I am writing the simultaneous correlation and covariance functions in dataarray.py instead of dataset.py, following the pd.Series.corr(self, other, dim) style.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
442994118 https://github.com/pydata/xarray/issues/1115#issuecomment-442994118 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQ0Mjk5NDExOA== hrishikeshac 6334793 2018-11-29T21:09:55Z 2018-11-29T21:09:55Z NONE

Sorry for the radio silence; I will work on this next week. Thanks @max-sixty for the updates and @rabernat for reaching out. I will let you know if I need help.

Should we keep it simple, following @max-sixty, or should I also add the functionality to handle lagged correlations?
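
To make the lagged option concrete, here is a rough sketch of how a lag could be layered on top of a simple correlation. `lagged_corr` is a hypothetical name, and the sketch assumes both arrays already share the same shape and coordinates along `dim`:

```python
import numpy as np
import xarray as xr


def lagged_corr(a, b, lag, dim="time"):
    """Hypothetical helper: correlate a[t] with b[t + lag] along dim."""
    # shift(-lag) moves b so that position t holds the value from t + lag;
    # positions shifted in from beyond the edge become NaN.
    b_shifted = b.shift({dim: -lag})

    # Keep only positions where both series have a value.
    valid = a.notnull() & b_shifted.notnull()
    a = a.where(valid)
    b_shifted = b_shifted.where(valid)

    # Pearson correlation; mean() and std() skip NaN by default.
    cov = ((a - a.mean(dim)) * (b_shifted - b_shifted.mean(dim))).mean(dim)
    return cov / (a.std(dim) * b_shifted.std(dim))
```

With `lag=0` this reduces to the simultaneous correlation, so a lag argument could default to zero without changing the simple case.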

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
419501548 https://github.com/pydata/xarray/issues/1115#issuecomment-419501548 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQxOTUwMTU0OA== hrishikeshac 6334793 2018-09-07T16:55:13Z 2018-09-07T16:55:13Z NONE

@max-sixty thanks!

Then I will start by testing @shoyer's suggestion and mvstats for the basic implementation.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
418406658 https://github.com/pydata/xarray/issues/1115#issuecomment-418406658 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQxODQwNjY1OA== hrishikeshac 6334793 2018-09-04T15:15:35Z 2018-09-04T15:15:35Z NONE

Some time ago I wrote a package based on xarray for this. I would be happy to be involved in implementing it in xarray as well, but I am new to contributing to such a large-scale project and it looks a bit intimidating!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
349670336 https://github.com/pydata/xarray/issues/1115#issuecomment-349670336 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDM0OTY3MDMzNg== sebhahn 5929935 2017-12-06T15:17:40Z 2017-12-06T15:17:40Z NONE

@hrishikeshac I was just looking for a function doing a regression between two datasets (x, y, time), so thanks for your function! However, I'm still wondering whether there is a much faster C (or Cython) implementation for this kind of thing?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
331686038 https://github.com/pydata/xarray/issues/1115#issuecomment-331686038 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDMzMTY4NjAzOA== hrishikeshac 6334793 2017-09-24T04:14:00Z 2017-09-24T04:14:00Z NONE

FYI @shoyer @fmaussion , I had to revisit the problem and ended up writing a function to compute vectorized cross-correlation, covariance, regression calculations (along with p-value and standard error) for xr.DataArrays. Essentially, I tried to mimic scipy.stats.linregress() but for multi-dimensional data, and included the ability to compute lagged relationships. Here's the function and its demonstration; please feel free to incorporate it in xarray if deemed useful: https://hrishichandanpurkar.blogspot.com/2017/09/vectorized-functions-for-correlation.html
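
As a rough illustration of the idea (this is not the code from the blog post), a vectorized least-squares fit along one dimension could look like the sketch below. `linregress_along` is a hypothetical name, and the p-value, standard error and lag handling mentioned above are left out:

```python
import numpy as np
import xarray as xr


def linregress_along(x, y, dim="time"):
    """Hypothetical helper: fit y = slope * x + intercept along dim.

    Returns slope, intercept and the correlation coefficient r, each with
    every dimension of the inputs except dim.
    """
    x, y = xr.broadcast(x, y)

    # Ignore points where either input is missing.
    valid = x.notnull() & y.notnull()
    x, y = x.where(valid), y.where(valid)

    x_mean, y_mean = x.mean(dim), y.mean(dim)
    cov = ((x - x_mean) * (y - y_mean)).mean(dim)

    slope = cov / ((x - x_mean) ** 2).mean(dim)
    intercept = y_mean - slope * x_mean
    r = cov / (x.std(dim) * y.std(dim))
    return slope, intercept, r
```

Because every step is an array operation, a 1-D predictor is regressed against every grid point of an N-D field at once, with no Python loop over the grid.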

{
    "total_count": 5,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
260379241 https://github.com/pydata/xarray/issues/1115#issuecomment-260379241 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDI2MDM3OTI0MQ== serazing 19403647 2016-11-14T16:10:55Z 2016-11-14T16:10:55Z NONE

I agree with @rabernat in the sense that it could be part of another package (e.g., signal processing). This would also allow the computation of statistical tests to assess the significance of the correlation (which is useful, since a correlation can easily be misinterpreted without a statistical test).
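
One way such a significance check could look is a permutation test, sketched here with NumPy only. The function name and defaults are illustrative assumptions. The test also assumes the samples are exchangeable, which often does not hold for autocorrelated time series, so treat this as a starting point rather than a recommendation:

```python
import numpy as np


def corr_permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Hypothetical helper: Pearson r plus a two-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    observed = np.corrcoef(x, y)[0, 1]

    # Shuffle y to break any pairing with x, and count how often a
    # shuffled correlation is at least as strong as the observed one.
    count = 0
    for _ in range(n_perm):
        r_perm = np.corrcoef(x, rng.permutation(y))[0, 1]
        if abs(r_perm) >= abs(observed):
            count += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value
```

The smallest p-value this can report is `1 / (n_perm + 1)`, so `n_perm` sets the resolution of the test.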

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
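
The filter described at the top of this page can be reproduced against a table of this shape with Python's built-in `sqlite3` module. The sketch below makes some assumptions: it uses an in-memory database, keeps only the columns the query needs, and omits the foreign-key references because the `users` and `issues` tables are not shown here. Three rows are copied from the listing above; the fourth row is invented purely to show the filter excluding something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE issue_comments (
        id INTEGER PRIMARY KEY,
        user INTEGER,
        updated_at TEXT,
        author_association TEXT,
        issue INTEGER
    );
    CREATE INDEX idx_issue_comments_issue ON issue_comments (issue);
    """
)

rows = [
    (555726775, 45787861, "2019-11-19T21:36:42Z", "NONE", 188996339),
    (555376229, 45787861, "2019-11-19T07:45:26Z", "NONE", 188996339),
    (260379241, 19403647, "2016-11-14T16:10:55Z", "NONE", 188996339),
    # Invented row (not from the page) that the filter should exclude.
    (1, 2, "2020-01-01T00:00:00Z", "MEMBER", 188996339),
]
conn.executemany("INSERT INTO issue_comments VALUES (?, ?, ?, ?, ?)", rows)

# ISO 8601 timestamps sort correctly as plain text, so ORDER BY on the
# TEXT column gives newest-first ordering.
query = """
    SELECT id, user, updated_at
    FROM issue_comments
    WHERE author_association = ? AND issue = ?
    ORDER BY updated_at DESC
"""
result = conn.execute(query, ("NONE", 188996339)).fetchall()
for row in result:
    print(row)
```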