issue_comments

8 rows where issue = 188996339 and user = 6334793, sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
549511089 https://github.com/pydata/xarray/issues/1115#issuecomment-549511089 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDU0OTUxMTA4OQ== hrishikeshac 6334793 2019-11-04T19:31:46Z 2019-11-04T19:31:46Z NONE

Guys, sorry for dropping the ball on this one. I made some changes to the PR based on the feedback I got, but I couldn't figure out the tests. Would anyone like to take this over?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
451602947 https://github.com/pydata/xarray/issues/1115#issuecomment-451602947 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQ1MTYwMjk0Nw== hrishikeshac 6334793 2019-01-04T23:48:54Z 2019-01-04T23:48:54Z NONE

PR done! Changed np.sum() to DataArray.sum().
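
The PR diff isn't shown here, but given the draft in the next comment below, the change was presumably from the NumPy reduction in step 4 to the equivalent DataArray method. A hypothetical before/after:

```
# before: reduce with NumPy by axis number
cov = np.sum(demeaned_self * demeaned_other, axis=axis) / valid_count

# after: reduce with the DataArray method by dimension name,
# which keeps coordinates/metadata and works lazily with dask
cov = (demeaned_self * demeaned_other).sum(dim=dim) / valid_count
```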

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
451052107 https://github.com/pydata/xarray/issues/1115#issuecomment-451052107 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQ1MTA1MjEwNw== hrishikeshac 6334793 2019-01-03T04:10:35Z 2019-01-03T04:14:54Z NONE

Okay, here's what I have come up with. I have tested it against two 1-D DataArrays, two N-D DataArrays, and one 1-D DataArray paired with an N-D DataArray, in all cases misaligned and containing missing values.

Before going forward:

1. What do you think of it? Any improvements?
2. Steps 1 and 2 (broadcasting and ignoring common missing values) are identical in both cov() and corr(). Is there a better way to reduce the duplication while still retaining both functions as standalone? (One possible refactor is sketched after the code block below.)

```
import numpy as np
import xarray as xr


def cov(self, other, dim=None):
    """Compute covariance between two DataArray objects along a shared dimension.

    Parameters
    ----------
    other : DataArray
        The other array with which the covariance will be computed
    dim : str, optional
        The dimension along which the covariance will be computed

    Returns
    -------
    covariance : DataArray
    """
    # 1. Broadcast the two arrays
    self, other = xr.broadcast(self, other)

    # 2. Ignore the nans
    valid_values = self.notnull() & other.notnull()
    self = self.where(valid_values, drop=True)
    other = other.where(valid_values, drop=True)
    valid_count = valid_values.sum(dim)

    # 3. Demean along the given dim
    demeaned_self = self - self.mean(dim=dim)
    demeaned_other = other - other.mean(dim=dim)

    # 4. Compute covariance along the given dim
    if dim:
        axis = self.get_axis_num(dim)
    else:
        axis = None
    cov = np.sum(demeaned_self * demeaned_other, axis=axis) / valid_count

    return cov


def corr(self, other, dim=None):
    """Compute correlation between two DataArray objects along a shared dimension.

    Parameters
    ----------
    other : DataArray
        The other array with which the correlation will be computed
    dim : str, optional
        The dimension along which the correlation will be computed

    Returns
    -------
    correlation : DataArray
    """
    # 1. Broadcast the two arrays
    self, other = xr.broadcast(self, other)

    # 2. Ignore the nans
    valid_values = self.notnull() & other.notnull()
    self = self.where(valid_values, drop=True)
    other = other.where(valid_values, drop=True)

    # 3. Compute correlation from cov() and the standard deviations
    self_std = self.std(dim=dim)
    other_std = other.std(dim=dim)

    return cov(self, other, dim=dim) / (self_std * other_std)
```
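
On question 2 above (reducing the duplication between cov() and corr() while keeping both standalone), one option is a small shared helper for the broadcast-and-mask steps. This is a hypothetical sketch, not code from the thread or from the eventual PR:

```
import xarray as xr

def _broadcast_and_mask(da_a, da_b, dim=None):
    # Hypothetical helper covering steps 1-2: broadcast, then drop
    # positions where either array is NaN.
    da_a, da_b = xr.broadcast(da_a, da_b)
    valid_values = da_a.notnull() & da_b.notnull()
    da_a = da_a.where(valid_values, drop=True)
    da_b = da_b.where(valid_values, drop=True)
    return da_a, da_b, valid_values.sum(dim)

def cov(da_a, da_b, dim=None):
    da_a, da_b, valid_count = _broadcast_and_mask(da_a, da_b, dim)
    demeaned = (da_a - da_a.mean(dim=dim)) * (da_b - da_b.mean(dim=dim))
    return demeaned.sum(dim=dim) / valid_count

def corr(da_a, da_b, dim=None):
    da_a, da_b, _ = _broadcast_and_mask(da_a, da_b, dim)
    return cov(da_a, da_b, dim=dim) / (da_a.std(dim=dim) * da_b.std(dim=dim))
```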

For testing:

```
# self: load demo data and trim its size
ds = xr.tutorial.load_dataset('air_temperature')
air = ds.air[:18, ...]

# other: select misaligned data, and smooth it to dampen the correlation with self
air_smooth = ds.air[2:20, ...].rolling(time=3, center=True).mean()

# A handy function to select an example grid point
def select_pts(da):
    return da.sel(lat=45, lon=250)

# Test #1: Misaligned 1-D dataarrays with missing values
ts1 = select_pts(air.copy())
ts2 = select_pts(air_smooth.copy())

def pd_corr(ts1, ts2):
    """Ensure the ts are aligned and missing values ignored"""
    # ts1, ts2 = xr.align(ts1, ts2)
    valid_values = ts1.notnull() & ts2.notnull()

    ts1 = ts1.where(valid_values, drop=True)
    ts2 = ts2.where(valid_values, drop=True)

    return ts1.to_series().corr(ts2.to_series())

expected = pd_corr(ts1, ts2)
actual = corr(ts1, ts2)
np.allclose(expected, actual)

# Test #2: Misaligned N-D dataarrays with missing values
actual_ND = corr(air, air_smooth, dim='time')
actual = select_pts(actual_ND)
np.allclose(expected, actual)

# Test #3: One 1-D dataarray and another N-D dataarray; misaligned and having missing values
actual_ND = corr(air_smooth, ts1, dim='time')
actual = select_pts(actual_ND)
np.allclose(actual, expected)
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
445390271 https://github.com/pydata/xarray/issues/1115#issuecomment-445390271 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQ0NTM5MDI3MQ== hrishikeshac 6334793 2018-12-07T22:53:06Z 2018-12-07T22:53:06Z NONE

Okay. I am writing the simultaneous correlation and covariance functions in dataarray.py instead of dataset.py, following the pd.Series.corr(self, other, dim) style.
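
For reference, the pandas pattern being mirrored is a method on the object itself; the xarray version would add an explicit dim argument. The DataArray call below is the proposed signature, not an existing API:

```
import pandas as pd

s1 = pd.Series([1.0, 2.0, 4.0, 8.0])
s2 = pd.Series([1.0, 2.5, 3.5, 7.0])

s1.corr(s2)  # pandas: correlation of s1 with s2

# proposed xarray analogue, with the reduction dimension made explicit:
# da1.corr(da2, dim='time')
```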

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
442994118 https://github.com/pydata/xarray/issues/1115#issuecomment-442994118 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQ0Mjk5NDExOA== hrishikeshac 6334793 2018-11-29T21:09:55Z 2018-11-29T21:09:55Z NONE

Sorry for the radio silence; I will work on this next week. Thanks @max-sixty for the updates and @rabernat for reaching out; I will let you know if I need help.

Should we keep it simple, following @max-sixty, or should I also add functionality to handle lagged correlations?
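
For what a lagged variant could look like: shifting one array along the dimension before calling the plain corr() is one minimal approach. This is a sketch assuming the corr() draft shown earlier on this page; lagged_corr is a hypothetical name:

```
def lagged_corr(da_a, da_b, dim='time', lag=0):
    # shift() pads the vacated positions with NaN, which the draft
    # corr() already masks out, so no extra NaN handling is needed.
    return corr(da_a, da_b.shift({dim: lag}), dim=dim)
```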

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
419501548 https://github.com/pydata/xarray/issues/1115#issuecomment-419501548 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQxOTUwMTU0OA== hrishikeshac 6334793 2018-09-07T16:55:13Z 2018-09-07T16:55:13Z NONE

@max-sixty thanks!

Then I will start by testing @shoyer's suggestion and mvstats for the basic implementation.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
418406658 https://github.com/pydata/xarray/issues/1115#issuecomment-418406658 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQxODQwNjY1OA== hrishikeshac 6334793 2018-09-04T15:15:35Z 2018-09-04T15:15:35Z NONE

Some time back I wrote an xarray-based package for this. I would be happy to be involved in implementing it in xarray as well, but I am new to contributing to such a large-scale project and it looks a bit intimidating!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
331686038 https://github.com/pydata/xarray/issues/1115#issuecomment-331686038 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDMzMTY4NjAzOA== hrishikeshac 6334793 2017-09-24T04:14:00Z 2017-09-24T04:14:00Z NONE

FYI @shoyer @fmaussion, I had to revisit the problem and ended up writing a function to compute vectorized cross-correlation, covariance, and regression calculations (along with p-values and standard errors) for xr.DataArrays. Essentially, I tried to mimic scipy.stats.linregress(), but for multi-dimensional data, and included the ability to compute lagged relationships. Here's the function and its demonstration; please feel free to incorporate it into xarray if deemed useful: https://hrishichandanpurkar.blogspot.com/2017/09/vectorized-functions-for-correlation.html
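
The linked post is not reproduced here, but as a rough illustration, a vectorized linregress-style computation along one dimension can be built from the same moments. A sketch only (p-values, standard errors, and lags omitted; linregress_nd is a hypothetical name):

```
import xarray as xr

def linregress_nd(x, y, dim='time'):
    # Slope, intercept, and correlation of y against x along `dim`,
    # computed from means and (co)variances so it vectorizes over
    # all remaining dimensions.
    x, y = xr.broadcast(x, y)
    xm, ym = x.mean(dim), y.mean(dim)
    cov_xy = ((x - xm) * (y - ym)).mean(dim)
    slope = cov_xy / x.var(dim)
    intercept = ym - slope * xm
    r = cov_xy / (x.std(dim) * y.std(dim))
    return slope, intercept, r
```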

{
    "total_count": 5,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
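Given the schema above, the view on this page (8 rows where issue = 188996339 and user = 6334793, sorted by updated_at descending) corresponds to a simple query. A sketch using Python's sqlite3 against a local copy of the database; the github.db filename is an assumption:

```
import sqlite3

# Hypothetical local copy of this Datasette instance's database.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, created_at, updated_at, body
    FROM issue_comments
    WHERE issue = 188996339 AND user = 6334793
    ORDER BY updated_at DESC
    """
).fetchall()
for comment_id, created, updated, body in rows:
    print(comment_id, created, updated)
```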