issue_comments


11 rows where author_association = "NONE" and user = 6334793 sorted by updated_at descending


Columns per row below: id · html_url · issue_url · node_id · user · created_at · updated_at (sorted descending) · author_association · body · reactions · performed_via_github_app · issue
549511089 https://github.com/pydata/xarray/issues/1115#issuecomment-549511089 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDU0OTUxMTA4OQ== hrishikeshac 6334793 2019-11-04T19:31:46Z 2019-11-04T19:31:46Z NONE

Guys sorry for dropping the ball on this one. I made some changes to the PR based on the feedback I got, but I couldn't figure out the tests. Would anyone like to take this over?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
452782113 https://github.com/pydata/xarray/pull/2652#issuecomment-452782113 https://api.github.com/repos/pydata/xarray/issues/2652 MDEyOklzc3VlQ29tbWVudDQ1Mjc4MjExMw== hrishikeshac 6334793 2019-01-09T17:32:12Z 2019-01-09T17:32:12Z NONE

> I also think making this a function is probably a good idea, even though it's different from pandas.
>
> One question: how should these functions align their arguments? Recall that xarray does an inner join for arithmetic (though there's an option to control this), and an outer join in most other cases. It's not entirely obvious to me what the right choice is here (or if it really even matters).

I always assumed an inner join is the way to go. I had initially implemented just align, but later changed to broadcast, since align doesn't add missing dimension labels to the output (if they're absent from one of the inputs) while broadcast does. Without this, the where(valid_values) step doesn't work when one input is 1-D and the other is N-D.
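The align-vs-broadcast distinction above can be seen in a tiny sketch (assuming xarray and numpy are available; the array names are illustrative):

```python
import numpy as np
import xarray as xr

# An N-D array and a 1-D array sharing only the "time" dimension.
nd = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=("x", "time"),
                  coords={"time": [0, 1, 2]})
one_d = xr.DataArray([1.0, 2.0, 3.0], dims="time", coords={"time": [0, 1, 2]})

# align() only matches labels on shared dims; the 1-D array stays 1-D.
a, b = xr.align(nd, one_d, join="inner")
print(b.dims)   # ('time',)

# broadcast() also adds the missing "x" dimension to the 1-D array, so an
# elementwise mask like where(valid_values) lines up with the N-D array.
a2, b2 = xr.broadcast(nd, one_d)
print(b2.dims)  # ('x', 'time')
```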

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() 396102183
451602947 https://github.com/pydata/xarray/issues/1115#issuecomment-451602947 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQ1MTYwMjk0Nw== hrishikeshac 6334793 2019-01-04T23:48:54Z 2019-01-04T23:48:54Z NONE

PR done! Changed np.sum() to dataarray.sum()

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
451602256 https://github.com/pydata/xarray/pull/2652#issuecomment-451602256 https://api.github.com/repos/pydata/xarray/issues/2652 MDEyOklzc3VlQ29tbWVudDQ1MTYwMjI1Ng== hrishikeshac 6334793 2019-01-04T23:44:10Z 2019-01-04T23:44:10Z NONE

Made the code PEP8 compatible. Apologies for not doing so earlier.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() 396102183
451052107 https://github.com/pydata/xarray/issues/1115#issuecomment-451052107 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQ1MTA1MjEwNw== hrishikeshac 6334793 2019-01-03T04:10:35Z 2019-01-03T04:14:54Z NONE

Okay. Here's what I have come up with. I have tested it on two 1-D DataArrays, on two N-D DataArrays, and on one 1-D with one N-D DataArray, in all cases misaligned and containing missing values.

Before going forward:

1. What do you think of it? Any improvements?
2. Steps 1 and 2 (broadcasting and ignoring common missing values) are identical in both cov() and corr(). Is there a better way to reduce the duplication while still keeping both functions standalone?

```
def cov(self, other, dim=None):
    """Compute covariance between two DataArray objects along a shared dimension.

    Parameters
    ----------
    other : DataArray
        The other array with which the covariance will be computed
    dim : str, optional
        The dimension along which the covariance will be computed

    Returns
    -------
    covariance : DataArray
    """
    # 1. Broadcast the two arrays
    self, other = xr.broadcast(self, other)

    # 2. Ignore the nans
    valid_values = self.notnull() & other.notnull()
    self = self.where(valid_values, drop=True)
    other = other.where(valid_values, drop=True)
    valid_count = valid_values.sum(dim)

    # 3. Demean along the given dim
    demeaned_self = self - self.mean(dim=dim)
    demeaned_other = other - other.mean(dim=dim)

    # 4. Compute covariance along the given dim
    if dim:
        axis = self.get_axis_num(dim=dim)
    else:
        axis = None
    return np.sum(demeaned_self * demeaned_other, axis=axis) / valid_count


def corr(self, other, dim=None):
    """Compute correlation between two DataArray objects along a shared dimension.

    Parameters
    ----------
    other : DataArray
        The other array with which the correlation will be computed
    dim : str, optional
        The dimension along which the correlation will be computed

    Returns
    -------
    correlation : DataArray
    """
    # 1. Broadcast the two arrays
    self, other = xr.broadcast(self, other)

    # 2. Ignore the nans
    valid_values = self.notnull() & other.notnull()
    self = self.where(valid_values, drop=True)
    other = other.where(valid_values, drop=True)

    # 3. Compute correlation based on standard deviations and cov()
    self_std = self.std(dim=dim)
    other_std = other.std(dim=dim)

    return cov(self, other, dim=dim) / (self_std * other_std)
```
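On question 2 (deduplication), one option is to factor the shared broadcasting/masking steps into a helper that both functions call. A plain-NumPy sketch of that pattern (function names are illustrative, not xarray API):

```python
import numpy as np

def _mask_common_missing(a, b):
    # shared step: treat a value as missing if it is NaN in either input
    valid = ~np.isnan(a) & ~np.isnan(b)
    return np.where(valid, a, np.nan), np.where(valid, b, np.nan)

def np_cov(a, b, axis=-1):
    a, b = _mask_common_missing(a, b)
    n = (~np.isnan(a)).sum(axis=axis)                 # count of valid pairs
    da = a - np.nanmean(a, axis=axis, keepdims=True)  # demean, skipping NaNs
    db = b - np.nanmean(b, axis=axis, keepdims=True)
    return np.nansum(da * db, axis=axis) / n

def np_corr(a, b, axis=-1):
    a, b = _mask_common_missing(a, b)
    return np_cov(a, b, axis) / (np.nanstd(a, axis=axis) * np.nanstd(b, axis=axis))
```

With no missing values this reduces to the ordinary Pearson correlation, so it can be checked against np.corrcoef.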

For testing:

```
# self: Load demo data and trim its size
ds = xr.tutorial.load_dataset('air_temperature')
air = ds.air[:18, ...]

# other: select misaligned data, and smooth it to dampen the correlation with self
air_smooth = ds.air[2:20, ...].rolling(time=3, center=True).mean()

# A handy function to select an example grid point
def select_pts(da):
    return da.sel(lat=45, lon=250)

# Test #1: Misaligned 1-D dataarrays with missing values
ts1 = select_pts(air.copy())
ts2 = select_pts(air_smooth.copy())

def pd_corr(ts1, ts2):
    """Ensure the ts are aligned and missing values ignored"""
    # ts1, ts2 = xr.align(ts1, ts2)
    valid_values = ts1.notnull() & ts2.notnull()

    ts1 = ts1.where(valid_values, drop=True)
    ts2 = ts2.where(valid_values, drop=True)

    return ts1.to_series().corr(ts2.to_series())

expected = pd_corr(ts1, ts2)
actual = corr(ts1, ts2)
np.allclose(expected, actual)

# Test #2: Misaligned N-D dataarrays with missing values
actual_ND = corr(air, air_smooth, dim='time')
actual = select_pts(actual_ND)
np.allclose(expected, actual)

# Test #3: One 1-D dataarray and another N-D dataarray; misaligned and having missing values
actual_ND = corr(air_smooth, ts1, dim='time')
actual = select_pts(actual_ND)
np.allclose(actual, expected)
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
445390271 https://github.com/pydata/xarray/issues/1115#issuecomment-445390271 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQ0NTM5MDI3MQ== hrishikeshac 6334793 2018-12-07T22:53:06Z 2018-12-07T22:53:06Z NONE

Okay. I am writing the simultaneous correlation and covariance functions in dataarray.py instead of dataset.py, following the pd.Series.corr(self, other, dim) style.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
442994118 https://github.com/pydata/xarray/issues/1115#issuecomment-442994118 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQ0Mjk5NDExOA== hrishikeshac 6334793 2018-11-29T21:09:55Z 2018-11-29T21:09:55Z NONE

Sorry for the radio silence- I will work on this next week. Thanks @max-sixty for the updates, @rabernat for reaching out, will let you know if I need help.

Should we keep it simple, following @max-sixty, or should I also add functionality to handle lagged correlations?
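For reference, a lagged correlation only differs from the simultaneous one in how the two series are offset before correlating. A minimal NumPy sketch of the idea (not xarray API; the signature is made up):

```python
import numpy as np

def lagged_corr(x, y, lag):
    """Correlate x[t] with y[t + lag]; positive lag means y trails x."""
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    elif lag < 0:
        x, y = x[-lag:], y[:lag]
    return np.corrcoef(x, y)[0, 1]

t = np.arange(200)
sig = np.sin(t / 10.0)
delayed = np.roll(sig, 5)             # a copy of sig delayed by 5 samples
r_lag = lagged_corr(sig, delayed, 5)  # recovers the relationship at lag 5
```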

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
419501548 https://github.com/pydata/xarray/issues/1115#issuecomment-419501548 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQxOTUwMTU0OA== hrishikeshac 6334793 2018-09-07T16:55:13Z 2018-09-07T16:55:13Z NONE

@max-sixty thanks!

Then I will start with testing @shoyer 's suggestion and mvstats for the basic implementation.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
418406658 https://github.com/pydata/xarray/issues/1115#issuecomment-418406658 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDQxODQwNjY1OA== hrishikeshac 6334793 2018-09-04T15:15:35Z 2018-09-04T15:15:35Z NONE

Some time back I wrote an xarray-based package addressing this. I would be happy to help implement it in xarray itself, but I am new to contributing to such a large-scale project and it looks a bit intimidating!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
375726695 https://github.com/pydata/xarray/issues/2009#issuecomment-375726695 https://api.github.com/repos/pydata/xarray/issues/2009 MDEyOklzc3VlQ29tbWVudDM3NTcyNjY5NQ== hrishikeshac 6334793 2018-03-23T16:40:06Z 2018-03-23T16:43:45Z NONE

@mathause Thanks! Your solution worked brilliantly when used with contourf(). Here's how the code looks after implementing it.

```python
import cartopy.crs as ccrs
import xarray as xr
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

projection = False

ts = xr.tutorial.load_dataset('air_temperature').air[0, ...]
ncols, nrows = 2, 2

ny, nx = ts.shape
dx, dy = nx / ny, 1

figsize = plt.figaspect(float(dy * ncols) / float(dx * nrows))
fig = plt.figure(figsize=figsize)

gs = gridspec.GridSpec(ncols, nrows, wspace=0, hspace=0)

def set_map_layout(axes, width=17.0):
    """Set figure height, given width.

    Needs to be called after all plotting is done.
    Source: @mathause https://github.com/mathause/mplotutils/blob/v0.1.0/mplotutils/mpl_utils.py#L47-L100

    Parameters
    ----------
    axes : ndarray of (Geo)Axes
        Array with all axes of the figure.
    width : float
        Width of the full figure in cm. Default 17.

    .. note:: currently only works if all the axes have the same aspect ratio.
    """
    if isinstance(axes, plt.Axes):
        ax = axes
    else:
        # assumes the first of the axes is representative for all
        ax = axes.flat[0]

    # read figure data
    f = ax.get_figure()

    bottom = f.subplotpars.bottom
    top = f.subplotpars.top
    left = f.subplotpars.left
    right = f.subplotpars.right
    hspace = f.subplotpars.hspace
    wspace = f.subplotpars.wspace

    # data ratio is the aspect
    aspect = ax.get_data_ratio()
    # get_geometry tells how many subplots there are
    nrow, ncol, __ = ax.get_geometry()

    # width of one plot, taking into account
    # left * wf, (1 - right) * wf, ncol * wp, (ncol - 1) * wp * wspace
    wp = (width - width * (left + (1 - right))) / (ncol + (ncol - 1) * wspace)

    # height of one plot
    hp = wp * aspect

    # height of figure
    height = (hp * (nrow + ((nrow - 1) * hspace))) / (1. - (bottom + (1 - top)))

    f.set_figwidth(width / 2.54)
    f.set_figheight(height / 2.54)

for i in range(4):
    if projection:
        ax = plt.subplot(gs[i], projection=ccrs.PlateCarree())
        ax.coastlines()
        ts.plot.contourf(ax=ax, add_colorbar=False, add_labels=False,
                         levels=11, transform=ccrs.PlateCarree())
    else:
        ax = plt.subplot(gs[i])
        ts.plot.contourf(ax=ax, add_colorbar=False, levels=11, add_labels=False)

    ax.set_aspect('auto', adjustable='box-forced')
    if (i == 0) or (i == 1):
        ax.set_title('title')
    if (i == 0) or (i == 2):
        ax.set_ylabel('ylabel')
    ax.set_xticks([])
    ax.set_yticks([])

set_map_layout(ax)  # plt.tight_layout()

fig.subplots_adjust()
```

Without projection:

With projection:
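The height computed by set_map_layout follows directly from the margin arithmetic in its body; here is a worked instance with made-up margin fractions (the real values come from fig.subplotpars):

```python
width = 17.0                 # full figure width in cm
left, right = 0.1, 0.9       # hypothetical subplot margins (fractions)
bottom, top = 0.1, 0.9
wspace, hspace = 0.0, 0.0    # zero inter-subplot space, as in the post
ncol, nrow = 2, 2
aspect = 0.5                 # data height/width ratio of one panel

# width left for panels, split over ncol columns (plus wspace gaps)
wp = (width - width * (left + (1 - right))) / (ncol + (ncol - 1) * wspace)
hp = wp * aspect             # one panel's height keeps the data aspect
height = (hp * (nrow + (nrow - 1) * hspace)) / (1.0 - (bottom + (1 - top)))
print(wp, hp, height)        # roughly 6.8, 3.4 and 8.5 cm
```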

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Removing inter-subplot spaces when using cartopy projections 307903558
331686038 https://github.com/pydata/xarray/issues/1115#issuecomment-331686038 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDMzMTY4NjAzOA== hrishikeshac 6334793 2017-09-24T04:14:00Z 2017-09-24T04:14:00Z NONE

FYI @shoyer @fmaussion , I had to revisit the problem and ended up writing a function to compute vectorized cross-correlation, covariance, regression calculations (along with p-value and standard error) for xr.DataArrays. Essentially, I tried to mimic scipy.stats.linregress() but for multi-dimensional data, and included the ability to compute lagged relationships. Here's the function and its demonstration; please feel free to incorporate it in xarray if deemed useful: https://hrishichandanpurkar.blogspot.com/2017/09/vectorized-functions-for-correlation.html
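The core of such a vectorized implementation is computing Pearson r along one axis of N-D arrays in a single pass; a rough NumPy sketch of the idea (not the code from the linked post):

```python
import numpy as np

def pearson_r(x, y, axis=-1):
    # demean along the chosen axis, then form r = cov / (std_x * std_y)
    xm = x - x.mean(axis=axis, keepdims=True)
    ym = y - y.mean(axis=axis, keepdims=True)
    num = (xm * ym).sum(axis=axis)
    den = np.sqrt((xm ** 2).sum(axis=axis) * (ym ** 2).sum(axis=axis))
    return num / den

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4, 50))      # e.g. (lat, lon, time)
y = x + rng.normal(size=(3, 4, 50))  # correlated with x
r = pearson_r(x, y)                  # shape (3, 4): one r per grid cell
```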

{
    "total_count": 5,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette