issue_comments


20 rows where issue = 623751213 sorted by updated_at descending




user 7

  • AndrewILWilliams 7
  • kefirbandi 3
  • mathause 3
  • keewis 3
  • max-sixty 2
  • Zac-HD 1
  • pep8speaks 1

author_association 3

  • CONTRIBUTOR 11
  • MEMBER 8
  • NONE 1

issue 1

  • xr.cov() and xr.corr() · 20
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
634200431 https://github.com/pydata/xarray/pull/4089#issuecomment-634200431 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzNDIwMDQzMQ== kefirbandi 1277781 2020-05-26T18:31:31Z 2020-05-26T18:31:31Z CONTRIBUTOR

@AndrewWilliams3142 I see. Thanks.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
634165154 https://github.com/pydata/xarray/pull/4089#issuecomment-634165154 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzNDE2NTE1NA== AndrewILWilliams 56925856 2020-05-26T17:26:34Z 2020-05-26T17:26:34Z CONTRIBUTOR

@kefirbandi I didn't want to step on your toes, but I'm happy to put in a PR to fix the typo. :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
634157768 https://github.com/pydata/xarray/pull/4089#issuecomment-634157768 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzNDE1Nzc2OA== kefirbandi 1277781 2020-05-26T17:12:41Z 2020-05-26T17:12:41Z CONTRIBUTOR

Well, actually I was thinking that correcting it would take ~30 seconds for someone who works on the code on a daily basis. For me, I think, it would be quite a bit of overhead for a single character...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
633922774 https://github.com/pydata/xarray/pull/4089#issuecomment-633922774 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzkyMjc3NA== keewis 14808389 2020-05-26T09:43:29Z 2020-05-26T09:43:29Z MEMBER

Thanks. Do you want to put in a PR fixing that?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
633921230 https://github.com/pydata/xarray/pull/4089#issuecomment-633921230 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzkyMTIzMA== kefirbandi 1277781 2020-05-26T09:40:12Z 2020-05-26T09:40:12Z CONTRIBUTOR

Just a small comment: in the docs (http://xarray.pydata.org/en/latest/generated/xarray.cov.html#xarray.cov) there is a typo: da_a is documented twice; the second one should really be da_b.
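For context, the fix amounts to a one-word change in the Parameters section of the cov docstring; a sketch of the corrected entry (the wording here is illustrative, not quoted from the docs):

```python
def cov(da_a, da_b, dim=None):
    """Compute covariance between two DataArray objects along a shared dimension.

    Parameters
    ----------
    da_a : DataArray
        Array to compute.
    da_b : DataArray
        Array to compute.  (this second entry was mistakenly labelled da_a)
    """
```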

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
633651803 https://github.com/pydata/xarray/pull/4089#issuecomment-633651803 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzY1MTgwMw== max-sixty 5635139 2020-05-25T16:55:26Z 2020-05-25T16:55:26Z MEMBER

Awesome, @AndrewWilliams3142! Very excited we have this.

Thanks for the review @mathause

Hitting merge; any other feedback is welcome and we can iterate.

{
    "total_count": 3,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 2,
    "rocket": 1,
    "eyes": 0
}
633592183 https://github.com/pydata/xarray/pull/4089#issuecomment-633592183 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzU5MjE4Mw== AndrewILWilliams 56925856 2020-05-25T14:13:46Z 2020-05-25T14:13:46Z CONTRIBUTOR

> If you insist ;)
>
> `da_a -= da_a.mean(dim=dim)` is indeed marginally faster. As they are already aligned, we don't have to worry about this.

Sweet! On second thought, I might leave it for now... the sun is too nice today. Can always have it as a future PR or something. :)

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
633590099 https://github.com/pydata/xarray/pull/4089#issuecomment-633590099 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzU5MDA5OQ== mathause 10194086 2020-05-25T14:08:35Z 2020-05-25T14:08:35Z MEMBER

If you insist ;)

`da_a -= da_a.mean(dim=dim)` is indeed marginally faster. As they are already aligned, we don't have to worry about this.
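For concreteness, a minimal sketch of the two spellings being compared (not from the PR; the array and dim are illustrative):

```python
import numpy as np
import xarray as xr

da_a = xr.DataArray(np.random.random((1000, 100)), dims=("time", "x"))
dim = "time"

# readable spelling: allocates a second array for the demeaned values
demeaned_da_a = da_a - da_a.mean(dim=dim)

# in-place spelling: reuses da_a's buffer, so it's marginally faster,
# but it mutates da_a (fine when the function owns a private copy)
da_a -= da_a.mean(dim=dim)
```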

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
633146890 https://github.com/pydata/xarray/pull/4089#issuecomment-633146890 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzE0Njg5MA== pep8speaks 24736507 2020-05-23T22:09:12Z 2020-05-25T13:55:30Z NONE

Hello @AndrewWilliams3142! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2020-05-25 13:55:29 UTC
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
633582528 https://github.com/pydata/xarray/pull/4089#issuecomment-633582528 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzU4MjUyOA== AndrewILWilliams 56925856 2020-05-25T13:50:08Z 2020-05-25T13:50:08Z CONTRIBUTOR

One more thing, actually: is there an argument for not defining da_a_std and demeaned_da_a, and instead performing the operations in place? Defining these variables makes the code more readable, but in https://github.com/pydata/xarray/pull/3550#discussion_r355157809 and https://github.com/pydata/xarray/pull/3550#discussion_r355157888 the reviewer suggests it is inefficient?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
633456698 https://github.com/pydata/xarray/pull/4089#issuecomment-633456698 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzQ1NjY5OA== AndrewILWilliams 56925856 2020-05-25T08:44:36Z 2020-05-25T10:55:29Z CONTRIBUTOR

> Could you also add a test for the TypeError?
>
> ```python
> with raises_regex(TypeError, "Only xr.DataArray is supported"):
>     xr.corr(xr.Dataset(), xr.Dataset())
> ```

Where do you mean, sorry? Isn't this already there in corr()?

```python
if any(not isinstance(arr, (Variable, DataArray)) for arr in [da_a, da_b]):
    raise TypeError(
        "Only xr.DataArray and xr.Variable are supported."
        "Given {}.".format([type(arr) for arr in [da_a, da_b]])
    )
```

EDIT: Scratch that, I get what you mean :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
633449839 https://github.com/pydata/xarray/pull/4089#issuecomment-633449839 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzQ0OTgzOQ== mathause 10194086 2020-05-25T08:29:22Z 2020-05-25T08:29:22Z MEMBER

Could you also add a test for the TypeError?

```python
with raises_regex(TypeError, "Only xr.DataArray is supported"):
    xr.corr(xr.Dataset(), xr.Dataset())
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
633336577 https://github.com/pydata/xarray/pull/4089#issuecomment-633336577 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzMzNjU3Nw== Zac-HD 12229877 2020-05-25T01:39:49Z 2020-05-25T08:29:16Z CONTRIBUTOR

> Also, I don't think there is a hypothesis.extra.xarray module, yet. Any comments on that, @Zac-HD?

Not yet, though I'm interested in collaborations to write one!

Our existing Numpy + Pandas integrations, along with the st.builds() and @st.composite strategies, make it reasonably straightforward-if-tedious to compose the objects you want in exquisite detail... the real challenge is to design a clean high-level API which lets you control only the parts you care about. Plausibly we have to let go of that last part to make this work, and just point people at the low-level tools if high-level isn't working.

This third-party module is unidiomatic in that it doesn't have the API design above, but I believe it works. rdturnermtl has a history of great features; we eventually got idiomatic high-performance gufunc shapes upstream and I'm confident we'll get Xarray support eventually too... and sooner if there are people who can help design it :smile:

Just @-mention me again if this comes up, or I won't be notified.
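A minimal sketch of what a composite DataArray strategy could look like, built from the existing numpy integration (the strategy name, bounds, and defaults here are illustrative, not an existing API):

```python
import hypothesis.strategies as st
import hypothesis.extra.numpy as npst
import numpy as np
import xarray as xr


@st.composite
def data_arrays(draw, dims=("time",), max_side=10):
    # draw one length per dimension, then fill the shape with finite floats
    shape = tuple(draw(st.integers(1, max_side)) for _ in dims)
    data = draw(npst.arrays(np.float64, shape, elements=st.floats(-1e3, 1e3)))
    return xr.DataArray(data, dims=dims)
```

A test would then use `@given(data_arrays())` to drive a property check.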

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
633441816 https://github.com/pydata/xarray/pull/4089#issuecomment-633441816 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzQ0MTgxNg== AndrewILWilliams 56925856 2020-05-25T08:11:05Z 2020-05-25T08:12:48Z CONTRIBUTOR

Cheers! I've got a day off today, so I'll do another pass through the changes and see if there's any low-hanging fruit I can improve (in addition to np.random, the _cov_corr internal methods, and maybe apply_ufunc()) :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
633393615 https://github.com/pydata/xarray/pull/4089#issuecomment-633393615 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzM5MzYxNQ== max-sixty 5635139 2020-05-25T06:03:24Z 2020-05-25T06:03:24Z MEMBER

This looks great! I think it's fine to have a simple implementation like this, even if it's not perfectly efficient. Well done for getting something working, @AndrewWilliams3142.

For the future: let's move away from using np.random in docstrings, since the output isn't reproducible and will never pass a doctest.
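For instance, fixed data or a seeded generator keeps doctest output stable (a sketch; using np.random.default_rng is an assumption, not what the thread settled on):

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)  # fixed seed -> identical output on every run
da = xr.DataArray(rng.random((2, 3)), dims=("x", "y"))
```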

Anything remaining before merging?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
633310925 https://github.com/pydata/xarray/pull/4089#issuecomment-633310925 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzMxMDkyNQ== keewis 14808389 2020-05-24T22:38:54Z 2020-05-24T22:38:54Z MEMBER

No worries about hypothesis; that's something we can add in a new PR.

Also, I don't think there is a hypothesis.extra.xarray module, yet. Any comments on that, @Zac-HD?

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
633299554 https://github.com/pydata/xarray/pull/4089#issuecomment-633299554 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzI5OTU1NA== mathause 10194086 2020-05-24T21:04:03Z 2020-05-24T21:04:03Z MEMBER

Currently corr needs to sanitize the inputs twice, which will be inefficient. One way around this is to define an internal method which can do both, depending on a method keyword (no need to write extra tests for this IMHO):

```python
def corr(da_a, da_b, dim=None, ddof=0):
    return _cov_corr(da_a, da_b, dim=dim, ddof=ddof, method="corr")


def cov(da_a, da_b, dim=None, ddof=0):
    return _cov_corr(da_a, da_b, dim=dim, ddof=ddof, method="cov")


def _cov_corr(da_a, da_b, dim=None, ddof=0, method=None):
    # sanitize the inputs once, then compute the covariance (sketch)
    cov = ...

    if method == "cov":
        return cov

    # otherwise normalize by the standard deviations to get the correlation
    corr = ...
    return corr
```

Maybe you could use xr.apply_ufunc instead of looping in the tests (might be overkill).
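A sketch of how that could look, with pandas providing the per-slice reference value (the helper name is hypothetical, and the comparison assumes xr.cov and pd.Series.cov share the same ddof default):

```python
import numpy as np
import pandas as pd
import xarray as xr

da_a = xr.DataArray(np.random.random((3, 21, 4)), dims=("a", "time", "x"))
da_b = xr.DataArray(np.random.random((3, 21, 4)), dims=("a", "time", "x"))


def _pandas_cov(a, b):
    # receives the 1-d numpy slices along the reduced core dimension
    return pd.Series(a).cov(pd.Series(b))


expected = xr.apply_ufunc(
    _pandas_cov,
    da_a,
    da_b,
    input_core_dims=[["time"], ["time"]],
    vectorize=True,  # loops over the remaining (a, x) dims for us
)

xr.testing.assert_allclose(xr.cov(da_a, da_b, dim="time"), expected)
```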

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
633286352 https://github.com/pydata/xarray/pull/4089#issuecomment-633286352 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzI4NjM1Mg== AndrewILWilliams 56925856 2020-05-24T19:49:30Z 2020-05-24T19:56:24Z CONTRIBUTOR

One problem I came across here is that pandas automatically ignores np.nan values in any corr or cov calculation. This is hard-coded into the package and sadly there's no skipna=False option, so what I've done in the tests is use the numpy implementation that pandas is built on (see, for example, here).
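The behaviour in question, sketched:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan, 4.0])
t = pd.Series([0.5, 1.5, 2.5, 3.5])

s.cov(t)            # pandas silently drops the NaN pair; there is no skipna=False
np.cov(s, t)[0, 1]  # numpy propagates the NaN instead -> nan
```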

Current tests implemented are (in pseudocode; the first two are sketched below):

- [x] assert_allclose(xr.cov(a, b) / (a.std() * b.std()), xr.corr(a, b))
- [x] assert_allclose(xr.cov(a, a) * (N - 1), ((a - a.mean()) ** 2).sum())
- [x] For the example in my previous comment, I now have a loop over all values of (a, x) to reconstruct the covariance / correlation matrix, and check it with an assert_allclose(...).
- [x] Add more test arrays, with/without np.nans -- done
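A compact sketch of the first two checks (not the actual test code; it assumes the ddof conventions noted in the comments):

```python
import numpy as np
import xarray as xr
from xarray.testing import assert_allclose

N = 30
a = xr.DataArray(np.random.randn(N), dims="time")
b = xr.DataArray(np.random.randn(N), dims="time")

# corr is cov normalized by the standard deviations (ddof must match: 0 here)
assert_allclose(xr.corr(a, b), xr.cov(a, b, ddof=0) / (a.std() * b.std()))

# auto-covariance (default ddof=1) times N - 1 is the sum of squared anomalies
assert_allclose(xr.cov(a, a) * (N - 1), ((a - a.mean()) ** 2).sum())
```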

@keewis I tried reading the Hypothesis docs and got a bit overwhelmed, so I've stuck with example-based tests for now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
633216248 https://github.com/pydata/xarray/pull/4089#issuecomment-633216248 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzIxNjI0OA== keewis 14808389 2020-05-24T11:22:42Z 2020-05-24T12:09:51Z MEMBER

If you want to test individual values without reimplementing the function in the tests (which is what I suspect comparing with the result of np.cov would require), that might be the only way.

If not, you could also check properties of covariance / correlation matrices, e.g. that assert_allclose(xr.cov(a, b) / (a.std() * b.std()), xr.corr(a, b)) (I'm not sure if I remember that formula correctly), or that the diagonal of the auto-covariance matrix is the same as the variance of the array (with a 1D vector; not sure about more dimensions). If you decide to test using properties, you could also extend our small collection of tests using hypothesis (see #1846).
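The second property, sketched for the 1-D case (the ddof choice here is an assumption):

```python
import numpy as np
import xarray as xr
from xarray.testing import assert_allclose

v = xr.DataArray(np.random.randn(100), dims="time")

# cov(v, v) -- the diagonal of the auto-covariance -- equals the variance
assert_allclose(xr.cov(v, v, ddof=0), v.var())
```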

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
633213547 https://github.com/pydata/xarray/pull/4089#issuecomment-633213547 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzIxMzU0Nw== AndrewILWilliams 56925856 2020-05-24T10:59:43Z 2020-05-24T11:00:53Z CONTRIBUTOR

The current problem is that we can't use Pandas to fully test xr.cov() or xr.corr(), because once you convert the DataArrays to a series or a dataframe for testing, you can't easily index them with a dim parameter. See @r-beer's comment here: https://github.com/pydata/xarray/pull/3550#issuecomment-557895005.

As such, I think it maybe just makes sense to test a few low-dimensional cases, e.g.:

```python
da_a = xr.DataArray(
    np.random.random((3, 21, 4)),
    coords={"time": pd.date_range("2000-01-01", freq="1D", periods=21)},
    dims=("a", "time", "x"),
)

da_b = xr.DataArray(
    np.random.random((3, 21, 4)),
    coords={"time": pd.date_range("2000-01-01", freq="1D", periods=21)},
    dims=("a", "time", "x"),
)

>>> xr.cov(da_a, da_b, 'time')
<xarray.DataArray (a: 3, x: 4)>
array([[-0.01824046,  0.00373796, -0.00601642, -0.00108818],
       [ 0.00686132, -0.02680119, -0.00639433, -0.00868691],
       [-0.00889806,  0.02622817, -0.01022208, -0.00101257]])
Dimensions without coordinates: a, x

>>> xr.cov(da_a, da_b, 'time').sel(a=0, x=0)
<xarray.DataArray ()>
array(-0.01824046)

>>> da_a.sel(a=0, x=0).to_series().cov(da_b.sel(a=0, x=0).to_series())
-0.018240458880158048
```

So, while it's easy to check that a few individual points from xr.cov() agree with the pandas implementation, it would require a loop over (a, x) in order to check all of the points for this example. Do people have thoughts about this?
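Spelled out, and continuing from the da_a/da_b arrays above, the brute-force loop would look something like this sketch:

```python
import numpy as np

actual = xr.cov(da_a, da_b, 'time')
for i in range(da_a.sizes["a"]):
    for j in range(da_a.sizes["x"]):
        # pandas reference value for this single (a, x) slice
        expected = (
            da_a.isel(a=i, x=j).to_series().cov(da_b.isel(a=i, x=j).to_series())
        )
        np.testing.assert_allclose(actual.isel(a=i, x=j), expected)
```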

I think it would also make sense to have some test cases where we don't use Pandas at all, and instead specify the expected output manually?

```python
da_a = xr.DataArray([[1, 2], [1, np.nan]], dims=["x", "time"])
expected = [1, np.nan]
actual = xr.corr(da_a, da_a, dim='time')
assert_allclose(actual, expected)
```

Does this seem like a good way forward?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);