issue_comments


7 rows where author_association = "CONTRIBUTOR", issue = 623751213 and user = 56925856 sorted by updated_at descending

Columns: id · html_url · issue_url · node_id · user · created_at · updated_at (sort key, descending) · author_association · body · reactions · performed_via_github_app · issue
Comment 634165154 · AndrewILWilliams (56925856) · CONTRIBUTOR · created 2020-05-26T17:26:34Z · updated 2020-05-26T17:26:34Z · https://github.com/pydata/xarray/pull/4089#issuecomment-634165154

@kefirbandi I didn't want to step on your toes, but I'm happy to put in a PR to fix the typo. :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Issue: xr.cov() and xr.corr() (623751213)
Comment 633592183 · AndrewILWilliams (56925856) · CONTRIBUTOR · created 2020-05-25T14:13:46Z · updated 2020-05-25T14:13:46Z · https://github.com/pydata/xarray/pull/4089#issuecomment-633592183

If you insist ;)

```python
da_a -= da_a.mean(dim=dim)
```

is indeed marginally faster. As they are already aligned, we don't have to worry about this.

Sweet! On second thought, I might leave it for now...the sun is too nice today. Can always have it as a future PR or something. :)

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Comment 633582528 · AndrewILWilliams (56925856) · CONTRIBUTOR · created 2020-05-25T13:50:08Z · updated 2020-05-25T13:50:08Z · https://github.com/pydata/xarray/pull/4089#issuecomment-633582528

One more thing, actually: is there an argument for not defining `da_a_std` and `demeaned_da_a`, and just performing the operations in place? Defining these variables makes the code more readable, but in https://github.com/pydata/xarray/pull/3550#discussion_r355157809 and https://github.com/pydata/xarray/pull/3550#discussion_r355157888 the reviewer suggests this is inefficient.
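For intuition, the trade-off in question can be sketched with plain NumPy. This is a hypothetical illustration, not xarray's actual `_cov_corr` internals; the function names and the `axis` handling are my own:

```python
import numpy as np

def cov_named(a, b, axis=-1):
    # Readable style: intermediates get descriptive names. The extra
    # bindings keep the demeaned arrays alive until the function returns.
    demeaned_a = a - a.mean(axis=axis, keepdims=True)
    demeaned_b = b - b.mean(axis=axis, keepdims=True)
    return (demeaned_a * demeaned_b).sum(axis=axis) / (a.shape[axis] - 1)

def cov_rebound(a, b, axis=-1):
    # "In-place" style from the linked review comments: rebinding the
    # parameters lets the original temporaries be freed as soon as possible.
    a = a - a.mean(axis=axis, keepdims=True)
    b = b - b.mean(axis=axis, keepdims=True)
    return (a * b).sum(axis=axis) / (a.shape[axis] - 1)
```

Both styles compute identical results; the difference is about memory lifetime of intermediates, not the arithmetic itself.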

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Comment 633456698 · AndrewILWilliams (56925856) · CONTRIBUTOR · created 2020-05-25T08:44:36Z · updated 2020-05-25T10:55:29Z · https://github.com/pydata/xarray/pull/4089#issuecomment-633456698

Could you also add a test for the TypeError?

```python
with raises_regex(TypeError, "Only xr.DataArray is supported"):
    xr.corr(xr.Dataset(), xr.Dataset())
```

Where do you mean sorry? Isn't this already there in corr()?

```python3
if any(not isinstance(arr, (Variable, DataArray)) for arr in [da_a, da_b]):
    raise TypeError(
        "Only xr.DataArray and xr.Variable are supported."
        "Given {}.".format([type(arr) for arr in [da_a, da_b]])
    )
```

EDIT: Scratch that, I get what you mean :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Comment 633441816 · AndrewILWilliams (56925856) · CONTRIBUTOR · created 2020-05-25T08:11:05Z · updated 2020-05-25T08:12:48Z · https://github.com/pydata/xarray/pull/4089#issuecomment-633441816

Cheers! I've got a day off today, so I'll do another pass through the changes and see if there's any low-hanging fruit I can improve (in addition to `np.random`, the `_cov_corr` internal methods, and maybe `apply_ufunc()`) :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Comment 633286352 · AndrewILWilliams (56925856) · CONTRIBUTOR · created 2020-05-24T19:49:30Z · updated 2020-05-24T19:56:24Z · https://github.com/pydata/xarray/pull/4089#issuecomment-633286352

One problem I came across here is that pandas automatically ignores `np.nan` values in any corr or cov calculation. This is hard-coded into the package and there's no `skipna=False` option, sadly, so what I've done in the tests is to use the numpy implementation that pandas is built on (see, for example, here).
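The pandas behaviour described here is easy to demonstrate on toy data (hypothetical values, not from the PR's test suite): pandas drops the NaN pair on its own, and the same answer falls out of the textbook formula applied to the pairwise-complete entries.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, 2.0, np.nan, 4.0])
s2 = pd.Series([2.0, 1.0, 3.0, 0.5])

# pandas silently drops the pair containing NaN -- there is no
# skipna=False switch to turn this off.
pandas_cov = s1.cov(s2)

# Reproducing it by hand: restrict to pairwise-complete entries,
# then apply the sample (ddof=1) covariance formula.
mask = s1.notna() & s2.notna()
a, b = s1[mask].to_numpy(), s2[mask].to_numpy()
manual_cov = ((a - a.mean()) * (b - b.mean())).sum() / (len(a) - 1)
```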

Current tests implemented are (in pseudocode...):

- [x] `assert_allclose(xr.cov(a, b) / (a.std() * b.std()), xr.corr(a, b))`
- [x] `assert_allclose(xr.cov(a, a) * (N - 1), ((a - a.mean())**2).sum())`
- [x] For the example in my previous comment, I now have a loop over all values of `(a, x)` to reconstruct the covariance / correlation matrix, and check it with an `assert_allclose(...)`.
- [x] Add more test arrays, with/without `np.nan`s -- done
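The first two identities in the checklist can be verified directly with NumPy; this is a stand-in sketch using `np.cov`/`np.corrcoef` on 1-D arrays instead of `xr.cov`/`xr.corr` on DataArrays:

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.random(30)
b = rng.random(30)
N = a.size

cov_ab = np.cov(a, b)[0, 1]        # sample covariance (ddof=1)
corr_ab = np.corrcoef(a, b)[0, 1]  # Pearson correlation

# identity 1: corr = cov / (std_a * std_b), with matching ddof=1 stds
assert np.isclose(cov_ab / (a.std(ddof=1) * b.std(ddof=1)), corr_ab)

# identity 2: cov(a, a) * (N - 1) equals the sum of squared deviations
assert np.isclose(np.cov(a, a)[0, 1] * (N - 1), ((a - a.mean()) ** 2).sum())
```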

@keewis I tried reading the Hypothesis docs and got a bit overwhelmed, so I've stuck with example-based tests for now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Comment 633213547 · AndrewILWilliams (56925856) · CONTRIBUTOR · created 2020-05-24T10:59:43Z · updated 2020-05-24T11:00:53Z · https://github.com/pydata/xarray/pull/4089#issuecomment-633213547

The current problem is that we can't use pandas to fully test `xr.cov()` or `xr.corr()`, because once you convert the DataArrays to a Series or DataFrame for testing, you can't easily index them with a `dim` parameter. See @r-beer's comment here: https://github.com/pydata/xarray/pull/3550#issuecomment-557895005.

As such, I think it may just make sense to test a few low-dimensional cases, e.g.

```python3
da_a = xr.DataArray(
    np.random.random((3, 21, 4)),
    coords={"time": pd.date_range("2000-01-01", freq="1D", periods=21)},
    dims=("a", "time", "x"),
)

da_b = xr.DataArray(
    np.random.random((3, 21, 4)),
    coords={"time": pd.date_range("2000-01-01", freq="1D", periods=21)},
    dims=("a", "time", "x"),
)

xr.cov(da_a, da_b, 'time')
# <xarray.DataArray (a: 3, x: 4)>
# array([[-0.01824046,  0.00373796, -0.00601642, -0.00108818],
#        [ 0.00686132, -0.02680119, -0.00639433, -0.00868691],
#        [-0.00889806,  0.02622817, -0.01022208, -0.00101257]])
# Dimensions without coordinates: a, x

xr.cov(da_a, da_b, 'time').sel(a=0, x=0)
# <xarray.DataArray ()>
# array(-0.01824046)

da_a.sel(a=0, x=0).to_series().cov(da_b.sel(a=0, x=0).to_series())
# -0.018240458880158048
```

So, while it's easy to check that a few individual points from xr.cov() agree with the pandas implementation, it would require a loop over (a,x) in order to check all of the points for this example. Do people have thoughts about this?
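The loop-over-`(a, x)` check can be sketched without xarray at all, treating axis 1 of a plain array as the `time` dimension. The `cov_along_time` helper below is hypothetical, standing in for `xr.cov(..., dim='time')`:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
arr_a = rng.random((3, 21, 4))
arr_b = rng.random((3, 21, 4))

def cov_along_time(x, y):
    # sample covariance along axis 1 (the "time" axis), ddof=1
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    return (xd * yd).sum(axis=1) / (x.shape[1] - 1)

result = cov_along_time(arr_a, arr_b)  # shape (3, 4), one value per (a, x)

# exhaustive check: every (a, x) slice against the pandas implementation
for i in range(result.shape[0]):
    for j in range(result.shape[1]):
        expected = pd.Series(arr_a[i, :, j]).cov(pd.Series(arr_b[i, :, j]))
        assert np.isclose(result[i, j], expected)
```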

I think it would also make sense to have some test cases where we don't use Pandas at all, but we specify the output manually?

```python3
da_a = xr.DataArray([[1, 2], [1, np.nan]], dims=["x", "time"])
expected = [1, np.nan]
actual = xr.corr(da_a, da_a, dim='time')
assert_allclose(actual, expected)
```

Does this seem like a good way forward?
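The hand-specified expectation in that example can itself be sanity-checked against pandas row by row; this is a sketch taking each row's self-correlation along `time`:

```python
import numpy as np
import pandas as pd

# same toy input as the proposed test: two rows over "x", columns over "time"
rows = [[1, 2], [1, np.nan]]
actual = [pd.Series(r).corr(pd.Series(r)) for r in rows]

# The first row correlates perfectly with itself; the second has only one
# non-NaN point left after pandas drops the NaN pair, so its correlation
# is undefined and comes back as NaN.
```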

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
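The filter in the page header ("7 rows where author_association = ..., issue = ..., user = ...") corresponds roughly to the query below. This sketch rebuilds the schema in an in-memory SQLite database (foreign-key clauses dropped, since the referenced `users` and `issues` tables aren't shown here) and runs a simplified form of that query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
""")

# one sample row, using values from the first comment above
conn.execute(
    "INSERT INTO issue_comments (id, user, issue, author_association, updated_at)"
    " VALUES (?, ?, ?, ?, ?)",
    (634165154, 56925856, 623751213, "CONTRIBUTOR", "2020-05-26T17:26:34Z"),
)

# the page's filter, sorted by updated_at descending
rows = conn.execute(
    "SELECT id FROM issue_comments"
    " WHERE author_association = 'CONTRIBUTOR' AND issue = ? AND user = ?"
    " ORDER BY updated_at DESC",
    (623751213, 56925856),
).fetchall()
```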
Powered by Datasette · Queries took 1522.908ms · About: xarray-datasette