
issue_comments


41 rows where user = 45787861 sorted by updated_at descending


issue 5

  • cov() and corr() - finalization 30
  • cov() and corr() 5
  • doc\environment.yml not found 3
  • Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 2
  • Creating conda environment as described in the contributing guidelines gives ResolvePackageNotFound error 1

user 1

  • r-beer · 41

author_association 1

  • NONE 41
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
557895005 https://github.com/pydata/xarray/pull/3550#issuecomment-557895005 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1Nzg5NTAwNQ== r-beer 45787861 2019-11-24T14:41:44Z 2019-11-24T14:41:44Z NONE

I have added several test cases, and almost all pass. The ones that don't pass are related to dim="time" and to the incompletely implemented pandas_cov and pandas_corr, which currently don't have a dim parameter.

I had a look at np.cov, np.corrcoef, pd.Series.cov, pd.DataFrame.cov, pd.Series.corr, and pd.DataFrame.corr: computation of cov and corr is implemented in many different ways.

xr.cov and xr.corr now imitate the behavior of the respective pd.Series functions.

I actually prefer the numpy behavior, which results in covariance and correlation matrices. However, I feel that an efficient implementation of this behavior is beyond my current understanding of xarray, so I would highly appreciate your support on this implementation!

Otherwise, I added the ddof parameter to xr.cov to imitate the behavior of the respective pd.Series functions. I now get the results that pd.Series produces. However, I had to set ddof=1 for xr.cov and ddof=0 for xr.corr. Does that make sense?
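For intuition, here is a minimal numpy sketch (not the PR's actual implementation) of why the ddof choice matters for cov but cancels out for corr:

```python
import numpy as np

def cov(a, b, ddof=1):
    # sample covariance of two 1-D arrays; ddof=1 matches pd.Series.cov
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / (len(a) - ddof)

def corr(a, b):
    # the (n - ddof) factors in numerator and denominator cancel,
    # so ddof=0 and ddof=1 give identical correlations
    return cov(a, b, ddof=0) / np.sqrt(cov(a, a, ddof=0) * cov(b, b, ddof=0))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
print(cov(x, y))   # 10/3 with ddof=1, vs 2.5 with ddof=0
print(corr(x, y))  # 1.0 regardless of ddof
```

So matching pd.Series.cov requires ddof=1, while for corr the ddof value cannot change the result.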

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
557874527 https://github.com/pydata/xarray/pull/3550#issuecomment-557874527 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1Nzg3NDUyNw== r-beer 45787861 2019-11-24T10:11:31Z 2019-11-24T10:15:19Z NONE

For xarray.corr(array, array, dim='y'), I think we'd expect an array of the same dimensionality as below, with one point as the correlation of 1.0 (because {1,2} is 100% correlated to {1,2}) and another NaN. Does that make sense?

On the one hand, I am with you in terms of "What You Put In Is What You Get Out"; on the other hand, pandas.Series.cov and np.cov have other behaviors that both seem plausible:

![grafik](https://user-images.githubusercontent.com/45787861/69493085-a1342600-0eaa-11ea-991a-f23f4ae348e3.png) ![grafik](https://user-images.githubusercontent.com/45787861/69493094-b3ae5f80-0eaa-11ea-8984-6710e2331882.png)

So for the moment, we might stick with pandas.Series.cov? Or, if we rather want the np.cov behavior (future PR?), this would require more effort, and the repr should probably be given as an xarray object instead of an np.array, right?
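To make the difference concrete, a small sketch with made-up data: np.cov returns the full 2x2 matrix, of which pd.Series.cov would return only the off-diagonal scalar:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 4.0])

# np.cov returns a 2x2 matrix: variances on the diagonal,
# the cross-covariance off the diagonal
m = np.cov(a, b)
print(m.shape)  # (2, 2)

# pd.Series.cov(a, b) would return just this single scalar
scalar = m[0, 1]
```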

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
557872510 https://github.com/pydata/xarray/pull/3550#issuecomment-557872510 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1Nzg3MjUxMA== r-beer 45787861 2019-11-24T09:39:30Z 2019-11-24T09:39:30Z NONE

I would start with having the code in the function; i.e. don't abstract too early unless we're really confident about the eventual state. When we find ourselves writing something multiple times, then we can start pulling it out into more abstractions (e.g. fixtures).

Alright! I found myself writing very similar tests and therefore abstracted them into test_func. I completely agree with not abstracting too early, but only once the same structure appears several times.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
557829302 https://github.com/pydata/xarray/pull/3550#issuecomment-557829302 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1NzgyOTMwMg== r-beer 45787861 2019-11-23T20:13:45Z 2019-11-23T20:13:45Z NONE

Either way, it would be good to have a clear list of da_a and da_b and respective expected results. I am a bit confused about what we actually would expect for xarray.DataArray([[1, 2], [np.nan, np.nan]], dims=['x', 'y']).

Maybe, as a first step, some type checking would be enough to identify special cases such as the one above and raise a ValueError or NotImplementedError?
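For illustration, here is the special case above in plain numpy (a naive correlation without any NaN handling): the valid row correlates perfectly with itself, and the all-NaN row propagates to NaN.

```python
import numpy as np

def corr_1d(a, b):
    # naive correlation along a 1-D array; NaNs propagate to the result
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum())

data = np.array([[1.0, 2.0], [np.nan, np.nan]])
print(corr_1d(data[0], data[0]))            # 1.0
print(np.isnan(corr_1d(data[1], data[1])))  # True
```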

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
557828920 https://github.com/pydata/xarray/pull/3550#issuecomment-557828920 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1NzgyODkyMA== r-beer 45787861 2019-11-23T20:07:56Z 2019-11-23T20:08:11Z NONE

@shoyer good spot, thanks

@r-beer we could add that as a test case

Yes, I will add it.

Thinking about test arrays in general: Wouldn't it be good to define some data array fixtures in the conftest.py which can then be used in all different test_function.py tests?

In view of the additional test, I started to restructure test_cov and test_corr into a single test_func. I am now asking myself how much of the data preprocessing should go into the parametrization and how much should remain in the test function...

Any guidelines or suggestions?

Either way, it would be good to have a clear list of da_a and da_b and respective expected results. I am a bit confused about what we actually would expect for xarray.DataArray([[1, 2], [np.nan, np.nan]], dims=['x', 'y']).
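For instance, the structure I have in mind would keep the input arrays and expected results in the parametrization and only the computation and assertion in the test function (hypothetical cases, plain numpy standing in for the actual xr.corr):

```python
import numpy as np
import pytest

# hypothetical cases: input pairs plus the expected correlation
CASES = [
    ([1.0, 2.0], [1.0, 2.0], 1.0),   # identical -> perfectly correlated
    ([1.0, 2.0], [2.0, 1.0], -1.0),  # reversed -> perfectly anti-correlated
]

@pytest.mark.parametrize("a, b, expected", CASES)
def test_corr(a, b, expected):
    actual = np.corrcoef(a, b)[0, 1]
    assert np.allclose(actual, expected)
```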

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
557812486 https://github.com/pydata/xarray/pull/3550#issuecomment-557812486 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1NzgxMjQ4Ng== r-beer 45787861 2019-11-23T16:33:58Z 2019-11-23T16:34:39Z NONE

Turns out the assert np.allclose statements actually failed for cov, but not for corr:

```
actual = xr.cov(ts1, ts2)

    assert np.allclose(expected, actual)
E   assert False
E    +  where False = <function allclose at 0x7ff5dbaf0e18>(0.004632253406505826, <xarray.DataArray ()>\narray(0.004323))
E    +    where <function allclose at 0x7ff5dbaf0e18> = np.allclose
```

0.00463 vs. 0.00432
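For what it's worth, a discrepancy of roughly that size can arise from a ddof mismatch between two implementations; a minimal illustration with made-up data:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([1.0, 3.0, 2.0, 4.0])

c0 = np.cov(a, b, ddof=0)[0, 1]  # normalized by n
c1 = np.cov(a, b, ddof=1)[0, 1]  # normalized by n - 1 (numpy's default)

# the two values differ by the factor (n - 1) / n,
# which is easily enough to fail np.allclose
print(np.allclose(c0, c1))  # False
```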

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
557790321 https://github.com/pydata/xarray/pull/3550#issuecomment-557790321 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1Nzc5MDMyMQ== r-beer 45787861 2019-11-23T11:34:12Z 2019-11-23T11:34:12Z NONE

PS: I personally would like to use cov and corr also with dask. Does it also make sense to put this in a future PR, together with the above-mentioned other improvements?

Further PRs:

  • Add dot, cov and corr to computation.rst

    PS: It looks like xr.dot is only present in the "API reference" and not in https://xarray.pydata.org/en/stable/computation.html. We should add dot, cov and corr to computation.rst in a future PR.
    
  • Test (and if necessary implement) dask compatibility to cov and corr

  • Improve documentation on How to contribute to the xarray documentation?

    I think it already gives a good overview, but then it might be unclear how to proceed.
    
    Maybe it might make sense to add a subsection How to contribute to the xarray documentation? in which the different options are briefly described with examples:
    
    1. change docstring, e.g. ....
    2. update api.rst, e.g. ....
    3. update whats-new.rst, e.g. ...
    
    Then, underline that building the documentation locally is optional:
    
    Optionally, you can build the docs locally as described in the next section.
    
  • Extend xarray functionality, such that pairwise correlations and covariances can be easily calculated. Either write a wrapper function or extend xr.corr and xr.cov, such that the following computation is supported with ease:

    ```python
    pairwise_dim = 'space'
    for region in da_a[pairwise_dim]:
        print(xr.corr(da_a.sel(space=region.item()), da_a, dim='time').values)
    ```

    Target output would be an xarray with pairwise correlations, e.g.:

    ```
    [ 1.         -0.43942827 -0.22425502]
    [-0.43942827  1.          0.30940081]
    [-0.22425502  0.30940081  1.        ]
    ```
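A plain-numpy sketch of that target behavior with made-up data (np.corrcoef already produces such a matrix for 2-D input):

```python
import numpy as np

# rows play the role of the 'space' dimension, columns of 'time'
data = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0, 4.0],
])

# pair-by-pair loop, as in the snippet above
n = data.shape[0]
pairwise = np.empty((n, n))
for i in range(n):
    for j in range(n):
        pairwise[i, j] = np.corrcoef(data[i], data[j])[0, 1]

# matches np.corrcoef applied to the whole 2-D array in one call
assert np.allclose(pairwise, np.corrcoef(data))
```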

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
557787104 https://github.com/pydata/xarray/pull/3550#issuecomment-557787104 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1Nzc4NzEwNA== r-beer 45787861 2019-11-23T10:45:17Z 2019-11-23T10:45:17Z NONE

@dcherian, @max-sixty, @keewis, @Hoeze, second try: any more improvements required?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
557781732 https://github.com/pydata/xarray/pull/3550#issuecomment-557781732 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1Nzc4MTczMg== r-beer 45787861 2019-11-23T09:23:46Z 2019-11-23T09:23:46Z NONE

I'm not sure what's going on with the formatting. If you run black --version and black . from the root of the repo, what do you get?

black --version:

```
black, version 19.10b0
```

black .:

```
reformatted ...\xarray\xarray\core\dataarray.py
All done! ✨ 🍰 ✨
1 file reformatted, 143 files left unchanged.
```

Now, I additionally get errors from pre-commit claiming that GIT: unstaged files detected.

I guess the pre-commit installation was not completely correct?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
557777317 https://github.com/pydata/xarray/pull/3550#issuecomment-557777317 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1Nzc3NzMxNw== r-beer 45787861 2019-11-23T08:08:44Z 2019-11-23T08:08:44Z NONE

I'm not sure what's going on with the formatting. If you run black --version and black . from the root of the repo, what do you get?

black --version:

```
black, version 19.10b0
```

black .:

```
reformatted ...\xarray\xarray\core\dataarray.py
All done! ✨ 🍰 ✨
1 file reformatted, 143 files left unchanged.
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
557706382 https://github.com/pydata/xarray/pull/3550#issuecomment-557706382 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1NzcwNjM4Mg== r-beer 45787861 2019-11-22T21:46:25Z 2019-11-22T21:46:25Z NONE

PS: I personally would like to use cov and corr also with dask. Does it also make sense to put this in a future PR, together with the above-mentioned other improvements?

Further PRs:

  • Add dot, cov and corr to computation.rst

    PS: It looks like xr.dot is only present in the "API reference" and not in https://xarray.pydata.org/en/stable/computation.html. We should add dot, cov and corr to computation.rst in a future PR.

  • Test (and if necessary implement) dask compatibility to cov and corr

  • Improve documentation on How to contribute to the xarray documentation?

    I think it already gives a good overview, but then it might be unclear how to proceed.

    Maybe it might make sense to add a subsection How to contribute to the xarray documentation? in which the different options are briefly described with examples:

    1. change docstring, e.g. ....
    2. update api.rst, e.g. ....
    3. update whats-new.rst, e.g. ...

    Then, underline that building the documentation locally is optional:

    Optionally, you can build the docs locally as described in the next section.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
557701008 https://github.com/pydata/xarray/pull/3550#issuecomment-557701008 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1NzcwMTAwOA== r-beer 45787861 2019-11-22T21:27:59Z 2019-11-22T21:27:59Z NONE

@dcherian, @max-sixty, @keewis, @Hoeze, this is it, isn't it?

Let me know if there are some adjustments to be made. 🙂

And I still get trailing whitespace errors, despite having set up pre-commit to run black and flake8 automatically. Do I have to set some parameters to delete the trailing whitespaces?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
557085576 https://github.com/pydata/xarray/pull/3550#issuecomment-557085576 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1NzA4NTU3Ng== r-beer 45787861 2019-11-21T13:31:19Z 2019-11-21T13:31:19Z NONE

Computation? Where is xr.dot documented?

xr.dot is implemented in computation, what do you mean by "documented"?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
556991085 https://github.com/pydata/xarray/pull/3550#issuecomment-556991085 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1Njk5MTA4NQ== r-beer 45787861 2019-11-21T09:14:20Z 2019-11-21T09:14:20Z NONE

PS: accidentally closed and commented instead of just commenting

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
556990852 https://github.com/pydata/xarray/pull/3550#issuecomment-556990852 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1Njk5MDg1Mg== r-beer 45787861 2019-11-21T09:13:44Z 2019-11-21T09:14:04Z NONE

@r-beer let's go with @shoyer 's suggestion to change this from a method to a function xr.cov & xr.corr. Is that OK?

@shoyer, @max-sixty, @keewis, where should these functions be placed?

my first guesses:

  • computation
  • dataarray

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
556517301 https://github.com/pydata/xarray/pull/3550#issuecomment-556517301 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1NjUxNzMwMQ== r-beer 45787861 2019-11-20T22:58:16Z 2019-11-20T22:58:16Z NONE

(lmk if the docs were unclear and we can update)

I think it already gives a good overview, but then it might be unclear how to proceed.

Maybe it might make sense to add a subsection How to contribute to the xarray documentation? in which the different options are briefly described with examples:

1. change docstring, e.g. ....
2. update api.rst, e.g. ....
3. update whats-new.rst, e.g. ...

Then, underline that building the documentation locally is optional:

Optionally, you can build the docs locally as described in the next section.

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
556507107 https://github.com/pydata/xarray/pull/3550#issuecomment-556507107 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1NjUwNzEwNw== r-beer 45787861 2019-11-20T22:48:25Z 2019-11-20T22:48:25Z NONE

For the sake of completeness (and to already learn the procedure) I just added some information to whats-new.rst and api.rst for the methods da.cov and da.corr.

I would be happy for feedback!

Tomorrow, I will rewrite the methods to functions and go through all the steps again. Good night from Germany!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
556492992 https://github.com/pydata/xarray/pull/3550#issuecomment-556492992 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1NjQ5Mjk5Mg== r-beer 45787861 2019-11-20T22:34:58Z 2019-11-20T22:34:58Z NONE

@r-beer let's go with @shoyer 's suggestion to change this from a method to a function xr.cov & xr.corr. Is that OK?

Sure! I actually asked myself a similar software design question in another project.

Shouldn't be too much effort I guess? I will tackle it tomorrow. 😴

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
556490349 https://github.com/pydata/xarray/pull/3550#issuecomment-556490349 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1NjQ5MDM0OQ== r-beer 45787861 2019-11-20T22:32:28Z 2019-11-20T22:32:28Z NONE

I am now looking into the generation of the documentation.

What do you mean by this? You shouldn't have to generate anything locally?

Oh, OK. I thought I had to create the xarray-docs environment to update whats-new.rst and api.rst?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
556487185 https://github.com/pydata/xarray/issues/3556#issuecomment-556487185 https://api.github.com/repos/pydata/xarray/issues/3556 MDEyOklzc3VlQ29tbWVudDU1NjQ4NzE4NQ== r-beer 45787861 2019-11-20T22:29:30Z 2019-11-20T22:29:30Z NONE

PS: Environment creation worked flawlessly. It's a pleasure to get into xarray contribution, thanks to you guys! 😃

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  doc\environment.yml not found 526243198
556484204 https://github.com/pydata/xarray/issues/3556#issuecomment-556484204 https://api.github.com/repos/pydata/xarray/issues/3556 MDEyOklzc3VlQ29tbWVudDU1NjQ4NDIwNA== r-beer 45787861 2019-11-20T22:26:42Z 2019-11-20T22:26:42Z NONE

@r-beer That's strange, could it be a browser cache issue? I took the screenshot from the stable version of the docs (http://xarray.pydata.org/en/stable/contributing.html#requirements).

Indeed, I deleted the cache and the documentation refreshed. Thanks!

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  doc\environment.yml not found 526243198
556476571 https://github.com/pydata/xarray/issues/3556#issuecomment-556476571 https://api.github.com/repos/pydata/xarray/issues/3556 MDEyOklzc3VlQ29tbWVudDU1NjQ3NjU3MQ== r-beer 45787861 2019-11-20T22:19:36Z 2019-11-20T22:19:36Z NONE

I think this was already updated in #3410? Here's what I see in the contributing guidelines:

Ok, thanks for the information. I followed the official guidelines, but will double-check with the dev version from now on.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  doc\environment.yml not found 526243198
556473132 https://github.com/pydata/xarray/pull/3550#issuecomment-556473132 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1NjQ3MzEzMg== r-beer 45787861 2019-11-20T22:16:24Z 2019-11-20T22:16:24Z NONE

@r-beer your current test case looks fine, no stress on the fixtures!

Alright!

I am now looking into the generation of the documentation. However, it seems like the contributing guidelines are not up-to-date, as it's not possible to create the xarray-docs environment (see #3556).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
556404945 https://github.com/pydata/xarray/pull/3550#issuecomment-556404945 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1NjQwNDk0NQ== r-beer 45787861 2019-11-20T21:14:22Z 2019-11-20T21:14:22Z NONE

@max-sixty, I have found the da(request) fixture. Do I correctly understand that the tests you mentioned above can be implemented using this fixture and parametrize decorators as in @pytest.mark.parametrize("da", (1, 2), indirect=True)?

I have tried to wrap my head around the elegant use of fixtures and parametrize but didn't get there yet. If you have some time to show a more elegant solution, I would be happy to learn! If not, I guess this should work, too.

Are the examples OK like this?
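For reference, a minimal self-contained sketch of the indirect pattern as I understand it (hypothetical array cases, not xarray's actual conftest.py fixture):

```python
import numpy as np
import pytest

def make_da(param):
    # hypothetical cases: one clean array, one containing a NaN
    if param == 1:
        return np.array([1.0, 2.0, 3.0])
    return np.array([1.0, np.nan, 3.0])

@pytest.fixture
def da(request):
    # with indirect=True, the parametrize value arrives here as request.param
    return make_da(request.param)

@pytest.mark.parametrize("da", (1, 2), indirect=True)
def test_shape(da):
    assert da.shape == (3,)
```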

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
556265324 https://github.com/pydata/xarray/pull/3550#issuecomment-556265324 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1NjI2NTMyNA== r-beer 45787861 2019-11-20T19:13:20Z 2019-11-20T19:13:20Z NONE

I'm not sure about the examples in the examples folder, I think those are real-world examples?

What I meant was examples like those in DataArray.integrate (source) that simply demonstrate how to use that function.

OK! That simplifies the tasks quite a bit 🙂

I will implement this and the special cases for the tests, I might add Jupyter Notebook examples later to the documentation, when the dataset.corr functionality is also implemented.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
556248781 https://github.com/pydata/xarray/pull/3550#issuecomment-556248781 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1NjI0ODc4MQ== r-beer 45787861 2019-11-20T18:59:08Z 2019-11-20T18:59:08Z NONE

We could also add some variants which test along different dimensions / all NaNs / other special cases.

@max-sixty, I have found the da(request) fixture. Do I correctly understand that the tests you mentioned above can be implemented using this fixture and parametrize decorators as in @pytest.mark.parametrize("da", (1, 2), indirect=True)?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
556244906 https://github.com/pydata/xarray/pull/3550#issuecomment-556244906 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1NjI0NDkwNg== r-beer 45787861 2019-11-20T18:55:57Z 2019-11-20T18:55:57Z NONE

Concerning the examples, I have started to create a Jupyter Notebook in the examples folder. Is this the right way to create the examples?

I have seen that the existing examples contain html code. Why is that? Why not only markdown?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
556094542 https://github.com/pydata/xarray/pull/3550#issuecomment-556094542 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1NjA5NDU0Mg== r-beer 45787861 2019-11-20T16:48:16Z 2019-11-20T16:48:55Z NONE

Test looks good. Let's make those adjustments you put in the TODO and add one for the other function?

@max-sixty, do you mean to add a TODO to the function mentioned in this commit: https://github.com/pydata/xarray/pull/3550/commits/d57eb7cfaf1407179080f0ca95e7951e5622b493?

Here, I think the @requires_scipy_or_netCDF4 is actually necessary:

```python
@requires_scipy_or_netCDF4
@requires_dask
@pytest.mark.filterwarnings("ignore:use make_scale(name) instead")
def test_open_mfdataset_manyfiles(
```

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
556050094 https://github.com/pydata/xarray/pull/3550#issuecomment-556050094 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1NjA1MDA5NA== r-beer 45787861 2019-11-20T15:19:16Z 2019-11-20T15:19:16Z NONE

Great start!

Thanks for the review @keewis

I think it's fine to leave Dataset for another PR; it'd be great to get this in, and given it's your first PR (welcome!) let's not bloat it

Test looks good. Let's make those adjustments you put in the TODO and add one for the other function? We could also add some variants which test along different dimensions / all NaNs / other special cases.

Thank you two, @max-sixty and @keewis, for the constructive feedback that made the start easier! I would also be happy to go once through the whole process and then implement the PR for the ds separately. This will probably cut the learning curve and accelerate the process.

Additionally, I wonder whether I have to configure black for the spaces between colons?

I had autoformatting running with autopep8 previously, but something broke after one of the last VS Code updates, so I will set this up properly again tomorrow or on Friday.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
556010240 https://github.com/pydata/xarray/pull/3550#issuecomment-556010240 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1NjAxMDI0MA== r-beer 45787861 2019-11-20T13:47:24Z 2019-11-20T13:47:24Z NONE

since the APIs of DataArray and Dataset are usually kept in sync as much as possible, it might be good to add the new methods to Dataset, too. This doesn't have to be in this PR, though, and I'd wait for confirmation first.

Yes, the functionality that I would be very interested in would be to calculate corr() also between different variables of a dataset as in pandas.DataFrame.corr as well as correlations between different locations within the same DataArray (e.g. correlations of time series with different latitudes).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
555999418 https://github.com/pydata/xarray/pull/3550#issuecomment-555999418 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1NTk5OTQxOA== r-beer 45787861 2019-11-20T13:17:37Z 2019-11-20T13:17:37Z NONE

for the new commit, yes. The older commits could be rewritten if you don't like the other email address being public (I guide you through the process if you want) but since PRs get squashed after the merge you don't have to.

OK, no need for the extra effort at this point, the squashing at the end of PR is fine for me. Thank you for the offer, anyway! 🙂

Are any other changes necessary for the PR?

ideas:

  • examples for corr and cov usage
  • whats-new.rst

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
555995427 https://github.com/pydata/xarray/pull/3550#issuecomment-555995427 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1NTk5NTQyNw== r-beer 45787861 2019-11-20T13:06:32Z 2019-11-20T13:06:32Z NONE

There seems to be a problem with your email address (github can't match it to your account). Does it match the one you used for signing up on github?

Dear @keewis, I just set my real email address also as public email address and set my local git user name and email address accordingly. Does this solve the problem?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
555966566 https://github.com/pydata/xarray/pull/3550#issuecomment-555966566 https://api.github.com/repos/pydata/xarray/issues/3550 MDEyOklzc3VlQ29tbWVudDU1NTk2NjU2Ng== r-beer 45787861 2019-11-20T11:39:11Z 2019-11-20T11:39:11Z NONE

@max-sixty, all three checks fail because test_open_mfdataset_manyfiles requires either netcdf4 or scipy.

Shall I, therefore, add the @requires_scipy_or_netCDF4 decorator?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() - finalization 525685973
555935575 https://github.com/pydata/xarray/issues/3549#issuecomment-555935575 https://api.github.com/repos/pydata/xarray/issues/3549 MDEyOklzc3VlQ29tbWVudDU1NTkzNTU3NQ== r-beer 45787861 2019-11-20T10:12:57Z 2019-11-20T10:12:57Z NONE

Thanks for the fast response @crusaderky!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Creating conda environment as described in the contributing guidelines gives ResolvePackageNotFound error 525642358
555915343 https://github.com/pydata/xarray/pull/2652#issuecomment-555915343 https://api.github.com/repos/pydata/xarray/issues/2652 MDEyOklzc3VlQ29tbWVudDU1NTkxNTM0Mw== r-beer 45787861 2019-11-20T09:20:39Z 2019-11-20T09:30:31Z NONE

Alright, I have done so and changed basestrings into str; now the tests (mostly) pass locally:

```
1 failed, 6539 passed, 1952 skipped, 37 xfailed, 31 warnings in 86.34s
```

A general question concerning collaboration on existing PRs: should I push the changes to my fork and then create a new PR that replaces @hrishikeshac's PR, or shall I push to @hrishikeshac's fork and see whether it works to update the existing PR?

Or is there another option?

PS: Permission to push to hrishikeshac:corr is denied for me.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() 396102183
555745623 https://github.com/pydata/xarray/pull/2652#issuecomment-555745623 https://api.github.com/repos/pydata/xarray/issues/2652 MDEyOklzc3VlQ29tbWVudDU1NTc0NTYyMw== r-beer 45787861 2019-11-19T22:27:10Z 2019-11-19T23:00:31Z NONE

Alright, I only got two merge conflicts in dataarray.py:

First, a minor merge conflict concerning the imports:

1. `accessors` -> `accessor_dt`
2. `broadcast` has been dropped in master?

```python
<<<<<<< HEAD
from . import (
    computation,
    dtypes,
    groupby,
    indexing,
    ops,
    pdcompat,
    resample,
    rolling,
    utils,
)
from .accessor_dt import DatetimeAccessor
from .accessor_str import StringAccessor
from .alignment import (
    _broadcast_helper,
    _get_broadcast_dims_map_common_coords,
    align,
    reindex_like_indexers,
)
=======
from .accessors import DatetimeAccessor
from .alignment import align, reindex_like_indexers, broadcast
>>>>>>> added da.corr() and da.cov() to dataarray.py. Test added in test_dataarray.py, and tested using pytest.
```

Secondly, some bigger merge conflicts concerning some of `DataArray`'s methods, though the changes don't seem to conflict with each other:

1. `integrate()`, `unify_chunks()` and `map_blocks()` added in master
2. `cov()` and `corr()` added in the corr branch
3. It seems like `str = property(StringAccessor)` should be at the end of the `DataArray` definition, for mypy reasons...
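Point 3 is a name-shadowing pitfall; a tiny self-contained sketch (the `Demo` class and property body here are invented, not xarray's code) shows why the assignment has to come last:

```python
class Demo:
    # Methods defined before the assignment still see the builtin `str`
    # in their annotations, so type checkers are happy.
    def shout(self, text: str) -> str:
        return text.upper()

    # Once this name is bound, `str` inside the class body refers to the
    # property, which is why xarray keeps it as the very last statement.
    str = property(lambda self: "accessor")

print(Demo().shout("ok"))  # prints OK
print(Demo().str)          # prints accessor
```

Any method annotated with `str` after that assignment would have its annotation resolve to the property instead of the builtin, which is exactly what the linked mypy FAQ entry warns about.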

```python
<<<<<<< HEAD
    def integrate(
        self, dim: Union[Hashable, Sequence[Hashable]], datetime_unit: str = None
    ) -> "DataArray":
        """ integrate the array with the trapezoidal rule.

        .. note::
            This feature is limited to simple cartesian geometry, i.e. dim
            must be one dimensional.

        Parameters
        ----------
        dim: hashable, or a sequence of hashable
            Coordinate(s) used for the integration.
        datetime_unit: str, optional
            Can be used to specify the unit if datetime coordinate is used.
            One of {'Y', 'M', 'W', 'D', 'h', 'm', 's', 'ms', 'us', 'ns',
            'ps', 'fs', 'as'}

        Returns
        -------
        integrated: DataArray

        See also
        --------
        numpy.trapz: corresponding numpy function

        Examples
        --------

        >>> da = xr.DataArray(np.arange(12).reshape(4, 3), dims=['x', 'y'],
        ...                   coords={'x': [0, 0.1, 1.1, 1.2]})
        >>> da
        <xarray.DataArray (x: 4, y: 3)>
        array([[ 0,  1,  2],
               [ 3,  4,  5],
               [ 6,  7,  8],
               [ 9, 10, 11]])
        Coordinates:
          * x        (x) float64 0.0 0.1 1.1 1.2
        Dimensions without coordinates: y
        >>>
        >>> da.integrate('x')
        <xarray.DataArray (y: 3)>
        array([5.4, 6.6, 7.8])
        Dimensions without coordinates: y
        """
        ds = self._to_temp_dataset().integrate(dim, datetime_unit)
        return self._from_temp_dataset(ds)

    def unify_chunks(self) -> "DataArray":
        """ Unify chunk size along all chunked dimensions of this DataArray.

        Returns
        -------
        DataArray with consistent chunk sizes for all dask-array variables

        See Also
        --------
        dask.array.core.unify_chunks
        """
        ds = self._to_temp_dataset().unify_chunks()
        return self._from_temp_dataset(ds)

    def map_blocks(
        self,
        func: "Callable[..., T_DSorDA]",
        args: Sequence[Any] = (),
        kwargs: Mapping[str, Any] = None,
    ) -> "T_DSorDA":
        """
        Apply a function to each chunk of this DataArray. This method is
        experimental and its signature may change.

        Parameters
        ----------
        func: callable
            User-provided function that accepts a DataArray as its first
            parameter. The function will receive a subset of this DataArray,
            corresponding to one chunk along each chunked dimension. ``func``
            will be executed as ``func(obj_subset, *args, **kwargs)``.

            The function will be first run on mocked-up data, that looks like
            this array but has sizes 0, to determine properties of the
            returned object such as dtype, variable names, new dimensions
            and new indexes (if any).

            This function must return either a single DataArray or a single
            Dataset.

            This function cannot change size of existing dimensions, or add
            new chunked dimensions.
        args: Sequence
            Passed verbatim to func after unpacking, after the sliced
            DataArray. xarray objects, if any, will not be split by chunks.
            Passing dask collections is not allowed.
        kwargs: Mapping
            Passed verbatim to func after unpacking. xarray objects, if any,
            will not be split by chunks. Passing dask collections is not
            allowed.

        Returns
        -------
        A single DataArray or Dataset with dask backend, reassembled from the
        outputs of the function.

        Notes
        -----
        This method is designed for when one needs to manipulate a whole
        xarray object within each chunk. In the more common case where one
        can work on numpy arrays, it is recommended to use apply_ufunc.

        If none of the variables in this DataArray is backed by dask, calling
        this method is equivalent to calling ``func(self, *args, **kwargs)``.

        See Also
        --------
        dask.array.map_blocks, xarray.apply_ufunc, xarray.map_blocks,
        xarray.Dataset.map_blocks
        """
        from .parallel import map_blocks

        return map_blocks(func, self, args, kwargs)

    # this needs to be at the end, or mypy will confuse with `str`
    # https://mypy.readthedocs.io/en/latest/common_issues.html#dealing-with-conflicting-names
    str = property(StringAccessor)
=======
    def cov(self, other, dim=None):
        """Compute covariance between two DataArray objects along a shared dimension.

        Parameters
        ----------
        other: DataArray
            The other array with which the covariance will be computed
        dim: The dimension along which the covariance will be computed

        Returns
        -------
        covariance: DataArray
        """
        # 1. Broadcast the two arrays
        self, other = broadcast(self, other)

        # 2. Ignore the nans
        valid_values = self.notnull() & other.notnull()
        self = self.where(valid_values, drop=True)
        other = other.where(valid_values, drop=True)
        valid_count = valid_values.sum(dim)

        # 3. Compute mean and standard deviation along the given dim
        demeaned_self = self - self.mean(dim=dim)
        demeaned_other = other - other.mean(dim=dim)

        # 4. Compute covariance along the given dim
        cov = (demeaned_self * demeaned_other).sum(dim=dim) / (valid_count)

        return cov

    def corr(self, other, dim=None):
        """Compute correlation between two DataArray objects along a shared dimension.

        Parameters
        ----------
        other: DataArray
            The other array with which the correlation will be computed
        dim: The dimension along which the correlation will be computed

        Returns
        -------
        correlation: DataArray
        """
        # 1. Broadcast the two arrays
        self, other = broadcast(self, other)

        # 2. Ignore the nans
        valid_values = self.notnull() & other.notnull()
        self = self.where(valid_values, drop=True)
        other = other.where(valid_values, drop=True)

        # 3. Compute correlation based on standard deviations and cov()
        self_std = self.std(dim=dim)
        other_std = other.std(dim=dim)

        return self.cov(other, dim=dim) / (self_std * other_std)
>>>>>>> added da.corr() and da.cov() to dataarray.py. Test added in test_dataarray.py, and tested using pytest.
```
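For reference, the cov()/corr() logic from the corr branch can be sketched with plain numpy on 1-D data (the function names and the `ddof` parameter here are illustrative, not the PR's exact API):

```python
import numpy as np

def cov_1d(a, b, ddof=0):
    # Mirror the PR's steps: mask positions where either input is NaN,
    # demean both arrays, then average the elementwise products.
    valid = ~np.isnan(a) & ~np.isnan(b)
    a, b = a[valid], b[valid]
    return ((a - a.mean()) * (b - b.mean())).sum() / (valid.sum() - ddof)

def corr_1d(a, b):
    # Correlation = covariance divided by the product of the standard
    # deviations, computed over the same valid positions.
    valid = ~np.isnan(a) & ~np.isnan(b)
    return cov_1d(a, b) / (a[valid].std() * b[valid].std())

a = np.array([1.0, 2.0, 3.0, np.nan])
b = np.array([2.0, 4.0, 6.0, 8.0])
print(corr_1d(a, b))  # close to 1.0: perfectly linear where both are defined
```

The NaN handling is the key behavioral choice: like the PR (and like `pd.Series.corr`), only pairwise-complete observations enter the statistics, so the NaN in `a` simply drops the fourth pair.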

Could you please comment on my suggested changes (accepting either the changes from master, or both sides where they don't conflict)?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() 396102183
555737000 https://github.com/pydata/xarray/pull/2652#issuecomment-555737000 https://api.github.com/repos/pydata/xarray/issues/2652 MDEyOklzc3VlQ29tbWVudDU1NTczNzAwMA== r-beer 45787861 2019-11-19T22:03:46Z 2019-11-19T22:03:46Z NONE

@max-sixty, thanks for the fast response!

Yeah, I get the traceback and already started diving into it. However, I assumed that @hrishikeshac's branch "corr" wasn't up-to-date.

Shall I merge changes from master or develop into corr, before looking further into the tests?

I read http://xarray.pydata.org/en/stable/contributing.html; is this identical to contributing.rst? Following those guidelines, I would use the following commands to "retrieve the changes from the master branch":

```bash
git fetch upstream
git rebase upstream/master
```

Where upstream = https://github.com/pydata/xarray.git?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() 396102183
555734331 https://github.com/pydata/xarray/pull/2652#issuecomment-555734331 https://api.github.com/repos/pydata/xarray/issues/2652 MDEyOklzc3VlQ29tbWVudDU1NTczNDMzMQ== r-beer 45787861 2019-11-19T21:57:02Z 2019-11-19T21:57:02Z NONE

@max-sixty, thanks for the fast response!

Yeah, I get the traceback and already started diving into it. However, I assumed that @hrishikeshac's branch "corr" wasn't up-to-date.

Shall I merge changes from master or develop into corr, before looking further into the tests?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() 396102183
555730897 https://github.com/pydata/xarray/pull/2652#issuecomment-555730897 https://api.github.com/repos/pydata/xarray/issues/2652 MDEyOklzc3VlQ29tbWVudDU1NTczMDg5Nw== r-beer 45787861 2019-11-19T21:48:07Z 2019-11-19T21:48:07Z NONE

Dear @Hoeze, I will (try to) finalize this merge request, as I am also very interested in this functionality.

I am new to xarray and contribution. I downloaded @hrishikeshac's code and ran the pytest tests locally.

I get:

```bash
17 failed, 2088 passed, 2896 skipped, 7 xfailed, 754 warnings in 49.95s
```

Is there an elegant way to share "which tests failed where" in order to avoid that I try to fix tests, that might already have been fixed in other branches?

I will already start to get a better understanding of why the tests fail and try to fix them in the meantime.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cov() and corr() 396102183
555726775 https://github.com/pydata/xarray/issues/1115#issuecomment-555726775 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDU1NTcyNjc3NQ== r-beer 45787861 2019-11-19T21:36:42Z 2019-11-19T21:36:42Z NONE

> @r-beer would be great to finish this off! I think this would be a popular feature. You could take @hrishikeshac's code (which is close!) and make the final changes.

OK, that means to make #2652 pass, right?

I downloaded the respective branch from @hrishikeshac, and ran the tests locally.

See respective discussion in #2652.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339
555376229 https://github.com/pydata/xarray/issues/1115#issuecomment-555376229 https://api.github.com/repos/pydata/xarray/issues/1115 MDEyOklzc3VlQ29tbWVudDU1NTM3NjIyOQ== r-beer 45787861 2019-11-19T07:44:23Z 2019-11-19T07:45:26Z NONE

I am also highly interested in this function and in contributing to xarray in general!

If I understand correctly, https://github.com/pydata/xarray/pull/2350 and https://github.com/pydata/xarray/pull/2652 do not solve this PR, do they?

How can I help you finishing these PRs?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: Compute cross-correlation (similar to pd.Series.corr()) of gridded data 188996339

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);