html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/pull/5390#issuecomment-850843957,https://api.github.com/repos/pydata/xarray/issues/5390,850843957,MDEyOklzc3VlQ29tbWVudDg1MDg0Mzk1Nw==,56925856,2021-05-29T14:37:48Z,2021-05-31T10:27:06Z,CONTRIBUTOR,"@willirath this is cool, but I think it doesn't explain why the tests fail. Currently the `da_a.mean()` and `da_b.mean()` calls *do* know about each other's missing data! That's what we're doing in [these lines](https://github.com/pydata/xarray/blob/master/xarray/core/computation.py#L1370_L1371). @dcherian, I think I've got it to work, but you need to account for the length(s) of the dimension you're calculating the correlation over. (i.e. `(da - da.mean('time')).sum('time')` is not the same as `da.sum('time') - da.mean('time')` because you should _actually_ do `da.sum('time') - da.mean('time')*length_of_time_dim`.) This latest commit does this, but I'm not sure whether the added complication is worth it yet? Thoughts welcome. ```python3 def _mean(da): return (da.sum(dim=dim, skipna=True, min_count=1) / (valid_count)) dim_length = da_a.notnull().sum(dim=dim, skipna=True) def _mean_detrended_term(da): return (dim_length * da / (valid_count)) cov = _mean(da_a * da_b) - _mean_detrended_term(da_a.mean(dim=dim) * da_b.mean(dim=dim)) ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,904153867 https://github.com/pydata/xarray/pull/5390#issuecomment-850690985,https://api.github.com/repos/pydata/xarray/issues/5390,850690985,MDEyOklzc3VlQ29tbWVudDg1MDY5MDk4NQ==,56925856,2021-05-28T21:43:52Z,2021-05-28T21:44:12Z,CONTRIBUTOR,"> is it just > > ``` > cov = _mean(da_a * da_b) - da_a.mean(dim=dim) * da_b.mean(dim=dim) > ``` I think you'd still have to normalize the second term by `1 / (valid_count)`. However, I just tried both of these approaches and neither passes the test suite, so we may need to do more thinking... ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,904153867 https://github.com/pydata/xarray/pull/5390#issuecomment-850556738,https://api.github.com/repos/pydata/xarray/issues/5390,850556738,MDEyOklzc3VlQ29tbWVudDg1MDU1NjczOA==,56925856,2021-05-28T17:12:52Z,2021-05-28T17:14:08Z,CONTRIBUTOR,"@willirath this is great stuff, thanks again! So generally it looks like the graph is more efficient when doing operations of the form: ```python3 (X * Y).mean('time') - (X.mean('time') * Y.mean('time')) ``` than doing ```python3 ((X - X.mean('time')) * (Y - Y.mean('time'))).mean('time') ``` or like what I've implemented (see screenshot)? ```python3 intermediate = (X * Y) - (X.mean('time') * Y.mean('time')) intermediate.mean('time') ```  If so, it seems like the most efficient(?) way to do the computation in [_cov_corr()](https://github.com/pydata/xarray/blob/master/xarray/core/computation.py#L1373_L1383) is to combine it all into one line? I can't think of how to do this though...
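Maybe something like this? A rough, untested sketch: it ignores the NaN masking / `valid_count` correction entirely (so it's only exact for fully valid data, and gives the population / `ddof=0` covariance), but it shows the single-expression shape I mean:

```python3
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
X = xr.DataArray(rng.random((3, 20)), dims=('space', 'time')).chunk({'time': 5})
Y = xr.DataArray(rng.random((3, 20)), dims=('space', 'time')).chunk({'time': 5})

# E[XY] - E[X]E[Y] written as one expression; everything stays lazy
cov = (X * Y).mean('time') - X.mean('time') * Y.mean('time')
print(cov.compute())
```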
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,904153867 https://github.com/pydata/xarray/pull/5390#issuecomment-850276619,https://api.github.com/repos/pydata/xarray/issues/5390,850276619,MDEyOklzc3VlQ29tbWVudDg1MDI3NjYxOQ==,56925856,2021-05-28T09:15:30Z,2021-05-28T09:17:48Z,CONTRIBUTOR,"@willirath , thanks for your example notebook! I'm still trying to get my head around this a bit though. Say you have `da_a` and `da_b` defined as: ```python3 da_a = xr.DataArray( np.array([[1, 2, 3, 4], [1, 0.1, 0.2, 0.3], [2, 3.2, 0.6, 1.8]]), dims=(""space"", ""time""), coords=[ (""space"", [""IA"", ""IL"", ""IN""]), (""time"", pd.date_range(""2000-01-01"", freq=""1D"", periods=4)), ], ).chunk() da_b = xr.DataArray( np.array([[0.2, 0.4, 0.6, 2], [15, 10, 5, 1], [1, 3.2, np.nan, 1.8]]), dims=(""space"", ""time""), coords=[ (""space"", [""IA"", ""IL"", ""IN""]), (""time"", pd.date_range(""2000-01-01"", freq=""1D"", periods=4)), ], ).chunk() ``` The [original computation](https://github.com/pydata/xarray/blob/master/xarray/core/computation.py#L1373_L1375) in `_cov_corr` has a graph something like:  Whereas my alteration now has a graph more like this:  Am I correct in thinking that this is a 'better' computational graph? Because the original chunks are not passed onto later points in the computation? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,904153867 https://github.com/pydata/xarray/pull/5284#issuecomment-848612330,https://api.github.com/repos/pydata/xarray/issues/5284,848612330,MDEyOklzc3VlQ29tbWVudDg0ODYxMjMzMA==,56925856,2021-05-26T09:19:50Z,2021-05-26T09:19:50Z,CONTRIBUTOR,"Hey both, I've added a test to check that dask doesn't compute when calling either `xr.corr()` or `xr.cov()`, and also that the end result is still a `dask ` array. Let me know if there's anything I've missed though! thanks for the help :) @dcherian, regarding the `apply_ufunc` approach, I might leave that for now but as you said it can always be a future PR","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,882876804 https://github.com/pydata/xarray/pull/5284#issuecomment-838231568,https://api.github.com/repos/pydata/xarray/issues/5284,838231568,MDEyOklzc3VlQ29tbWVudDgzODIzMTU2OA==,56925856,2021-05-11T10:28:08Z,2021-05-12T20:45:00Z,CONTRIBUTOR,"Thanks for that @dcherian ! I didn't know you could use print debugging on chunked operations like this! One thing actually: If I change `da = da.where(missing_vals)` to `da = da.where(~missing_vals)` then we get the results we'd expect. Do you think this fixes the problem? ``` def _get_valid_values(da, other): da1, da2 = xr.align(da, other, join=""outer"", copy=False) # 2. Ignore the nans missing_vals = np.logical_or(da1.isnull(), da2.isnull()) if missing_vals.any(): da = da.where(~missing_vals) return da else: return da ``` ``` print(da_a.map_blocks(_get_valid_values, args=[da_b]).compute()) <xarray.DataArray (space: 3, time: 4)> array([[1. , 2. , 3. , 4. ], [1. , 0.1, 0.2, 0.3], [2. 
, 3.2, nan, 1.8]]) Coordinates: * time (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04 * space (space) object 'IA' 'IL' 'IN' * ``` ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,882876804 https://github.com/pydata/xarray/pull/5284#issuecomment-837032429,https://api.github.com/repos/pydata/xarray/issues/5284,837032429,MDEyOklzc3VlQ29tbWVudDgzNzAzMjQyOQ==,56925856,2021-05-10T17:44:29Z,2021-05-10T17:44:29Z,CONTRIBUTOR,"Hi @dcherian , just thinking about your suggestion for using `map_blocks` on the actual `valid_values` check. I've tested this and was wondering if you could maybe point to where I'm going wrong? It does mask out some of the values in a lazy way, but not the *correct* ones. ```python3 da_a = xr.DataArray( np.array([[1, 2, 3, 4], [1, 0.1, 0.2, 0.3], [2, 3.2, 0.6, 1.8]]), dims=(""space"", ""time""), coords=[ (""space"", [""IA"", ""IL"", ""IN""]), (""time"", pd.date_range(""2000-01-01"", freq=""1D"", periods=4)), ], ).chunk({'time':1}) da_b = xr.DataArray( np.array([[0.2, 0.4, 0.6, 2], [15, 10, 5, 1], [1, 3.2, np.nan, 1.8]]), dims=(""space"", ""time""), coords=[ (""space"", [""IA"", ""IL"", ""IN""]), (""time"", pd.date_range(""2000-01-01"", freq=""1D"", periods=4)), ], ).chunk({'time':1}) print(da_a) >>> <xarray.DataArray (space: 3, time: 4)> array([[1. , 2. , 3. , 4. ], [1. , 0.1, 0.2, 0.3], [2. , 3.2, 0.6, 1.8]]) Coordinates: * space (space) <U2 'IA' 'IL' 'IN' * time (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04 print(da_b) >>> <xarray.DataArray (space: 3, time: 4)> array([[ 0.2, 0.4, 0.6, 2. ], [15. , 10. , 5. , 1. ], [ 1. , 3.2, nan, 1.8]]) Coordinates: * space (space) <U2 'IA' 'IL' 'IN' * time (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04 # Define function to use in map_blocks def _get_valid_values(da, other): da1, da2 = xr.align(da, other, join=""inner"", copy=False) # 2. Ignore the nans missing_vals = np.logical_or(da1.isnull(), da2.isnull()) if missing_vals.any(): da = da.where(missing_vals) return da else: return da # test outp = da_a.map_blocks(_get_valid_values, args=[da_b]) print(outp.compute()) >>> <xarray.DataArray (space: 3, time: 4)> array([[1. , 2. , nan, 4. ], [1. , 0.1, nan, 0.3], [2. , 3.2, 0.6, 1.8]]) Coordinates: * time (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04 * space (space) object 'IA' 'IL' 'IN' ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,882876804 https://github.com/pydata/xarray/issues/4816#issuecomment-760905338,https://api.github.com/repos/pydata/xarray/issues/4816,760905338,MDEyOklzc3VlQ29tbWVudDc2MDkwNTMzOA==,56925856,2021-01-15T12:09:32Z,2021-01-15T12:09:32Z,CONTRIBUTOR,"Oh actually, I don't think this is a bug. Because I'm only interpolating over 1d coordinates, `scipy.interpolate.interp1d()` is called, which requires `'fill_value':'extrapolate'` in order to extrapolate rather than throwing a `nan`. 
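For example, a minimal standalone illustration of the 1d case (plain `scipy`, not the actual xarray code path):

```python
import numpy as np
from scipy.interpolate import interp1d

x = np.arange(5.0)
y = x ** 2

f = interp1d(x, y, bounds_error=False)        # out-of-bounds -> nan
g = interp1d(x, y, fill_value='extrapolate')  # out-of-bounds -> linear extrapolation

print(f(6.0), g(6.0))  # nan vs. an extrapolated value
```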
If I was interpolating over multidimensional coordinates, then `scipy.interpolate.interpnd()` would have been called, which requires `'fill_value':None` .","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,786839234 https://github.com/pydata/xarray/pull/4351#issuecomment-676494680,https://api.github.com/repos/pydata/xarray/issues/4351,676494680,MDEyOklzc3VlQ29tbWVudDY3NjQ5NDY4MA==,56925856,2020-08-19T15:25:16Z,2020-08-19T15:25:16Z,CONTRIBUTOR,"> The test is not in a function. I think the link to the test looks like it is indented but it's not... My bad ;) ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,681528611 https://github.com/pydata/xarray/issues/4300#issuecomment-673593388,https://api.github.com/repos/pydata/xarray/issues/4300,673593388,MDEyOklzc3VlQ29tbWVudDY3MzU5MzM4OA==,56925856,2020-08-13T16:59:30Z,2020-08-13T16:59:30Z,CONTRIBUTOR,"cheers @TomNicholas , that's helpful. :) I've started messing with the idea in [this Gist](https://gist.github.com/AndrewWilliams3142/6ea8b4f4287e0d0290f52b2fcd50a662) if you want to have a look. It's pretty hacky at the moment, but might be helpful as a testbed. (And a way of getting my head around how `apply_ufunc` would work in this context)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,671609109 https://github.com/pydata/xarray/issues/4300#issuecomment-672084619,https://api.github.com/repos/pydata/xarray/issues/4300,672084619,MDEyOklzc3VlQ29tbWVudDY3MjA4NDYxOQ==,56925856,2020-08-11T16:49:00Z,2020-08-11T16:49:29Z,CONTRIBUTOR,"@TomNicholas I'm a bit confused about how the `fit_along` argument would work actually. If you had 2D data and wanted to fit a 1D function to one of the dimensions, wouldn't you have to either take a `mean` (or `slice`?) across the other dimension? Edit: It's been a hot day here, so apologies if this turns out to be a dumb q haha","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,671609109 https://github.com/pydata/xarray/issues/4300#issuecomment-671442450,https://api.github.com/repos/pydata/xarray/issues/4300,671442450,MDEyOklzc3VlQ29tbWVudDY3MTQ0MjQ1MA==,56925856,2020-08-10T16:01:06Z,2020-08-10T16:01:06Z,CONTRIBUTOR,"This sounds very cool! :) I'm not sure that I have much to add, but given @aulemahal 's good point about the complexity of rewriting `curve_fit` from scratch, it seems that maybe a good first step would just be to wrap the existing `scipy` functionality? Alternatively, given that `xr.apply_ufunc` can already do this (though it's probably complicated), perhaps it would be good to just have an example in the documentation? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,671609109 https://github.com/pydata/xarray/pull/4096#issuecomment-634184361,https://api.github.com/repos/pydata/xarray/issues/4096,634184361,MDEyOklzc3VlQ29tbWVudDYzNDE4NDM2MQ==,56925856,2020-05-26T18:02:14Z,2020-05-26T18:02:14Z,CONTRIBUTOR,"Also, could I ask a git question? Is there a way of getting a ""clean"" version of xarray to do PR branches off of without just re-forking? 
I've tried a few different suggestions on StackOverflow, but I'm just keen to know what other people's workflow is :) thanks again","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,625064501 https://github.com/pydata/xarray/pull/4089#issuecomment-634165154,https://api.github.com/repos/pydata/xarray/issues/4089,634165154,MDEyOklzc3VlQ29tbWVudDYzNDE2NTE1NA==,56925856,2020-05-26T17:26:34Z,2020-05-26T17:26:34Z,CONTRIBUTOR,"@kefirbandi I didn't want to step on your toes, but I'm happy to put in a PR to fix the typo. :) ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4064#issuecomment-633710066,https://api.github.com/repos/pydata/xarray/issues/4064,633710066,MDEyOklzc3VlQ29tbWVudDYzMzcxMDA2Ng==,56925856,2020-05-25T20:38:49Z,2020-05-25T20:38:49Z,CONTRIBUTOR,No problem! Thanks everyone for helping me get up to speed :),"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,618828102 https://github.com/pydata/xarray/pull/4089#issuecomment-633592183,https://api.github.com/repos/pydata/xarray/issues/4089,633592183,MDEyOklzc3VlQ29tbWVudDYzMzU5MjE4Mw==,56925856,2020-05-25T14:13:46Z,2020-05-25T14:13:46Z,CONTRIBUTOR,"> If you insist ;) > > ``` > da_a -= da_a.mean(dim=dim) > ``` > > is indeed marginally faster. As they are already aligned, we don't have to worry about this. Sweet! On second thought, I might leave it for now... the sun is too nice today. Can always have it as a future PR or something. :)","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633582528,https://api.github.com/repos/pydata/xarray/issues/4089,633582528,MDEyOklzc3VlQ29tbWVudDYzMzU4MjUyOA==,56925856,2020-05-25T13:50:08Z,2020-05-25T13:50:08Z,CONTRIBUTOR,"One more thing actually, is there an argument for not defining `da_a_std` and `demeaned_da_a` and just performing the operations in place? Defining these variables makes the code more readable, but in https://github.com/pydata/xarray/pull/3550#discussion_r355157809 and https://github.com/pydata/xarray/pull/3550#discussion_r355157888 the reviewer suggests this is inefficient?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633456698,https://api.github.com/repos/pydata/xarray/issues/4089,633456698,MDEyOklzc3VlQ29tbWVudDYzMzQ1NjY5OA==,56925856,2020-05-25T08:44:36Z,2020-05-25T10:55:29Z,CONTRIBUTOR,"> Could you also add a test for the `TypeError`? > > ```python > with raises_regex(TypeError, ""Only xr.DataArray is supported""): > xr.corr(xr.Dataset(), xr.Dataset()) > ``` Where do you mean, sorry? Isn't this already there in corr()? 
```python3 if any(not isinstance(arr, (Variable, DataArray)) for arr in [da_a, da_b]): raise TypeError( ""Only xr.DataArray and xr.Variable are supported."" ""Given {}."".format([type(arr) for arr in [da_a, da_b]]) ) ``` **EDIT:** Scratch that, I get what you mean :)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633441816,https://api.github.com/repos/pydata/xarray/issues/4089,633441816,MDEyOklzc3VlQ29tbWVudDYzMzQ0MTgxNg==,56925856,2020-05-25T08:11:05Z,2020-05-25T08:12:48Z,CONTRIBUTOR,"Cheers ! I've got a day off today so I'll do another pass through the changes and see if there's any low-hanging fruit I can improve (in addition to `np.random`, `_cov_corr` internal methods and maybe `apply_ufunc()` ) :) ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633286352,https://api.github.com/repos/pydata/xarray/issues/4089,633286352,MDEyOklzc3VlQ29tbWVudDYzMzI4NjM1Mg==,56925856,2020-05-24T19:49:30Z,2020-05-24T19:56:24Z,CONTRIBUTOR,"One problem I came across here is that `pandas` automatically ignores 'np.nan' values in any `corr` or `cov` calculation. This is hard-coded into the package and there's no `skipna=False` option sadly, so what I've done in the tests is to use the `numpy` implementation which pandas is built on (see, for example [here](https://github.com/pandas-dev/pandas/blob/cb35d8a938c9222d903482d2f66c62fece5a7aae/pandas/core/nanops.py#L1325)). Current tests implemented are (in pseudocode...): - [x] `assert_allclose(xr.cov(a, b) / (a.std() * b.std()), xr.corr(a, b))` - [x] `assert_allclose(xr.cov(a,a)*(N-1), ((a - a.mean())**2).sum())` - [x] For the example in my previous comment, I now have a loop over all values of `(a,x)` to reconstruct the covariance / correlation matrix, and check it with an `assert_allclose(...)`. - [x] Add more test arrays, with/without `np.nans` -- **done** @keewis I tried reading the Hypothesis docs and got a bit overwhelmed, so I've stuck with example-based tests for now.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/pull/4089#issuecomment-633213547,https://api.github.com/repos/pydata/xarray/issues/4089,633213547,MDEyOklzc3VlQ29tbWVudDYzMzIxMzU0Nw==,56925856,2020-05-24T10:59:43Z,2020-05-24T11:00:53Z,CONTRIBUTOR,"The current problem is that we can't use Pandas to fully test `xr.cov()` or `xr.corr()` because once you convert the `DataArrays` to a `series` or a `dataframe` for testing, you can't easily index them with a `dim` parameter. See @r-beer 's comment here https://github.com/pydata/xarray/pull/3550#issuecomment-557895005. As such, I think it maybe just makes sense to test a few low-dimensional cases? 
Eg ```python3 >>> da_a = xr.DataArray( np.random.random((3, 21, 4)), coords={""time"": pd.date_range(""2000-01-01"", freq=""1D"", periods=21)}, dims=(""a"", ""time"", ""x""), ) >>> da_b = xr.DataArray( np.random.random((3, 21, 4)), coords={""time"": pd.date_range(""2000-01-01"", freq=""1D"", periods=21)}, dims=(""a"", ""time"", ""x""), ) >>> xr.cov(da_a, da_b, 'time') <xarray.DataArray (a: 3, x: 4)> array([[-0.01824046, 0.00373796, -0.00601642, -0.00108818], [ 0.00686132, -0.02680119, -0.00639433, -0.00868691], [-0.00889806, 0.02622817, -0.01022208, -0.00101257]]) Dimensions without coordinates: a, x >>> xr.cov(da_a, da_b, 'time').sel(a=0,x=0) <xarray.DataArray ()> array(-0.01824046) >>> da_a.sel(a=0,x=0).to_series().cov(da_b.sel(a=0,x=0).to_series()) -0.018240458880158048 ``` So, while it's easy to check that a few individual points from `xr.cov()` agree with the pandas implementation, it would require a loop over `(a,x)` in order to check all of the points for this example. Do people have thoughts about this? I think it would also make sense to have some test cases where we don't use Pandas at all, but we specify the output manually? ```python3 >>> da_a = xr.DataArray([[1, 2], [1, np.nan]], dims=[""x"", ""time""]) >>> expected = [1, np.nan] >>> actual = xr.corr(da_a, da_a, dim='time') >>> assert_allclose(actual, expected) ``` Does this seem like a good way forward? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,623751213 https://github.com/pydata/xarray/issues/3784#issuecomment-633145887,https://api.github.com/repos/pydata/xarray/issues/3784,633145887,MDEyOklzc3VlQ29tbWVudDYzMzE0NTg4Nw==,56925856,2020-05-23T21:58:49Z,2020-05-23T21:58:49Z,CONTRIBUTOR,"In a fit of covid-induced insanity, I've decided to have a crack at finishing up #3550 ! I'm playing around with the changes made by @r-beer at the moment, but I'm finding the tests quite confusing - I think they're wrong? But maybe someone could help me out with this? Here's something from `test_computation.py` in #3550 ```python def test_cov(da_a, da_b, dim): def pandas_cov(ts1, ts2): """"""Ensure the ts are aligned and missing values ignored"""""" ts1, ts2 = xr.align(ts1, ts2) valid_values = ts1.notnull() & ts2.notnull() ts1 = ts1.where(valid_values, drop=True) ts2 = ts2.where(valid_values, drop=True) return ts1.to_series().cov(ts2.to_series()) expected = pandas_cov(da_a, da_b) actual = xr.cov(da_a, da_b, dim) assert_allclose(actual, expected) ``` What I don't understand is, why would we expect the Pandas covariance or correlation functions to return anything remotely like the output of `xr.cov()`? The line `ts1.to_series().cov(ts2.to_series())` always produces a scalar value, whereas in most reasonable use cases `xr.cov(da_a, da_b, dim)` would be producing a matrix of values (eg. the pixel-wise correlation in time between two `DataArrays`). I wasn't sure whether to open a PR for this or not? I'm working on it but would require some help to set up some appropriate tests...","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,568378007 https://github.com/pydata/xarray/pull/4064#issuecomment-632128807,https://api.github.com/repos/pydata/xarray/issues/4064,632128807,MDEyOklzc3VlQ29tbWVudDYzMjEyODgwNw==,56925856,2020-05-21T14:49:37Z,2020-05-21T14:49:37Z,CONTRIBUTOR,"@keewis thanks for this! 
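For reference, the equivalence the test asserts is roughly this (a sketch, assuming a dask-backed `DataArray`):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.zeros((100, 100)), dims=('x', 'y')).chunk({'x': 10, 'y': 10})

# chunking with 'auto' through xarray should match dask's own auto rechunking
assert da.chunk('auto').data.chunks == da.data.rechunk('auto').chunks
```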
I've added what I think is a suitable test for `DataArray`s; do you think it's also a good idea to have a `Dataset` test? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,618828102 https://github.com/pydata/xarray/pull/4064#issuecomment-632090407,https://api.github.com/repos/pydata/xarray/issues/4064,632090407,MDEyOklzc3VlQ29tbWVudDYzMjA5MDQwNw==,56925856,2020-05-21T13:36:41Z,2020-05-21T13:36:41Z,CONTRIBUTOR,"> This could test that `dataarray.chunk(""auto"").data` is the same as `dataarray.data.rechunk(""auto"")` (or something like that). @dcherian Thanks for the tip :) Quick question: Is there a reason why you're specifying the `.data` here? Also, I think I'm missing something because I don't get what the difference between `.chunk()` and `.rechunk()` would be in this case. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,618828102 https://github.com/pydata/xarray/pull/4064#issuecomment-632035116,https://api.github.com/repos/pydata/xarray/issues/4064,632035116,MDEyOklzc3VlQ29tbWVudDYzMjAzNTExNg==,56925856,2020-05-21T11:30:01Z,2020-05-21T11:30:01Z,CONTRIBUTOR,"Cheers! I forgot about the tests, will add them this week or next hopefully","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,618828102 https://github.com/pydata/xarray/pull/4064#issuecomment-629390609,https://api.github.com/repos/pydata/xarray/issues/4064,629390609,MDEyOklzc3VlQ29tbWVudDYyOTM5MDYwOQ==,56925856,2020-05-15T17:40:39Z,2020-05-15T17:41:25Z,CONTRIBUTOR,"@dcherian do you have any idea about this `mypy` Type error? I can't find much (accessible) documentation on how the `Union[]` is working in this context. ``` xarray/core/dataset.py:1737: error: Argument 2 to ""fromkeys"" of ""dict"" has incompatible type ""Union[Number, Mapping[Hashable, Union[None, Number, Tuple[Number, ...]]]]""; expected ""Union[None, Number, Tuple[Number, ...]]"" xarray/core/dataset.py:1740: error: Item ""Number"" of ""Union[Number, Mapping[Hashable, Union[None, Number, Tuple[Number, ...]]]]"" has no attribute ""keys"" ``` Edit: thanks to everyone for your help so far! ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,618828102 https://github.com/pydata/xarray/pull/4064#issuecomment-629346101,https://api.github.com/repos/pydata/xarray/issues/4064,629346101,MDEyOklzc3VlQ29tbWVudDYyOTM0NjEwMQ==,56925856,2020-05-15T16:11:04Z,2020-05-15T16:22:24Z,CONTRIBUTOR,"Okay, so I've traced the error back to the `map_blocks()` function. I don't fully understand the code for this function in `xarray/core/parallel.py`, but here's a quick report on the different behaviours. Normally, when using the `make_ds()` and `make_da()` functions in `test_dask.py`, **without any changes to `ds.chunk()`** we have: ```python >>> def func(obj): ... result = obj + obj.x + 5 * obj.y ... return result ... 
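>>> # compare the chunks reported by map_blocks with those of the eagerly applied func: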
>>> xr.map_blocks(func, ds).unify_chunks().chunks Frozen(SortedKeysDict({'x': (4, 4, 2), 'y': (5, 5, 5, 5), 'z': (4,)})) >>> func(ds).chunk().unify_chunks().chunks Frozen(SortedKeysDict({'x': (4, 4, 2), 'y': (5, 5, 5, 5), 'z': (4,)})) ``` However, when I use the changes I've made to `dataset.py` (changing `isinstance(chunks, Number)` to `is_scalar(chunks)`), the behaviour becomes: ```python >>> xr.map_blocks(func, ds).unify_chunks().chunks Frozen(SortedKeysDict({'x': (4, 4, 2), 'y': (5, 5, 5, 5), 'z': (4,)})) >>> func(ds).chunk().unify_chunks().chunks Frozen(SortedKeysDict({'x': (10,), 'y': (20,), 'z': (4,)})) ``` Which means that it now fails the `test_map_blocks()` call in `test_dask.py` line 1077. I've tried to follow through the code and see what is actually happening when this change is made, but I'm out of my depth here. My guess is that `is_scalar(chunks)` is giving the wrong behaviour when `chunks=None`? **Edit: I think that's the problem!** ```python >>> isinstance(None, numbers.Number) False >>> is_scalar(None) True ``` I'll add in something to catch `None`s and see if it fixes the error... ","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,618828102 https://github.com/pydata/xarray/pull/4064#issuecomment-629197362,https://api.github.com/repos/pydata/xarray/issues/4064,629197362,MDEyOklzc3VlQ29tbWVudDYyOTE5NzM2Mg==,56925856,2020-05-15T12:05:22Z,2020-05-15T12:05:22Z,CONTRIBUTOR,No unpushed commits ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,618828102 https://github.com/pydata/xarray/pull/4064#issuecomment-629191037,https://api.github.com/repos/pydata/xarray/issues/4064,629191037,MDEyOklzc3VlQ29tbWVudDYyOTE5MTAzNw==,56925856,2020-05-15T11:49:23Z,2020-05-15T11:49:23Z,CONTRIBUTOR,"Do you mean the master merge? If that's wrong, would you be able to fix it for me? My bad, hopefully I'll be able to do it more cleanly in future","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,618828102 https://github.com/pydata/xarray/pull/4064#issuecomment-629168282,https://api.github.com/repos/pydata/xarray/issues/4064,629168282,MDEyOklzc3VlQ29tbWVudDYyOTE2ODI4Mg==,56925856,2020-05-15T10:49:43Z,2020-05-15T10:49:43Z,CONTRIBUTOR,"Okay, that makes sense. Though, it seems that I forked the master branch before @kmuehlbauer's commit, which fixed this flake8 issue? So I think I need to make a new fork?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,618828102 https://github.com/pydata/xarray/pull/4064#issuecomment-629154336,https://api.github.com/repos/pydata/xarray/issues/4064,629154336,MDEyOklzc3VlQ29tbWVudDYyOTE1NDMzNg==,56925856,2020-05-15T10:15:50Z,2020-05-15T10:17:38Z,CONTRIBUTOR,"Okay, cheers both! I'll have a look at these now. @keewis sorry, I'm still getting used to using this side of Git at the moment; could you clarify what you mean by merge `master`? Do you mean merge with my local `master`? 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,618828102 https://github.com/pydata/xarray/pull/4064#issuecomment-629147818,https://api.github.com/repos/pydata/xarray/issues/4064,629147818,MDEyOklzc3VlQ29tbWVudDYyOTE0NzgxOA==,56925856,2020-05-15T10:00:35Z,2020-05-15T10:01:43Z,CONTRIBUTOR,"In my git clone, when I run the `flake8` and `black .` tests, I get the following messages. ``` (xarray-tests) Andrews-MacBook-Pro-2:xarray andrewwilliams$ black . All done! ✨ 🍰 ✨ 143 files left unchanged. (xarray-tests) Andrews-MacBook-Pro-2:xarray andrewwilliams$ flake8 ./xarray/backends/memory.py:43:32: E741 ambiguous variable name 'l' ./xarray/backends/common.py:244:32: E741 ambiguous variable name 'l' ./xarray/backends/.ipynb_checkpoints/memory-checkpoint.py:43:32: E741 ambiguous variable name 'l' ``` I'm not sure why something has changed in these files (I haven't touched them), I also can't work out what the `l` variable is meant to be doing there. Could this somehow be associated with loads of the checks failing below? Thanks! :) ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,618828102 https://github.com/pydata/xarray/issues/4055#issuecomment-628797255,https://api.github.com/repos/pydata/xarray/issues/4055,628797255,MDEyOklzc3VlQ29tbWVudDYyODc5NzI1NQ==,56925856,2020-05-14T18:01:45Z,2020-05-14T18:01:45Z,CONTRIBUTOR,"I also thought that, after the dask error message it's pretty easy to then look at the `dataset` and check what the problem dimension is. In general though, is that the type of layout you'd suggest for catching and re-raising errors? Using `raise Exception()` ? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,617476316 https://github.com/pydata/xarray/issues/4055#issuecomment-628616379,https://api.github.com/repos/pydata/xarray/issues/4055,628616379,MDEyOklzc3VlQ29tbWVudDYyODYxNjM3OQ==,56925856,2020-05-14T12:57:21Z,2020-05-14T17:50:31Z,CONTRIBUTOR,"Nice, that's neater! Would this work, in the `maybe_chunk()` call? Sorry about the basic questions! ```python def maybe_chunk(name, var, chunks): chunks = selkeys(chunks, var.dims) if not chunks: chunks = None if var.ndim > 0: # when rechunking by different amounts, make sure dask names change # by provinding chunks as an input to tokenize. # subtle bugs result otherwise. see GH3350 token2 = tokenize(name, token if token else var._data, chunks) name2 = f""{name_prefix}{name}-{token2}"" try: return var.chunk(chunks, name=name2, lock=lock) except NotImplementedError as err: raise Exception(""Automatic chunking fails for object arrays."" + ""These include cftime DataArrays."") else: return var ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,617476316 https://github.com/pydata/xarray/issues/4055#issuecomment-628513777,https://api.github.com/repos/pydata/xarray/issues/4055,628513777,MDEyOklzc3VlQ29tbWVudDYyODUxMzc3Nw==,56925856,2020-05-14T09:26:24Z,2020-05-14T09:26:24Z,CONTRIBUTOR,"Also, the contributing docs have been super clear so far! Thanks! 
:)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,617476316 https://github.com/pydata/xarray/issues/4055#issuecomment-628513443,https://api.github.com/repos/pydata/xarray/issues/4055,628513443,MDEyOklzc3VlQ29tbWVudDYyODUxMzQ0Mw==,56925856,2020-05-14T09:25:48Z,2020-05-14T09:25:48Z,CONTRIBUTOR,"Cheers! Just had a look, is it as simple as just changing this line to the following, @dcherian ? ```python if isinstance(chunks, Number) or chunks=='auto': chunks = dict.fromkeys(self.dims, chunks) ``` This seems to work fine in a lot of cases, except automatic chunking isn't implemented for `object` dtypes at the moment, so it fails if you pass a `cftime` coordinate, for example. One option is to automatically use `self=xr.decode_cf(self)` if the input dataset is cftime? Or could just throw an error.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,617476316 https://github.com/pydata/xarray/issues/4055#issuecomment-628212516,https://api.github.com/repos/pydata/xarray/issues/4055,628212516,MDEyOklzc3VlQ29tbWVudDYyODIxMjUxNg==,56925856,2020-05-13T19:56:34Z,2020-05-13T19:56:34Z,CONTRIBUTOR,"Oh ok I didn't know about this, I'll take a look and read the contribution docs tomorrow ! It'll be my first PR so may need a bit of hand-holding when it comes to tests. Willing to try though!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,617476316 https://github.com/pydata/xarray/issues/3784#issuecomment-625345337,https://api.github.com/repos/pydata/xarray/issues/3784,625345337,MDEyOklzc3VlQ29tbWVudDYyNTM0NTMzNw==,56925856,2020-05-07T16:02:43Z,2020-05-07T16:02:43Z,CONTRIBUTOR,"Hi @max-sixty, just coming back to this now. It seems @r-beer isn't available...do you know roughly how far away his PR was from completion? I'm getting a little bit lost trying to follow #3550 sorry! Was the main `todo` to avoid the `drop=True` after broadcasting? Is there any idea about what to do instead? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,568378007 https://github.com/pydata/xarray/issues/3784#issuecomment-589368049,https://api.github.com/repos/pydata/xarray/issues/3784,589368049,MDEyOklzc3VlQ29tbWVudDU4OTM2ODA0OQ==,56925856,2020-02-20T22:08:01Z,2020-02-20T22:08:01Z,CONTRIBUTOR,"@max-sixty Just had a peruse through a few of the relevant issues, do we know what the status of [#3550 ] is? It seems like @r-beer was pretty close on this, right?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,568378007 https://github.com/pydata/xarray/issues/3784#issuecomment-589328765,https://api.github.com/repos/pydata/xarray/issues/3784,589328765,MDEyOklzc3VlQ29tbWVudDU4OTMyODc2NQ==,56925856,2020-02-20T21:32:26Z,2020-02-20T21:32:26Z,CONTRIBUTOR,I'll take a look at them!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,568378007