
issue_comments


40 rows where user = 56925856 sorted by updated_at descending

Columns: id, html_url, issue_url, node_id, user, created_at, updated_at ▲, author_association, body, reactions, performed_via_github_app, issue
850843957 https://github.com/pydata/xarray/pull/5390#issuecomment-850843957 https://api.github.com/repos/pydata/xarray/issues/5390 MDEyOklzc3VlQ29tbWVudDg1MDg0Mzk1Nw== AndrewILWilliams 56925856 2021-05-29T14:37:48Z 2021-05-31T10:27:06Z CONTRIBUTOR

@willirath this is cool, but I think it doesn't explain why the tests fail. Currently the `da_a.mean()` and `da_b.mean()` calls do know about each other's missing data! That's what we're doing in these lines.

@dcherian, I think I've got it to work, but you need to account for the length(s) of the dimension you're calculating the correlation over: `(da - da.mean('time')).sum('time')` is not the same as `da.sum('time') - da.mean('time')`, because you actually need `da.sum('time') - da.mean('time') * length_of_time_dim`.

This latest commit does this, but I'm not sure whether the added complication is worth it yet? Thoughts welcome.

```python3
def _mean(da):
    return da.sum(dim=dim, skipna=True, min_count=1) / valid_count

dim_length = da_a.notnull().sum(dim=dim, skipna=True)

def _mean_detrended_term(da):
    return dim_length * da / valid_count

cov = _mean(da_a * da_b) - _mean_detrended_term(da_a.mean(dim=dim) * da_b.mean(dim=dim))
```
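As a sanity check on that identity (a minimal numpy sketch of my own, not code from the PR): with no missing values, `sum(x - mean(x))` equals `sum(x) - mean(x) * N`, and the factor of `N` is exactly what the naive rearrangement drops.

```python3
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
n = len(x)

print((x - x.mean()).sum())    # 0.0 -- demean first, then sum
print(x.sum() - x.mean())      # 7.5 -- naive rearrangement, missing the factor of N
print(x.sum() - x.mean() * n)  # 0.0 -- matches once the dimension length is included
```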

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improvements to lazy behaviour of `xr.cov()` and `xr.corr()` 904153867
850690985 https://github.com/pydata/xarray/pull/5390#issuecomment-850690985 https://api.github.com/repos/pydata/xarray/issues/5390 MDEyOklzc3VlQ29tbWVudDg1MDY5MDk4NQ== AndrewILWilliams 56925856 2021-05-28T21:43:52Z 2021-05-28T21:44:12Z CONTRIBUTOR

> is it just
>
> ```python3
> cov = _mean(da_a * da_b) - da_a.mean(dim=dim) * da_b.mean(dim=dim)
> ```

I think you'd still have to normalize the second term by `1 / valid_count`. However, I just tried both of these approaches and neither passes the test suite, so we may need to do more thinking...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improvements to lazy behaviour of `xr.cov()` and `xr.corr()` 904153867
850556738 https://github.com/pydata/xarray/pull/5390#issuecomment-850556738 https://api.github.com/repos/pydata/xarray/issues/5390 MDEyOklzc3VlQ29tbWVudDg1MDU1NjczOA== AndrewILWilliams 56925856 2021-05-28T17:12:52Z 2021-05-28T17:14:08Z CONTRIBUTOR

@willirath this is great stuff, thanks again! So generally it looks like the graph is more efficient when doing operations of the form:

```python3
(X * Y).mean('time') - (X.mean('time') * Y.mean('time'))
```

than doing

```python3
((X - X.mean('time')) * (Y - Y.mean('time'))).mean('time')
```

or like what I've implemented (see screenshot)?

```python3
intermediate = (X * Y) - (X.mean('time') * Y.mean('time'))

intermediate.mean('time')
```

If so, it seems like the most efficient(?) way to do the computation in `_cov_corr()` is to combine it all into one line? I can't think of how to do this though...
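For intuition, one rough way to compare the two formulations is to count the tasks in each dask graph (an illustrative sketch of my own, not code from the PR; the shapes and chunk sizes are made up):

```python3
import numpy as np
import xarray as xr

X = xr.DataArray(np.random.rand(10, 100), dims=("space", "time")).chunk({"time": 10})
Y = xr.DataArray(np.random.rand(10, 100), dims=("space", "time")).chunk({"time": 10})

one_pass = (X * Y).mean("time") - X.mean("time") * Y.mean("time")
demeaned = ((X - X.mean("time")) * (Y - Y.mean("time"))).mean("time")

# Fewer tasks generally means a graph that is cheaper to schedule.
print(len(one_pass.__dask_graph__()), len(demeaned.__dask_graph__()))
```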

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improvements to lazy behaviour of `xr.cov()` and `xr.corr()` 904153867
850276619 https://github.com/pydata/xarray/pull/5390#issuecomment-850276619 https://api.github.com/repos/pydata/xarray/issues/5390 MDEyOklzc3VlQ29tbWVudDg1MDI3NjYxOQ== AndrewILWilliams 56925856 2021-05-28T09:15:30Z 2021-05-28T09:17:48Z CONTRIBUTOR

@willirath, thanks for your example notebook! I'm still trying to get my head around this a bit though.

Say you have da_a and da_b defined as:

```python3
da_a = xr.DataArray(
    np.array([[1, 2, 3, 4], [1, 0.1, 0.2, 0.3], [2, 3.2, 0.6, 1.8]]),
    dims=("space", "time"),
    coords=[
        ("space", ["IA", "IL", "IN"]),
        ("time", pd.date_range("2000-01-01", freq="1D", periods=4)),
    ],
).chunk()

da_b = xr.DataArray(
    np.array([[0.2, 0.4, 0.6, 2], [15, 10, 5, 1], [1, 3.2, np.nan, 1.8]]),
    dims=("space", "time"),
    coords=[
        ("space", ["IA", "IL", "IN"]),
        ("time", pd.date_range("2000-01-01", freq="1D", periods=4)),
    ],
).chunk()
```

The original computation in `_cov_corr` has a graph something like: [screenshot]

Whereas my alteration now has a graph more like this: [screenshot]

Am I correct in thinking that this is a 'better' computational graph, because the original chunks are not passed on to later points in the computation?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improvements to lazy behaviour of `xr.cov()` and `xr.corr()` 904153867
848612330 https://github.com/pydata/xarray/pull/5284#issuecomment-848612330 https://api.github.com/repos/pydata/xarray/issues/5284 MDEyOklzc3VlQ29tbWVudDg0ODYxMjMzMA== AndrewILWilliams 56925856 2021-05-26T09:19:50Z 2021-05-26T09:19:50Z CONTRIBUTOR

Hey both, I've added a test to check that dask doesn't compute when calling either `xr.corr()` or `xr.cov()`, and also that the end result is still a dask array. Let me know if there's anything I've missed though! Thanks for the help :)

@dcherian, regarding the `apply_ufunc` approach, I might leave that for now, but as you said it can always be a future PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dask-friendly nan check in xr.corr() and xr.cov() 882876804
838231568 https://github.com/pydata/xarray/pull/5284#issuecomment-838231568 https://api.github.com/repos/pydata/xarray/issues/5284 MDEyOklzc3VlQ29tbWVudDgzODIzMTU2OA== AndrewILWilliams 56925856 2021-05-11T10:28:08Z 2021-05-12T20:45:00Z CONTRIBUTOR

Thanks for that @dcherian! I didn't know you could use print debugging on chunked operations like this!

One thing actually: if I change `da = da.where(missing_vals)` to `da = da.where(~missing_vals)` then we get the results we'd expect. Do you think this fixes the problem?

```
def _get_valid_values(da, other):
    da1, da2 = xr.align(da, other, join="outer", copy=False)

    # 2. Ignore the nans
    missing_vals = np.logical_or(da1.isnull(), da2.isnull())

    if missing_vals.any():
        da = da.where(~missing_vals)
        return da
    else:
        return da
```

```
print(da_a.map_blocks(_get_valid_values, args=[da_b]).compute())

<xarray.DataArray (space: 3, time: 4)>
array([[1. , 2. , 3. , 4. ],
       [1. , 0.1, 0.2, 0.3],
       [2. , 3.2, nan, 1.8]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * space    (space) object 'IA' 'IL' 'IN'
```
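As a tiny illustration of why the `~` flip matters (my own sketch, not from the PR): `da.where(cond)` keeps values where `cond` is True and fills NaN elsewhere, so masking out the missing entries needs the negated mask.

```python
import xarray as xr

da = xr.DataArray([1.0, 2.0, 3.0])
missing = xr.DataArray([False, True, False])

print(da.where(missing).values)   # [nan  2. nan] -- keeps only the "missing" slot
print(da.where(~missing).values)  # [ 1. nan  3.] -- keeps the valid values
```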

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dask-friendly nan check in xr.corr() and xr.cov() 882876804
837032429 https://github.com/pydata/xarray/pull/5284#issuecomment-837032429 https://api.github.com/repos/pydata/xarray/issues/5284 MDEyOklzc3VlQ29tbWVudDgzNzAzMjQyOQ== AndrewILWilliams 56925856 2021-05-10T17:44:29Z 2021-05-10T17:44:29Z CONTRIBUTOR

Hi @dcherian, just thinking about your suggestion for using `map_blocks` on the actual valid-values check. I've tested this and was wondering if you could maybe point to where I'm going wrong? It does mask out some of the values in a lazy way, but not the correct ones.

```python3
da_a = xr.DataArray(
    np.array([[1, 2, 3, 4], [1, 0.1, 0.2, 0.3], [2, 3.2, 0.6, 1.8]]),
    dims=("space", "time"),
    coords=[
        ("space", ["IA", "IL", "IN"]),
        ("time", pd.date_range("2000-01-01", freq="1D", periods=4)),
    ],
).chunk({'time': 1})

da_b = xr.DataArray(
    np.array([[0.2, 0.4, 0.6, 2], [15, 10, 5, 1], [1, 3.2, np.nan, 1.8]]),
    dims=("space", "time"),
    coords=[
        ("space", ["IA", "IL", "IN"]),
        ("time", pd.date_range("2000-01-01", freq="1D", periods=4)),
    ],
).chunk({'time': 1})

print(da_a)
# <xarray.DataArray (space: 3, time: 4)>
# array([[1. , 2. , 3. , 4. ],
#        [1. , 0.1, 0.2, 0.3],
#        [2. , 3.2, 0.6, 1.8]])
# Coordinates:
#   * space    (space) <U2 'IA' 'IL' 'IN'
#   * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04

print(da_b)
# <xarray.DataArray (space: 3, time: 4)>
# array([[ 0.2,  0.4,  0.6,  2. ],
#        [15. , 10. ,  5. ,  1. ],
#        [ 1. ,  3.2,  nan,  1.8]])
# Coordinates:
#   * space    (space) <U2 'IA' 'IL' 'IN'
#   * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04

# Define function to use in map_blocks
def _get_valid_values(da, other):
    da1, da2 = xr.align(da, other, join="inner", copy=False)

    # 2. Ignore the nans
    missing_vals = np.logical_or(da1.isnull(), da2.isnull())

    if missing_vals.any():
        da = da.where(missing_vals)
        return da
    else:
        return da

# test
outp = da_a.map_blocks(_get_valid_values, args=[da_b])
print(outp.compute())
# <xarray.DataArray (space: 3, time: 4)>
# array([[1. , 2. , nan, 4. ],
#        [1. , 0.1, nan, 0.3],
#        [2. , 3.2, 0.6, 1.8]])
# Coordinates:
#   * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
#   * space    (space) object 'IA' 'IL' 'IN'
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dask-friendly nan check in xr.corr() and xr.cov() 882876804
760905338 https://github.com/pydata/xarray/issues/4816#issuecomment-760905338 https://api.github.com/repos/pydata/xarray/issues/4816 MDEyOklzc3VlQ29tbWVudDc2MDkwNTMzOA== AndrewILWilliams 56925856 2021-01-15T12:09:32Z 2021-01-15T12:09:32Z CONTRIBUTOR

Oh actually, I don't think this is a bug. Because I'm only interpolating over 1D coordinates, `scipy.interpolate.interp1d()` is called, which requires `'fill_value': 'extrapolate'` in order to extrapolate rather than returning nan. If I was interpolating over multidimensional coordinates, then `scipy.interpolate.interpnd()` would have been called, which requires `'fill_value': None`.
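A small standalone demonstration of that interp1d behaviour (my own sketch, not from the issue):

```python
from scipy.interpolate import interp1d

f_nan = interp1d([0.0, 1.0], [0.0, 2.0], bounds_error=False)
f_ext = interp1d([0.0, 1.0], [0.0, 2.0], bounds_error=False, fill_value="extrapolate")

print(f_nan(2.0))  # nan -- out-of-range points are filled with nan by default
print(f_ext(2.0))  # 4.0 -- linear extrapolation beyond the data range
```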

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Possible bug with da.interp_like() 786839234
676494680 https://github.com/pydata/xarray/pull/4351#issuecomment-676494680 https://api.github.com/repos/pydata/xarray/issues/4351 MDEyOklzc3VlQ29tbWVudDY3NjQ5NDY4MA== AndrewILWilliams 56925856 2020-08-19T15:25:16Z 2020-08-19T15:25:16Z CONTRIBUTOR

> The test is not in a function. I think the link to the test looks like it is indented but it's not...

My bad ;)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow cov & corr to handle missing values 681528611
673593388 https://github.com/pydata/xarray/issues/4300#issuecomment-673593388 https://api.github.com/repos/pydata/xarray/issues/4300 MDEyOklzc3VlQ29tbWVudDY3MzU5MzM4OA== AndrewILWilliams 56925856 2020-08-13T16:59:30Z 2020-08-13T16:59:30Z CONTRIBUTOR

Cheers @TomNicholas, that's helpful. :) I've started messing with the idea in this Gist if you want to have a look.

It's pretty hacky at the moment, but might be helpful as a testbed. (And a way of getting my head around how apply_ufunc would work in this context)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  General curve fitting method 671609109
672084619 https://github.com/pydata/xarray/issues/4300#issuecomment-672084619 https://api.github.com/repos/pydata/xarray/issues/4300 MDEyOklzc3VlQ29tbWVudDY3MjA4NDYxOQ== AndrewILWilliams 56925856 2020-08-11T16:49:00Z 2020-08-11T16:49:29Z CONTRIBUTOR

@TomNicholas I'm a bit confused about how the fit_along argument would work actually. If you had 2D data and wanted to fit a 1D function to one of the dimensions, wouldn't you have to either take a mean (or slice?) across the other dimension?

Edit: It's been a hot day here, so apologies if this turns out to be a dumb q haha

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  General curve fitting method 671609109
671442450 https://github.com/pydata/xarray/issues/4300#issuecomment-671442450 https://api.github.com/repos/pydata/xarray/issues/4300 MDEyOklzc3VlQ29tbWVudDY3MTQ0MjQ1MA== AndrewILWilliams 56925856 2020-08-10T16:01:06Z 2020-08-10T16:01:06Z CONTRIBUTOR

This sounds very cool! :) I'm not sure that I have much to add, but given @aulemahal's good point about the complexity of rewriting curve_fit from scratch, it seems that maybe a good first step would just be to wrap the existing scipy functionality?

Alternatively, given that xr.apply_ufunc can already do this (though it's probably complicated), perhaps it would be good to just have an example in the documentation?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  General curve fitting method 671609109
634184361 https://github.com/pydata/xarray/pull/4096#issuecomment-634184361 https://api.github.com/repos/pydata/xarray/issues/4096 MDEyOklzc3VlQ29tbWVudDYzNDE4NDM2MQ== AndrewILWilliams 56925856 2020-05-26T18:02:14Z 2020-05-26T18:02:14Z CONTRIBUTOR

Also, could I ask a git question? Is there a way of getting a "clean" version of xarray to do PR branches off of without just re-forking? I've tried a few different suggestions on StackOverflow, but I'm just keen to know what other people's workflow is :) Thanks again!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Corrcov typo fix 625064501
634165154 https://github.com/pydata/xarray/pull/4089#issuecomment-634165154 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzNDE2NTE1NA== AndrewILWilliams 56925856 2020-05-26T17:26:34Z 2020-05-26T17:26:34Z CONTRIBUTOR

@kefirbandi I didn't want to step on your toes, but I'm happy to put in a PR to fix the typo. :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.cov() and xr.corr() 623751213
633710066 https://github.com/pydata/xarray/pull/4064#issuecomment-633710066 https://api.github.com/repos/pydata/xarray/issues/4064 MDEyOklzc3VlQ29tbWVudDYzMzcxMDA2Ng== AndrewILWilliams 56925856 2020-05-25T20:38:49Z 2020-05-25T20:38:49Z CONTRIBUTOR

No problem ! Thanks everyone for helping me get up to speed :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Auto chunk 618828102
633592183 https://github.com/pydata/xarray/pull/4089#issuecomment-633592183 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzU5MjE4Mw== AndrewILWilliams 56925856 2020-05-25T14:13:46Z 2020-05-25T14:13:46Z CONTRIBUTOR

If you insist ;)

```python
da_a -= da_a.mean(dim=dim)
```

is indeed marginally faster. As they are already aligned, we don't have to worry about this.

Sweet! On second thought, I might leave it for now... the sun is too nice today. Can always have it as a future PR or something. :)

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.cov() and xr.corr() 623751213
633582528 https://github.com/pydata/xarray/pull/4089#issuecomment-633582528 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzU4MjUyOA== AndrewILWilliams 56925856 2020-05-25T13:50:08Z 2020-05-25T13:50:08Z CONTRIBUTOR

One more thing actually: is there an argument for not defining `da_a_std` and `demeaned_da_a` and just performing the operations in place? Defining these variables makes the code more readable, but in https://github.com/pydata/xarray/pull/3550#discussion_r355157809 and https://github.com/pydata/xarray/pull/3550#discussion_r355157888 the reviewer suggests this is inefficient?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.cov() and xr.corr() 623751213
633456698 https://github.com/pydata/xarray/pull/4089#issuecomment-633456698 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzQ1NjY5OA== AndrewILWilliams 56925856 2020-05-25T08:44:36Z 2020-05-25T10:55:29Z CONTRIBUTOR

> Could you also add a test for the TypeError?
>
> ```python
> with raises_regex(TypeError, "Only xr.DataArray is supported"):
>     xr.corr(xr.Dataset(), xr.Dataset())
> ```

Where do you mean, sorry? Isn't this already there in `corr()`?

```python3
if any(not isinstance(arr, (Variable, DataArray)) for arr in [da_a, da_b]):
    raise TypeError(
        "Only xr.DataArray and xr.Variable are supported."
        "Given {}.".format([type(arr) for arr in [da_a, da_b]])
    )
```

EDIT: Scratch that, I get what you mean :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.cov() and xr.corr() 623751213
633441816 https://github.com/pydata/xarray/pull/4089#issuecomment-633441816 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzQ0MTgxNg== AndrewILWilliams 56925856 2020-05-25T08:11:05Z 2020-05-25T08:12:48Z CONTRIBUTOR

Cheers! I've got a day off today, so I'll do another pass through the changes and see if there's any low-hanging fruit I can improve (in addition to `np.random`, the `_cov_corr` internal methods, and maybe `apply_ufunc()`) :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.cov() and xr.corr() 623751213
633286352 https://github.com/pydata/xarray/pull/4089#issuecomment-633286352 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzI4NjM1Mg== AndrewILWilliams 56925856 2020-05-24T19:49:30Z 2020-05-24T19:56:24Z CONTRIBUTOR

One problem I came across here is that pandas automatically ignores `np.nan` values in any corr or cov calculation. This is hard-coded into the package and there's sadly no `skipna=False` option, so what I've done in the tests is to use the numpy implementation that pandas is built on (see, for example, here). A short illustration of this follows the list below.

Current tests implemented are (in pseudocode...):

- [x] `assert_allclose(xr.cov(a, b) / (a.std() * b.std()), xr.corr(a, b))`
- [x] `assert_allclose(xr.cov(a, a) * (N - 1), ((a - a.mean())**2).sum())`
- [x] For the example in my previous comment, I now have a loop over all values of (a, x) to reconstruct the covariance/correlation matrix, and check it with an `assert_allclose(...)`.
- [x] Add more test arrays, with/without np.nans -- done

@keewis I tried reading the Hypothesis docs and got a bit overwhelmed, so I've stuck with example-based tests for now.
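The illustration mentioned above (my own sketch, not one of the PR's tests): pandas silently drops NaN pairs, while the underlying numpy routine propagates them.

```python3
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, 2.0, np.nan, 4.0])
s2 = pd.Series([2.0, 4.0, 6.0, 8.0])

print(s1.cov(s2))            # computed over the 3 valid pairs only
print(np.cov(s1, s2)[0, 1])  # nan -- numpy propagates the missing value
```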

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.cov() and xr.corr() 623751213
633213547 https://github.com/pydata/xarray/pull/4089#issuecomment-633213547 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzIxMzU0Nw== AndrewILWilliams 56925856 2020-05-24T10:59:43Z 2020-05-24T11:00:53Z CONTRIBUTOR

The current problem is that we can't use pandas to fully test `xr.cov()` or `xr.corr()`, because once you convert the DataArrays to a series or a dataframe for testing, you can't easily index them with a `dim` parameter. See @r-beer's comment here: https://github.com/pydata/xarray/pull/3550#issuecomment-557895005.

As such, I think it maybe just makes sense to test a few low-dimensional cases? E.g.

```python3
da_a = xr.DataArray(
    np.random.random((3, 21, 4)),
    coords={"time": pd.date_range("2000-01-01", freq="1D", periods=21)},
    dims=("a", "time", "x"),
)

da_b = xr.DataArray(
    np.random.random((3, 21, 4)),
    coords={"time": pd.date_range("2000-01-01", freq="1D", periods=21)},
    dims=("a", "time", "x"),
)

>>> xr.cov(da_a, da_b, 'time')
<xarray.DataArray (a: 3, x: 4)>
array([[-0.01824046,  0.00373796, -0.00601642, -0.00108818],
       [ 0.00686132, -0.02680119, -0.00639433, -0.00868691],
       [-0.00889806,  0.02622817, -0.01022208, -0.00101257]])
Dimensions without coordinates: a, x

>>> xr.cov(da_a, da_b, 'time').sel(a=0, x=0)
<xarray.DataArray ()>
array(-0.01824046)

>>> da_a.sel(a=0, x=0).to_series().cov(da_b.sel(a=0, x=0).to_series())
-0.018240458880158048
```

So, while it's easy to check that a few individual points from `xr.cov()` agree with the pandas implementation, it would require a loop over (a, x) in order to check all of the points for this example. Do people have thoughts about this?

I think it would also make sense to have some test cases where we don't use pandas at all, but where we specify the output manually?

```python3
da_a = xr.DataArray([[1, 2], [1, np.nan]], dims=["x", "time"])
expected = [1, np.nan]
actual = xr.corr(da_a, da_a, dim='time')
assert_allclose(actual, expected)
```

Does this seem like a good way forward?
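For concreteness, the loop over (a, x) could look something like this sketch (illustrative only, reusing `da_a` and `da_b` from the example above; not the final test code):

```python3
import numpy as np

cov = xr.cov(da_a, da_b, "time")
for a in range(da_a.sizes["a"]):
    for x in range(da_a.sizes["x"]):
        ts1 = da_a.isel(a=a, x=x).to_series()
        ts2 = da_b.isel(a=a, x=x).to_series()
        # pandas drops NaN pairs, matching xr.cov's skipna behaviour
        np.testing.assert_allclose(cov.isel(a=a, x=x).item(), ts1.cov(ts2))
```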

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.cov() and xr.corr() 623751213
633145887 https://github.com/pydata/xarray/issues/3784#issuecomment-633145887 https://api.github.com/repos/pydata/xarray/issues/3784 MDEyOklzc3VlQ29tbWVudDYzMzE0NTg4Nw== AndrewILWilliams 56925856 2020-05-23T21:58:49Z 2020-05-23T21:58:49Z CONTRIBUTOR

In a fit of covid-induced insanity, I've decided to have a crack at finishing up #3550! I'm playing around with the changes made by @r-beer at the moment, but I'm finding the tests quite confusing - I think they're wrong? But maybe someone could help me out with this?

Here's something from test_computation.py in #3550:

```python
def test_cov(da_a, da_b, dim):
    def pandas_cov(ts1, ts2):
        """Ensure the ts are aligned and missing values ignored"""
        ts1, ts2 = xr.align(ts1, ts2)
        valid_values = ts1.notnull() & ts2.notnull()

        ts1 = ts1.where(valid_values, drop=True)
        ts2 = ts2.where(valid_values, drop=True)

        return ts1.to_series().cov(ts2.to_series())

    expected = pandas_cov(da_a, da_b)
    actual = xr.cov(da_a, da_b, dim)

    assert_allclose(actual, expected)
```

What I don't understand is: why would we expect the pandas covariance or correlation functions to return anything remotely like the output of `xr.cov()`? The line `ts1.to_series().cov(ts2.to_series())` always produces a scalar value, whereas in most reasonable use cases `xr.cov(da_a, da_b, dim)` would be producing a matrix of values (e.g. the pixel-wise correlation in time between two DataArrays).

I wasn't sure whether to open a PR for this or not? I'm working on it, but I'd need some help to set up appropriate tests...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Function for regressing/correlating multiple fields? 568378007
632128807 https://github.com/pydata/xarray/pull/4064#issuecomment-632128807 https://api.github.com/repos/pydata/xarray/issues/4064 MDEyOklzc3VlQ29tbWVudDYzMjEyODgwNw== AndrewILWilliams 56925856 2020-05-21T14:49:37Z 2020-05-21T14:49:37Z CONTRIBUTOR

@keewis thanks for this! I've added what I think is a suitable test for DataArrays, do you think it's also a good idea to have a Dataset test?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Auto chunk 618828102
632090407 https://github.com/pydata/xarray/pull/4064#issuecomment-632090407 https://api.github.com/repos/pydata/xarray/issues/4064 MDEyOklzc3VlQ29tbWVudDYzMjA5MDQwNw== AndrewILWilliams 56925856 2020-05-21T13:36:41Z 2020-05-21T13:36:41Z CONTRIBUTOR

> This could test that `dataarray.chunk("auto").data` is the same as `dataarray.data.rechunk("auto")` (or something like that).

@dcherian Thanks for the tip :) Quick question: is there a reason why you're specifying the `.data` here? Also, I think I'm missing something, because I don't get what the difference between `.chunk()` and `.rechunk()` would be in this case.
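My reading of the suggested test, as a sketch (this assumes the `chunk("auto")` support that this PR is adding):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.zeros((100, 100)), dims=("x", "y")).chunk(10)

# The DataArray-level "auto" chunking should agree with rechunking the
# underlying dask array directly.
assert da.chunk("auto").data.chunks == da.data.rechunk("auto").chunks
```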

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Auto chunk 618828102
632035116 https://github.com/pydata/xarray/pull/4064#issuecomment-632035116 https://api.github.com/repos/pydata/xarray/issues/4064 MDEyOklzc3VlQ29tbWVudDYzMjAzNTExNg== AndrewILWilliams 56925856 2020-05-21T11:30:01Z 2020-05-21T11:30:01Z CONTRIBUTOR

Cheers! I forgot about the tests, will add them this week or next hopefully

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Auto chunk 618828102
629390609 https://github.com/pydata/xarray/pull/4064#issuecomment-629390609 https://api.github.com/repos/pydata/xarray/issues/4064 MDEyOklzc3VlQ29tbWVudDYyOTM5MDYwOQ== AndrewILWilliams 56925856 2020-05-15T17:40:39Z 2020-05-15T17:41:25Z CONTRIBUTOR

@dcherian do you have any idea about this mypy type error? I can't find much (accessible) documentation on how the `Union[]` is working in this context.

```
xarray/core/dataset.py:1737: error: Argument 2 to "fromkeys" of "dict" has incompatible type "Union[Number, Mapping[Hashable, Union[None, Number, Tuple[Number, ...]]]]"; expected "Union[None, Number, Tuple[Number, ...]]"
xarray/core/dataset.py:1740: error: Item "Number" of "Union[Number, Mapping[Hashable, Union[None, Number, Tuple[Number, ...]]]]" has no attribute "keys"
```

Edit: thanks to everyone for your help so far!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Auto chunk 618828102
629346101 https://github.com/pydata/xarray/pull/4064#issuecomment-629346101 https://api.github.com/repos/pydata/xarray/issues/4064 MDEyOklzc3VlQ29tbWVudDYyOTM0NjEwMQ== AndrewILWilliams 56925856 2020-05-15T16:11:04Z 2020-05-15T16:22:24Z CONTRIBUTOR

Okay, so I've traced the error back to the `map_blocks()` function. I don't fully understand the code for this function in xarray/core/parallel.py, but here's a quick report on the different behaviours.

Normally, when using the `make_ds()` and `make_da()` functions in test_dask.py, without any changes to `ds.chunk()` we have:

```python
>>> def func(obj):
...     result = obj + obj.x + 5 * obj.y
...     return result
...
>>> xr.map_blocks(func, ds).unify_chunks().chunks
Frozen(SortedKeysDict({'x': (4, 4, 2), 'y': (5, 5, 5, 5), 'z': (4,)}))
>>> func(ds).chunk().unify_chunks().chunks
Frozen(SortedKeysDict({'x': (4, 4, 2), 'y': (5, 5, 5, 5), 'z': (4,)}))
```

However, when I use the changes I've made to dataset.py (changing `isinstance(chunks, Number)` to `is_scalar(chunks)`), the behaviour becomes:

```python
>>> xr.map_blocks(func, ds).unify_chunks().chunks
Frozen(SortedKeysDict({'x': (4, 4, 2), 'y': (5, 5, 5, 5), 'z': (4,)}))
>>> func(ds).chunk().unify_chunks().chunks
Frozen(SortedKeysDict({'x': (10,), 'y': (20,), 'z': (4,)}))
```

Which means that it now fails the `test_map_blocks()` call in test_dask.py line 1077.

I've tried to follow through the code and see what is actually happening when this change is made, but I'm out of my depth here. My guess is that `is_scalar(chunks)` is giving the wrong behaviour when `chunks=None`?

Edit: I think that's the problem!

```python
>>> isinstance(None, numbers.Number)
False
>>> is_scalar(None)
True
```

I'll add in something to catch Nones and see if it fixes the error...
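Something like this guard, perhaps (a sketch of one possible fix, not necessarily what was merged; the `is_scalar` import path is my assumption):

```python
from xarray.core.utils import is_scalar  # import path is an assumption

def _normalize_chunks(chunks, dims):
    # None means "keep existing chunks", so handle it before the scalar
    # check: is_scalar(None) is True, unlike isinstance(None, Number).
    if chunks is not None and is_scalar(chunks):
        chunks = dict.fromkeys(dims, chunks)
    return chunks
```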

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Auto chunk 618828102
629197362 https://github.com/pydata/xarray/pull/4064#issuecomment-629197362 https://api.github.com/repos/pydata/xarray/issues/4064 MDEyOklzc3VlQ29tbWVudDYyOTE5NzM2Mg== AndrewILWilliams 56925856 2020-05-15T12:05:22Z 2020-05-15T12:05:22Z CONTRIBUTOR

No unpushed commits

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Auto chunk 618828102
629191037 https://github.com/pydata/xarray/pull/4064#issuecomment-629191037 https://api.github.com/repos/pydata/xarray/issues/4064 MDEyOklzc3VlQ29tbWVudDYyOTE5MTAzNw== AndrewILWilliams 56925856 2020-05-15T11:49:23Z 2020-05-15T11:49:23Z CONTRIBUTOR

Do you mean the master merge? If that's wrong, would you be able to fix it for me? My bad, hopefully I'll be able to do it more cleanly in future.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Auto chunk 618828102
629168282 https://github.com/pydata/xarray/pull/4064#issuecomment-629168282 https://api.github.com/repos/pydata/xarray/issues/4064 MDEyOklzc3VlQ29tbWVudDYyOTE2ODI4Mg== AndrewILWilliams 56925856 2020-05-15T10:49:43Z 2020-05-15T10:49:43Z CONTRIBUTOR

Okay, that makes sense. Though it seems that I forked the master branch before @kmuehlbauer's commit, which fixed this flake8 issue? So I think I need to make a new fork?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Auto chunk 618828102
629154336 https://github.com/pydata/xarray/pull/4064#issuecomment-629154336 https://api.github.com/repos/pydata/xarray/issues/4064 MDEyOklzc3VlQ29tbWVudDYyOTE1NDMzNg== AndrewILWilliams 56925856 2020-05-15T10:15:50Z 2020-05-15T10:17:38Z CONTRIBUTOR

Okay cheers both! I'll have a look at these now.

@keewis sorry, I'm still getting used to using this side of Git at the moment. Could you clarify what you mean by merge master? Do you mean merge with my local master?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Auto chunk 618828102
629147818 https://github.com/pydata/xarray/pull/4064#issuecomment-629147818 https://api.github.com/repos/pydata/xarray/issues/4064 MDEyOklzc3VlQ29tbWVudDYyOTE0NzgxOA== AndrewILWilliams 56925856 2020-05-15T10:00:35Z 2020-05-15T10:01:43Z CONTRIBUTOR

In my git clone, when I run the `flake8` and `black .` checks, I get the following messages:

```
(xarray-tests) Andrews-MacBook-Pro-2:xarray andrewwilliams$ black .
All done! ✨ 🍰 ✨
143 files left unchanged.
(xarray-tests) Andrews-MacBook-Pro-2:xarray andrewwilliams$ flake8
./xarray/backends/memory.py:43:32: E741 ambiguous variable name 'l'
./xarray/backends/common.py:244:32: E741 ambiguous variable name 'l'
./xarray/backends/.ipynb_checkpoints/memory-checkpoint.py:43:32: E741 ambiguous variable name 'l'
```

I'm not sure why something has changed in these files (I haven't touched them), and I also can't work out what the `l` variable is meant to be doing there.

Could this somehow be associated with loads of the checks failing below? Thanks! :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Auto chunk 618828102
628797255 https://github.com/pydata/xarray/issues/4055#issuecomment-628797255 https://api.github.com/repos/pydata/xarray/issues/4055 MDEyOklzc3VlQ29tbWVudDYyODc5NzI1NQ== AndrewILWilliams 56925856 2020-05-14T18:01:45Z 2020-05-14T18:01:45Z CONTRIBUTOR

I also thought that: after the dask error message, it's pretty easy to then look at the dataset and check what the problem dimension is.

In general though, is that the type of layout you'd suggest for catching and re-raising errors? Using `raise Exception()`?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Automatic chunking of arrays ? 617476316
628616379 https://github.com/pydata/xarray/issues/4055#issuecomment-628616379 https://api.github.com/repos/pydata/xarray/issues/4055 MDEyOklzc3VlQ29tbWVudDYyODYxNjM3OQ== AndrewILWilliams 56925856 2020-05-14T12:57:21Z 2020-05-14T17:50:31Z CONTRIBUTOR

Nice, that's neater! Would this work, in the `maybe_chunk()` call? Sorry about the basic questions!

```python
def maybe_chunk(name, var, chunks):
    chunks = selkeys(chunks, var.dims)
    if not chunks:
        chunks = None
    if var.ndim > 0:
        # when rechunking by different amounts, make sure dask names change
        # by providing chunks as an input to tokenize.
        # subtle bugs result otherwise. see GH3350
        token2 = tokenize(name, token if token else var._data, chunks)
        name2 = f"{name_prefix}{name}-{token2}"
        try:
            return var.chunk(chunks, name=name2, lock=lock)
        except NotImplementedError:
            raise Exception(
                "Automatic chunking fails for object arrays. "
                "These include cftime DataArrays."
            )
    else:
        return var
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Automatic chunking of arrays ? 617476316
628513777 https://github.com/pydata/xarray/issues/4055#issuecomment-628513777 https://api.github.com/repos/pydata/xarray/issues/4055 MDEyOklzc3VlQ29tbWVudDYyODUxMzc3Nw== AndrewILWilliams 56925856 2020-05-14T09:26:24Z 2020-05-14T09:26:24Z CONTRIBUTOR

Also, the contributing docs have been super clear so far! Thanks! :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Automatic chunking of arrays ? 617476316
628513443 https://github.com/pydata/xarray/issues/4055#issuecomment-628513443 https://api.github.com/repos/pydata/xarray/issues/4055 MDEyOklzc3VlQ29tbWVudDYyODUxMzQ0Mw== AndrewILWilliams 56925856 2020-05-14T09:25:48Z 2020-05-14T09:25:48Z CONTRIBUTOR

Cheers! Just had a look. Is it as simple as just changing this line to the following, @dcherian?

```python
if isinstance(chunks, Number) or chunks == 'auto':
    chunks = dict.fromkeys(self.dims, chunks)
```

This seems to work fine in a lot of cases, except that automatic chunking isn't implemented for object dtypes at the moment, so it fails if you pass a cftime coordinate, for example.

One option is to automatically use `self = xr.decode_cf(self)` if the input dataset is cftime? Or we could just throw an error.
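To illustrate the failure mode (my own sketch, not from the thread): dask refuses to auto-chunk object-dtype arrays, and un-decoded cftime coordinates end up as object dtype.

```python
import numpy as np
import dask.array

obj_arr = np.array([object() for _ in range(4)], dtype=object)

try:
    dask.array.from_array(obj_arr, chunks="auto")
except NotImplementedError as err:
    print(err)  # dask cannot estimate chunk sizes for object dtypes
```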

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Automatic chunking of arrays ? 617476316
628212516 https://github.com/pydata/xarray/issues/4055#issuecomment-628212516 https://api.github.com/repos/pydata/xarray/issues/4055 MDEyOklzc3VlQ29tbWVudDYyODIxMjUxNg== AndrewILWilliams 56925856 2020-05-13T19:56:34Z 2020-05-13T19:56:34Z CONTRIBUTOR

Oh ok, I didn't know about this. I'll take a look and read the contribution docs tomorrow! It'll be my first PR, so I may need a bit of hand-holding when it comes to tests. Willing to try though!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Automatic chunking of arrays ? 617476316
625345337 https://github.com/pydata/xarray/issues/3784#issuecomment-625345337 https://api.github.com/repos/pydata/xarray/issues/3784 MDEyOklzc3VlQ29tbWVudDYyNTM0NTMzNw== AndrewILWilliams 56925856 2020-05-07T16:02:43Z 2020-05-07T16:02:43Z CONTRIBUTOR

Hi @max-sixty, just coming back to this now. It seems @r-beer isn't available... do you know roughly how far away his PR was from completion? I'm getting a little bit lost trying to follow #3550, sorry!

Was the main todo to avoid the `drop=True` after broadcasting? Is there any idea about what to do instead?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Function for regressing/correlating multiple fields? 568378007
589368049 https://github.com/pydata/xarray/issues/3784#issuecomment-589368049 https://api.github.com/repos/pydata/xarray/issues/3784 MDEyOklzc3VlQ29tbWVudDU4OTM2ODA0OQ== AndrewILWilliams 56925856 2020-02-20T22:08:01Z 2020-02-20T22:08:01Z CONTRIBUTOR

@max-sixty Just had a peruse through a few of the relevant issues, do we know what the status of #3550 is? It seems like @r-beer was pretty close on this, right?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Function for regressing/correlating multiple fields? 568378007
589328765 https://github.com/pydata/xarray/issues/3784#issuecomment-589328765 https://api.github.com/repos/pydata/xarray/issues/3784 MDEyOklzc3VlQ29tbWVudDU4OTMyODc2NQ== AndrewILWilliams 56925856 2020-02-20T21:32:26Z 2020-02-20T21:32:26Z CONTRIBUTOR

I'll take a look at them!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Function for regressing/correlating multiple fields? 568378007

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);