html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/issues/4541#issuecomment-719907040,https://api.github.com/repos/pydata/xarray/issues/4541,719907040,MDEyOklzc3VlQ29tbWVudDcxOTkwNzA0MA==,10194086,2020-10-31T09:10:10Z,2020-10-31T09:10:10Z,MEMBER,Yes that would be great.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097 https://github.com/pydata/xarray/issues/4541#issuecomment-717625573,https://api.github.com/repos/pydata/xarray/issues/4541,717625573,MDEyOklzc3VlQ29tbWVudDcxNzYyNTU3Mw==,14314623,2020-10-28T00:45:31Z,2020-10-28T00:45:31Z,CONTRIBUTOR,"> Another option would be to put the check in a `.map_blocks` call for dask arrays. This would only run and raise at compute time. Uh that sounds great actually. Same functionality, no triggered computation, and no intervention needed from the user. Should I try to implement this?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097 https://github.com/pydata/xarray/issues/4541#issuecomment-717343483,https://api.github.com/repos/pydata/xarray/issues/4541,717343483,MDEyOklzc3VlQ29tbWVudDcxNzM0MzQ4Mw==,2448579,2020-10-27T15:57:50Z,2020-10-27T15:57:50Z,MEMBER,Another option would be to put the check in a `.map_blocks` call for dask arrays. This would only run and raise at compute time.,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097 https://github.com/pydata/xarray/issues/4541#issuecomment-717342942,https://api.github.com/repos/pydata/xarray/issues/4541,717342942,MDEyOklzc3VlQ29tbWVudDcxNzM0Mjk0Mg==,2448579,2020-10-27T15:57:03Z,2020-10-27T15:57:03Z,MEMBER,"> The discussion goes back to here: #2922 (comment) (by @dcherian) Ah, sorry! I was thinking of weights as being numpy arrays, not so much dask arrays. > Do you do something between w = data.weighted(weights) and w.mean()? Yeah I think this is the issue. `.weighted` should be lazy. > Thinking a bit more about this I now favour the isnull().any() test and would add a check_weights kwargs. This would be OK. We could also drop the check and let users deal with it, and also add a warning to the docstring.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097 https://github.com/pydata/xarray/issues/4541#issuecomment-717320425,https://api.github.com/repos/pydata/xarray/issues/4541,717320425,MDEyOklzc3VlQ29tbWVudDcxNzMyMDQyNQ==,10194086,2020-10-27T15:23:55Z,2020-10-27T15:23:55Z,MEMBER,"The discussion goes back to here: https://github.com/pydata/xarray/pull/2922#issuecomment-545200082 (by @dcherian) >> I decided to replace all NaN in the weights with 0. > Can we raise an error instead? It should be easy for the user to do `weights.fillna(0)` instead of relying on xarray's magical behaviour. Thinking a bit more about this I now favour the `isnull().any()` test and would add a `check_weights` kwargs. I would even be fine to set `check_weights=False` per default and say the user is responsible to supply valid weights (but I'd want others to weigh in here). In addition, `a.isnull().any()` is quite a bit faster than `a.fillna(0)` (even if there are no nans present). This is mostly true for numpy arrays, not so much for dask (by my limited tests). On the other hand the `isnull().any()` test is a small percentage of the total time (https://github.com/pydata/xarray/issues/3883#issuecomment-630387515). --- I am also not entirely sure I understand where your issue lies. You eventually _have_ to compute, right? Do you do something between `w = data.weighted(weights)` and `w.mean()`? Ah maybe I understand, your data looks like: * `data`: `` * `weights`: `` And now `weights` gets checked for all 100 models where only one would be relevant. Is this correct? (So as another workaround would be using `xr.align` before sending `weights` to `weighted`.) --- My limited speed tests:
```python import numpy as np import xarray as xr a = xr.DataArray(np.random.randn(1000, 1000, 10, 10)) %timeit a.isnull().any() %timeit a.fillna(0) b = xr.DataArray(np.random.randn(1000, 1000, 10, 10)).chunk(100) %timeit b.isnull().any() %timeit b.fillna(0) ```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097 https://github.com/pydata/xarray/issues/4541#issuecomment-717266102,https://api.github.com/repos/pydata/xarray/issues/4541,717266102,MDEyOklzc3VlQ29tbWVudDcxNzI2NjEwMg==,14314623,2020-10-27T14:03:34Z,2020-10-27T14:03:34Z,CONTRIBUTOR,"Thanks @mathause , I was wondering how much of a performance trade off `.fillna(0)` is on a dask array with no nans, compared to the check. I favor this, since it allows slicing before the calculation is triggered: I have a current situation where I do a bunch of operations on a large multi-model dataset. The weights are time and member dependent and I am trying to save each member separately. Having the calculation triggered for the full dataset is problematic and `fillna(0)` avoids that (working with a hacked branch where I simply removed the check for nans)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097 https://github.com/pydata/xarray/issues/4541#issuecomment-717240738,https://api.github.com/repos/pydata/xarray/issues/4541,717240738,MDEyOklzc3VlQ29tbWVudDcxNzI0MDczOA==,10194086,2020-10-27T13:24:43Z,2020-10-27T13:24:43Z,MEMBER,"The other possibility would be to do sth like: ```python def __init__(..., skipna=False): if skipna: weights = weighs.fillna(0) ``` we did decide to not do this somewhere in the discussion, not entirely sure anymore why. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097 https://github.com/pydata/xarray/issues/4541#issuecomment-717107362,https://api.github.com/repos/pydata/xarray/issues/4541,717107362,MDEyOklzc3VlQ29tbWVudDcxNzEwNzM2Mg==,10194086,2020-10-27T09:27:25Z,2020-10-27T09:27:25Z,MEMBER,"`weights` cannot contain `NaN`s else the result will just be `NaN`, even with `skipna=True`. But then the weights rarely contain `NaN`. So this test is a bit a trade-off between time and convenience. A kwarg can certainly make sense (was also requested before). I would probably _not_ call the kwarg `skipna`. Maybe `check_weights`? or `check_nan`? (better names welcome) I think `da.isnull().any()` is lazy and it's the `if` that makes it eager. So an alternative would be to make the statement lazy but I don't know how this would be done. The relevant test is here: https://github.com/pydata/xarray/blob/adc55ac4d2883e0c6647f3983c3322ca2c690514/xarray/tests/test_weighted.py#L22 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097 https://github.com/pydata/xarray/issues/4541#issuecomment-716937842,https://api.github.com/repos/pydata/xarray/issues/4541,716937842,MDEyOklzc3VlQ29tbWVudDcxNjkzNzg0Mg==,5635139,2020-10-27T02:29:59Z,2020-10-27T06:15:51Z,MEMBER,"If it leads to incorrect results, I agree. If it leads to a lazy error (even if more confusing), or a result array full of NaNs, then I think it's fine. Not super confident on the latter case, tbc. If we want more control, I would advocate for using a standard kwarg that offers control over the computation — e.g. `skip_na` often gives more performance in exchange for (edit: the user) ensuring no `NaN`s — rather than an idiosyncratic kwarg that's derived by the internals of this implementation","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097 https://github.com/pydata/xarray/issues/4541#issuecomment-716974071,https://api.github.com/repos/pydata/xarray/issues/4541,716974071,MDEyOklzc3VlQ29tbWVudDcxNjk3NDA3MQ==,14314623,2020-10-27T04:33:04Z,2020-10-27T04:33:04Z,CONTRIBUTOR,Sounds good. I'll see if I can make some time to test and put up a PR this week.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097 https://github.com/pydata/xarray/issues/4541#issuecomment-716930400,https://api.github.com/repos/pydata/xarray/issues/4541,716930400,MDEyOklzc3VlQ29tbWVudDcxNjkzMDQwMA==,14314623,2020-10-27T02:06:35Z,2020-10-27T02:06:35Z,CONTRIBUTOR,"What would happen in this case **if** a dask array with nans is passed? Would this somehow silently influence the results or would it not matter (in that case I wonder what the check was for). If this could lead to undetected errors I would still consider a kwargs a safer alternative, especially for new users?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097 https://github.com/pydata/xarray/issues/4541#issuecomment-716928594,https://api.github.com/repos/pydata/xarray/issues/4541,716928594,MDEyOklzc3VlQ29tbWVudDcxNjkyODU5NA==,5635139,2020-10-27T02:00:40Z,2020-10-27T02:00:40Z,MEMBER,"> Sorry if my initial issue was unclear. Not at all, my mistake > So you favor not having a 'skip' kwarg to just internally skipping the call to `.any()` if `weights` is a dask array? 👍 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097 https://github.com/pydata/xarray/issues/4541#issuecomment-716927242,https://api.github.com/repos/pydata/xarray/issues/4541,716927242,MDEyOklzc3VlQ29tbWVudDcxNjkyNzI0Mg==,14314623,2020-10-27T01:56:28Z,2020-10-27T01:56:28Z,CONTRIBUTOR,"Sorry if my initial issue was unclear. So you favor not having a 'skip' kwarg to just internally skipping the call to `.any()` if `weights` is a dask array?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097 https://github.com/pydata/xarray/issues/4541#issuecomment-716913449,https://api.github.com/repos/pydata/xarray/issues/4541,716913449,MDEyOklzc3VlQ29tbWVudDcxNjkxMzQ0OQ==,5635139,2020-10-27T01:13:04Z,2020-10-27T01:13:04Z,MEMBER,"Sorry, I completely misunderstood! I thought you were asking about skipping tests as in pytest, hence my confusion. For sure re skipping those checks with dask arrays.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097 https://github.com/pydata/xarray/issues/4541#issuecomment-716908428,https://api.github.com/repos/pydata/xarray/issues/4541,716908428,MDEyOklzc3VlQ29tbWVudDcxNjkwODQyOA==,5635139,2020-10-27T00:57:04Z,2020-10-27T01:10:30Z,MEMBER,"I don't have that much context on `xgcm` so others may know better on this. ~Could you help me understand in what context you're running the tests?~ ~IIRC we used to have [`--skip-slow` here](https://github.com/pydata/xarray/blob/master/conftest.py#L16), but it wasn't used that much and so is no longer there. It's definitely possible to add that sort of flag.~","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097 https://github.com/pydata/xarray/issues/4541#issuecomment-716910761,https://api.github.com/repos/pydata/xarray/issues/4541,716910761,MDEyOklzc3VlQ29tbWVudDcxNjkxMDc2MQ==,2448579,2020-10-27T01:04:14Z,2020-10-27T01:04:22Z,MEMBER,The relevant context is that `.any()` will trigger computation on a dask array. Maybe we skip the check using `is_duck_dask_array`?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097