html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4541#issuecomment-719907040,https://api.github.com/repos/pydata/xarray/issues/4541,719907040,MDEyOklzc3VlQ29tbWVudDcxOTkwNzA0MA==,10194086,2020-10-31T09:10:10Z,2020-10-31T09:10:10Z,MEMBER,Yes that would be great.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097
https://github.com/pydata/xarray/issues/4541#issuecomment-717343483,https://api.github.com/repos/pydata/xarray/issues/4541,717343483,MDEyOklzc3VlQ29tbWVudDcxNzM0MzQ4Mw==,2448579,2020-10-27T15:57:50Z,2020-10-27T15:57:50Z,MEMBER,Another option would be to put the check in a `.map_blocks` call for dask arrays. This would only run and raise at compute time.,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097
https://github.com/pydata/xarray/issues/4541#issuecomment-717342942,https://api.github.com/repos/pydata/xarray/issues/4541,717342942,MDEyOklzc3VlQ29tbWVudDcxNzM0Mjk0Mg==,2448579,2020-10-27T15:57:03Z,2020-10-27T15:57:03Z,MEMBER,"> The discussion goes back to here: #2922 (comment) (by @dcherian)

Ah, sorry! I was thinking of weights as being numpy arrays, not so much dask arrays.

> Do you do something between w = data.weighted(weights) and w.mean()?

Yeah I think this is the issue. `.weighted` should be lazy.

> Thinking a bit more about this I now favour the isnull().any() test and would add a check_weights kwargs.

This would be OK.
We could also drop the check and let users deal with it, and also add a warning to the docstring.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097
https://github.com/pydata/xarray/issues/4541#issuecomment-717320425,https://api.github.com/repos/pydata/xarray/issues/4541,717320425,MDEyOklzc3VlQ29tbWVudDcxNzMyMDQyNQ==,10194086,2020-10-27T15:23:55Z,2020-10-27T15:23:55Z,MEMBER,"The discussion goes back to here: https://github.com/pydata/xarray/pull/2922#issuecomment-545200082 (by @dcherian)

>> I decided to replace all NaN in the weights with 0.

> Can we raise an error instead? It should be easy for the user to do `weights.fillna(0)` instead of relying on xarray's magical behaviour.

Thinking a bit more about this, I now favour the `isnull().any()` test and would add a `check_weights` kwarg. I would even be fine with setting `check_weights=False` by default and saying the user is responsible for supplying valid weights (but I'd want others to weigh in here).

In addition, `a.isnull().any()` is quite a bit faster than `a.fillna(0)` (even if there are no NaNs present). This is mostly true for numpy arrays, not so much for dask (by my limited tests). On the other hand, the `isnull().any()` test is a small percentage of the total time (https://github.com/pydata/xarray/issues/3883#issuecomment-630387515).

---

I am also not entirely sure I understand where your issue lies. You eventually _have_ to compute, right? Do you do something between `w = data.weighted(weights)` and `w.mean()`?

Ah, maybe I understand. Your data looks like:

* `data`: ``
* `weights`: ``

And now `weights` gets checked for all 100 models where only one would be relevant. Is this correct? (So another workaround would be to use `xr.align` before passing `weights` to `weighted`.)

---

My limited speed tests:
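```python
# Run in IPython/Jupyter: %timeit is an IPython magic.
import numpy as np
import xarray as xr

# numpy-backed array
a = xr.DataArray(np.random.randn(1000, 1000, 10, 10))
%timeit a.isnull().any()
%timeit a.fillna(0)

# dask-backed array (chunks of at most 100 along each dimension); these calls
# are lazy, so they time graph construction unless .compute() is appended
b = xr.DataArray(np.random.randn(1000, 1000, 10, 10)).chunk(100)
%timeit b.isnull().any()
%timeit b.fillna(0)
```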
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097
https://github.com/pydata/xarray/issues/4541#issuecomment-717240738,https://api.github.com/repos/pydata/xarray/issues/4541,717240738,MDEyOklzc3VlQ29tbWVudDcxNzI0MDczOA==,10194086,2020-10-27T13:24:43Z,2020-10-27T13:24:43Z,MEMBER,"The other possibility would be to do something like:

```python
def __init__(..., skipna=False):
    if skipna:
        weights = weights.fillna(0)
```

We decided not to do this somewhere in the discussion; I'm not entirely sure anymore why.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097
https://github.com/pydata/xarray/issues/4541#issuecomment-717107362,https://api.github.com/repos/pydata/xarray/issues/4541,717107362,MDEyOklzc3VlQ29tbWVudDcxNzEwNzM2Mg==,10194086,2020-10-27T09:27:25Z,2020-10-27T09:27:25Z,MEMBER,"`weights` cannot contain `NaN`s, else the result will just be `NaN`, even with `skipna=True`. But then, the weights rarely contain `NaN`. So this test is a bit of a trade-off between time and convenience.

A kwarg can certainly make sense (it was also requested before). I would probably _not_ call the kwarg `skipna`. Maybe `check_weights` or `check_nan`? (Better names welcome.)

I think `da.isnull().any()` is lazy and it's the `if` that makes it eager. So an alternative would be to make the statement lazy, but I don't know how this would be done.
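One way to make the statement lazy, following the `.map_blocks` suggestion made elsewhere in this thread, is to move the NaN test into the task graph so it only runs, and raises, at compute time. The sketch below works on a `dask.array` directly; `lazy_weight_check` and `_raise_if_nan` are hypothetical names, and for an xarray object one would presumably apply this to the underlying `weights.data`.

```python
import numpy as np
import dask.array as da


def _raise_if_nan(block):
    # Applied to each chunk when the graph is computed; a block without
    # NaNs is passed through unchanged.
    if np.isnan(block).any():
        raise ValueError('weights cannot contain missing values')
    return block


def lazy_weight_check(weights):
    # Hypothetical helper: instead of an eager `if weights.isnull().any():`,
    # wrap the test in map_blocks so it runs as part of the task graph.
    return weights.map_blocks(_raise_if_nan, dtype=weights.dtype)


weights = da.from_array(np.array([1.0, np.nan, 2.0, 3.0]), chunks=2)
checked = lazy_weight_check(weights)  # no error yet: nothing has been computed
```

The trade-off is that invalid weights surface as an error during `.compute()` rather than when `.weighted` is called, which may be less obvious to debug.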
The relevant test is here: https://github.com/pydata/xarray/blob/adc55ac4d2883e0c6647f3983c3322ca2c690514/xarray/tests/test_weighted.py#L22
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097
https://github.com/pydata/xarray/issues/4541#issuecomment-716937842,https://api.github.com/repos/pydata/xarray/issues/4541,716937842,MDEyOklzc3VlQ29tbWVudDcxNjkzNzg0Mg==,5635139,2020-10-27T02:29:59Z,2020-10-27T06:15:51Z,MEMBER,"If it leads to incorrect results, I agree. If it leads to a lazy error (even if more confusing), or a result array full of NaNs, then I think it's fine. Not super confident on the latter case, to be confirmed.

If we want more control, I would advocate for using a standard kwarg that offers control over the computation, rather than an idiosyncratic kwarg that's derived from the internals of this implementation. For example, `skip_na` often gives more performance in exchange for (edit: the user) ensuring there are no `NaN`s.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097
https://github.com/pydata/xarray/issues/4541#issuecomment-716928594,https://api.github.com/repos/pydata/xarray/issues/4541,716928594,MDEyOklzc3VlQ29tbWVudDcxNjkyODU5NA==,5635139,2020-10-27T02:00:40Z,2020-10-27T02:00:40Z,MEMBER,"> Sorry if my initial issue was unclear.

Not at all, my mistake.

> So you favor not having a 'skip' kwarg to just internally skipping the call to `.any()` if `weights` is a dask array?

👍
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097
https://github.com/pydata/xarray/issues/4541#issuecomment-716913449,https://api.github.com/repos/pydata/xarray/issues/4541,716913449,MDEyOklzc3VlQ29tbWVudDcxNjkxMzQ0OQ==,5635139,2020-10-27T01:13:04Z,2020-10-27T01:13:04Z,MEMBER,"Sorry, I completely misunderstood!
I thought you were asking about skipping tests as in pytest, hence my confusion. Definitely on board with skipping those checks for dask arrays.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097
https://github.com/pydata/xarray/issues/4541#issuecomment-716908428,https://api.github.com/repos/pydata/xarray/issues/4541,716908428,MDEyOklzc3VlQ29tbWVudDcxNjkwODQyOA==,5635139,2020-10-27T00:57:04Z,2020-10-27T01:10:30Z,MEMBER,"I don't have that much context on `xgcm`, so others may know better on this.

~Could you help me understand in what context you're running the tests?~

~IIRC we used to have [`--skip-slow` here](https://github.com/pydata/xarray/blob/master/conftest.py#L16), but it wasn't used that much and so is no longer there. It's definitely possible to add that sort of flag.~","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097
https://github.com/pydata/xarray/issues/4541#issuecomment-716910761,https://api.github.com/repos/pydata/xarray/issues/4541,716910761,MDEyOklzc3VlQ29tbWVudDcxNjkxMDc2MQ==,2448579,2020-10-27T01:04:14Z,2020-10-27T01:04:22Z,MEMBER,The relevant context is that `.any()` will trigger computation on a dask array. Maybe we skip the check using `is_duck_dask_array`?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,729980097