issue_comments: 717320425

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/4541#issuecomment-717320425 https://api.github.com/repos/pydata/xarray/issues/4541 717320425 MDEyOklzc3VlQ29tbWVudDcxNzMyMDQyNQ== 10194086 2020-10-27T15:23:55Z 2020-10-27T15:23:55Z MEMBER

The discussion goes back to here: https://github.com/pydata/xarray/pull/2922#issuecomment-545200082 (by @dcherian)

> > I decided to replace all NaN in the weights with 0.
>
> Can we raise an error instead? It should be easy for the user to do weights.fillna(0) instead of relying on xarray's magical behaviour.

Thinking a bit more about this, I now favour the isnull().any() test and would add a check_weights kwarg. I would even be fine with setting check_weights=False by default and saying the user is responsible for supplying valid weights (but I'd want others to weigh in here).
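To make the proposal concrete, here is a rough sketch of what such a check could look like. `validate_weights` is a hypothetical helper written only for illustration; it is not xarray's API, and the error message and the default value of `check_weights` are assumptions.

```python
import numpy as np
import xarray as xr


def validate_weights(weights, check_weights=True):
    """Hypothetical helper (not part of xarray): reject NaN weights.

    With check_weights=False the check is skipped and the caller is
    responsible for supplying valid weights.
    """
    # bool() forces evaluation of the 0-d result of isnull().any()
    if check_weights and bool(weights.isnull().any()):
        raise ValueError(
            "weights contain NaN; fill them explicitly, "
            "e.g. with weights.fillna(0)"
        )
    return weights


weights = xr.DataArray([0.5, np.nan, 1.0], dims="time")

# The user fills missing weights explicitly instead of relying on
# an implicit fillna(0) inside the library.
clean = validate_weights(weights.fillna(0))
```

Calling `validate_weights(weights)` on the unfilled array raises the `ValueError`, while `check_weights=False` skips the check entirely.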

In addition, a.isnull().any() is quite a bit faster than a.fillna(0) (even if there are no NaNs present). This is mostly true for NumPy arrays, not so much for dask (by my limited tests). On the other hand, the isnull().any() test is a small percentage of the total time (https://github.com/pydata/xarray/issues/3883#issuecomment-630387515).


I am also not entirely sure I understand where your issue lies. You eventually have to compute, right? Do you do something between w = data.weighted(weights) and w.mean()?

Ah maybe I understand, your data looks like:

  • data: <xarray.DataArray (time: 1000, models: 1)>
  • weights: <xarray.DataArray (time: 1000, models: 100)>

And now weights gets checked for all 100 models where only one would be relevant. Is this correct? (So another workaround would be to use xr.align before passing weights to weighted.)
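A minimal sketch of that workaround, assuming both arrays carry coordinate labels on `models` so that `xr.align` can match them (the labels, the random values, and the choice of model `3` are made up for illustration):

```python
import numpy as np
import xarray as xr

time = np.arange(1000)

# Assumed layout: data holds a single model, weights hold all 100.
data = xr.DataArray(
    np.random.randn(1000, 1),
    dims=("time", "models"),
    coords={"time": time, "models": [3]},
)
weights = xr.DataArray(
    np.random.rand(1000, 100),
    dims=("time", "models"),
    coords={"time": time, "models": np.arange(100)},
)

# An inner join keeps only the labels present in both arrays, so the
# weights shrink to the one relevant model before any NaN check runs.
data_aligned, weights_aligned = xr.align(data, weights, join="inner")

w = data_aligned.weighted(weights_aligned)
result = w.mean("time")
```

After the alignment, `weights_aligned` has shape `(1000, 1)`, so whatever validation `weighted` performs only touches the weights that are actually used.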


My limited speed tests:

```python
# Run in IPython/Jupyter (%timeit is an IPython magic)
import numpy as np
import xarray as xr

# NumPy-backed array
a = xr.DataArray(np.random.randn(1000, 1000, 10, 10))

%timeit a.isnull().any()
%timeit a.fillna(0)

# dask-backed array
b = xr.DataArray(np.random.randn(1000, 1000, 10, 10)).chunk(100)

%timeit b.isnull().any()
%timeit b.fillna(0)
```