issue_comments: 601824129

html_url: https://github.com/pydata/xarray/pull/2922#issuecomment-601824129
issue_url: https://api.github.com/repos/pydata/xarray/issues/2922
id: 601824129
node_id: MDEyOklzc3VlQ29tbWVudDYwMTgyNDEyOQ==
user: 10194086
created_at: 2020-03-20T17:31:15Z
updated_at: 2020-03-20T17:31:15Z
author_association: MEMBER
body:

There is some stuff I can do to reduce the memory footprint if skipna=False or not da.isnull().any(). Also, the functions should support dask arrays out of the box.


> Ideally dot() would support skipna, so you could eliminate the da = da.fillna(0.0) and pass the skipna down the line. But alas it doesn't...

Yes, this would be nice. xr.dot uses np.einsum, which is quite a beast that I don't fully understand. I don't expect it to support NaNs any time soon.
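
To illustrate the NaN behaviour, here is a minimal sketch using plain np.einsum on toy arrays (the values are made up for illustration). A single NaN propagates into the result unless it is replaced by zero first.

```python
import numpy as np

# toy data, invented for this illustration
da = np.array([1.0, np.nan, 3.0])
weights = np.array([0.5, 0.25, 0.25])

# einsum multiplies and sums without any NaN handling,
# so a single NaN makes the whole result NaN
plain = np.einsum("i,i->", da, weights)

# replacing NaNs with 0 first removes their contribution to the sum
filled = np.einsum("i,i->", np.where(np.isnan(da), 0.0, da), weights)

print(plain, filled)
```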

What could be done, though, is to call da = da.fillna(0.0) only if da contains NaNs.
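
A rough sketch of that idea, assuming a hypothetical helper weighted_sum_sketch (this is not the code in the PR). It relies on DataArray.dot, which forwards to xr.dot, and skips the fillna copy when it is not needed.

```python
import numpy as np
import xarray as xr


def weighted_sum_sketch(da, weights, dim, skipna=True):
    """Hypothetical helper for illustration, not xarray's implementation."""
    # Only pay for the fillna copy when NaNs have to be skipped and are
    # actually present. Note: for dask-backed arrays the bool(...) call
    # triggers a computation.
    if skipna and bool(da.isnull().any()):
        da = da.fillna(0.0)
    # dot has no skipna option; any remaining NaN propagates
    return da.dot(weights, dim)


da = xr.DataArray([1.0, np.nan, 3.0], dims="x")
weights = xr.DataArray([0.5, 0.25, 0.25], dims="x")

skipped = weighted_sum_sketch(da, weights, "x")
propagated = weighted_sum_sketch(da, weights, "x", skipna=False)
```

With skipna=False the NaN check is never evaluated, so no extra pass over da is made.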

> (da * weights).sum(dim=dim, skipna=skipna) would likely make things worse, I think, as it would necessarily create a temporary array at least the size of da, no?

I assume so. I don't know what kind of temporary variables np.einsum creates. Also np.einsum is wrapped in xr.apply_ufunc so all kinds of magic is going on.

Either way, this only addresses the da = da.fillna(0.0), not the mask = da.notnull().

Again this could be avoided if skipna=False or if (and only if) there are no NaNs in da.
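
The same shortcut for the mask could look roughly like this. The helper name sum_of_weights_sketch is hypothetical, and the sketch assumes float weights that contain the dimension being reduced.

```python
import numpy as np
import xarray as xr


def sum_of_weights_sketch(da, weights, dim, skipna=True):
    """Hypothetical helper for illustration, not xarray's implementation."""
    if skipna and bool(da.isnull().any()):
        # the mask is a temporary boolean array with the shape of da
        mask = da.notnull()
        return mask.dot(weights, dim)
    # No NaNs to skip: every weight counts, so no mask is needed.
    # The result may have fewer dims than in the masked branch, but it
    # still broadcasts against the weighted sum.
    return weights.sum(dim)


weights = xr.DataArray([0.5, 0.25, 0.25], dims="x")
with_nan = xr.DataArray([1.0, np.nan, 3.0], dims="x")
without_nan = xr.DataArray([1.0, 2.0, 3.0], dims="x")

masked_total = sum_of_weights_sketch(with_nan, weights, "x")
full_total = sum_of_weights_sketch(without_nan, weights, "x")
```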

> Also, perhaps the test if weights.isnull().any() in Weighted.__init__() should be optional?

Do you want to leave it out for performance reasons? It was a deliberate decision not to support NaNs in the weights, and I don't think this is going to change.
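
For reference, the kind of check under discussion, sketched with a hypothetical class WeightedSketch (the class name and error message are illustrative only). It costs one full pass over the weights when the object is created.

```python
import numpy as np
import xarray as xr


class WeightedSketch:
    """Hypothetical stand-in for illustration, not xarray's Weighted class."""

    def __init__(self, obj, weights):
        # one full pass over the weights; NaN weights are rejected up front
        if bool(weights.isnull().any()):
            raise ValueError("weights must not contain missing values")
        self.obj = obj
        self.weights = weights


da = xr.DataArray([1.0, 2.0], dims="x")
good = WeightedSketch(da, xr.DataArray([0.5, 0.5], dims="x"))

try:
    WeightedSketch(da, xr.DataArray([0.5, np.nan], dims="x"))
    rejected = False
except ValueError:
    rejected = True
```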

> Maybe I'm more sensitive to this than others, but I regularly deal with 10-100GB arrays.

No, it's important to make sure this works for large arrays. However, using xr.dot already incurs quite a performance penalty, which I am not super happy about.

> Have you considered using these functions? [...]

None of the functions you suggested support NaNs, so they won't work.

I am all for supporting more functions, but for now I am happy we got a weighted sum and mean into xarray after 5(!) years!

Further libraries that support weighted operations:

  • esmlab (xarray-based, supports NaN)
  • statsmodels (numpy-based, does not support NaN)

reactions:
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: 437765416