issue_comments: 397090519

Comment by user 1217238 (MEMBER) on pydata/xarray#2230, posted 2018-06-13T21:19:55Z
https://github.com/pydata/xarray/issues/2230#issuecomment-397090519

The difference between mean and sum here isn't resample specific. Xarray consistently interprets a "NA skipping sum" as returning 0 in the case of all-NaN inputs:

```
>>> float(xarray.DataArray([np.nan]).sum())
0.0
```

This is consistent with the sum of an empty set being 0, e.g.:

```
>>> float(xarray.DataArray([]).sum())
0.0
```

The reason why a "NA skipping mean" is different in the case of all-NaN inputs is that the mean simply isn't well defined on an empty set. The mean would literally be a sum of zero divided by a count of zero, which is not a valid number: the literal meaning of NaN as "not a number".
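The same asymmetry shows up in NumPy's NaN-skipping reductions, which is one way to check the behavior described above without xarray installed (a minimal sketch, not xarray's own code path):

```python
import warnings
import numpy as np

arr = np.array([np.nan, np.nan])

# NaN-skipping sum over all-NaN input: the sum of an empty set, i.e. 0.0
print(np.nansum(arr))  # 0.0

# NaN-skipping mean over all-NaN input: 0 / 0, i.e. NaN
# (NumPy also emits a "Mean of empty slice" RuntimeWarning here,
# matching the "not well defined on an empty set" point above)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    print(np.nanmean(arr))  # nan
```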

There was a long discussion/debate about this recently in pandas. See https://github.com/pandas-dev/pandas/issues/18678 and links therein. There are certainly use cases where it is nicer for the sum of all-NaN inputs to be NaN (exactly as you mention here), but ultimately pandas decided that the answer for this operation should be zero. The decisive considerations were simplicity and consistency with other tools (including NumPy and R).

What pandas added to solve this use-case is an optional min_count argument (see pandas.DataFrame.sum for an example). We could definitely copy this behavior in xarray if someone is interested in implementing it.
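A minimal sketch of the `min_count` behavior pandas settled on (assuming pandas >= 0.22, where the all-NaN sum became 0):

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan])

# Default: the NA-skipping sum of all-NaN input is 0.0
print(s.sum())  # 0.0

# min_count requires at least that many valid values;
# otherwise the result is NaN instead of 0.0
print(s.sum(min_count=1))  # nan

# With enough valid values, min_count has no effect on the result
print(pd.Series([1.0, np.nan]).sum(min_count=1))  # 1.0
```

An analogous `min_count` argument in xarray's `sum` would recover the NaN-on-all-NaN behavior requested here without changing the default.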
