issue_comments


4 rows where author_association = "MEMBER" and issue = 331981984 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
398045641 https://github.com/pydata/xarray/issues/2230#issuecomment-398045641 https://api.github.com/repos/pydata/xarray/issues/2230 MDEyOklzc3VlQ29tbWVudDM5ODA0NTY0MQ== fujiisoup 6815844 2018-06-18T12:59:48Z 2018-06-18T12:59:48Z MEMBER

@rpnaut, thanks for looking inside the code. See #2236.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Inconsistency between Sum of NA's and Mean of NA's: resampling gives 0 or 'NA' 331981984
397092870 https://github.com/pydata/xarray/issues/2230#issuecomment-397092870 https://api.github.com/repos/pydata/xarray/issues/2230 MDEyOklzc3VlQ29tbWVudDM5NzA5Mjg3MA== shoyer 1217238 2018-06-13T21:27:33Z 2018-06-13T21:27:33Z MEMBER

OK, I see you already saw the pandas issues :).

> For earth science it would be nice to have an option telling xarray what to do in case of a sum over values being all NA. Do you see a chance to have a fast fix for that issue in the model code?

Yes, I would be very open to adding a min_count argument.

We could probably copy the implementation of sum with min_count largely from pandas: https://github.com/pandas-dev/pandas/blob/0c4e611927772af44b02204192b29282341a5716/pandas/core/nanops.py#L329

In xarray this would go into _create_nan_agg_method in https://github.com/pydata/xarray/blob/master/xarray/core/duck_array_ops.py (sorry, this has gotten a little messy!)
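To make the idea concrete, here is a minimal sketch of a NaN-skipping sum with a `min_count` threshold, written in plain NumPy. It is not the pandas or xarray implementation; the function name `nansum_min_count` and its signature are hypothetical and chosen only for illustration.

```python
import numpy as np


def nansum_min_count(values, axis=None, min_count=0):
    """NaN-skipping sum that returns NaN when fewer than `min_count`
    valid (non-NaN) values contribute to a result.

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=float)
    valid = ~np.isnan(values)
    # Replace NaN with 0 so it does not contribute to the sum.
    total = np.where(valid, values, 0.0).sum(axis=axis)
    # Count how many valid values went into each result.
    n_valid = valid.sum(axis=axis)
    # Mask results that did not see enough valid values.
    return np.where(n_valid >= min_count, total, np.nan)


data = np.array([[1.0, np.nan], [np.nan, np.nan]])
default_rows = nansum_min_count(data, axis=1)               # all-NaN row sums to 0
guarded_rows = nansum_min_count(data, axis=1, min_count=1)  # all-NaN row becomes NaN
```

With `min_count=0` the behaviour matches the current default (an all-NaN slice sums to 0), so an argument like this could presumably be added without changing existing results.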

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Inconsistency between Sum of NA's and Mean of NA's: resampling gives 0 or 'NA' 331981984
397090519 https://github.com/pydata/xarray/issues/2230#issuecomment-397090519 https://api.github.com/repos/pydata/xarray/issues/2230 MDEyOklzc3VlQ29tbWVudDM5NzA5MDUxOQ== shoyer 1217238 2018-06-13T21:19:55Z 2018-06-13T21:19:55Z MEMBER

The difference between mean and sum here isn't resample specific. Xarray consistently interprets a "NA skipping sum" as returning 0 in the case of all NaN inputs:

```
>>> float(xarray.DataArray([np.nan]).sum())
0.0
```

This is consistent with the sum of an empty set being 0, e.g.,

```
>>> float(xarray.DataArray([]).sum())
0.0
```

The reason why a "NA skipping mean" is different in the case of all NaN inputs is that the mean simply isn't well defined on an empty set. The mean would literally be a sum of zero divided by a count of zero, which is not a valid number: the literal meaning of NaN as "not a number".
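The arithmetic behind this can be spelled out in a few lines of NumPy. This is only an illustration of the reasoning above, not how xarray computes its reductions; the variable names are arbitrary.

```python
import numpy as np

all_nan = np.array([np.nan, np.nan])

# Skipping NaN leaves an empty set of values.
valid = all_nan[~np.isnan(all_nan)]

total = valid.sum()  # sum over an empty set
count = valid.size   # number of valid values

# The mean is total / count, i.e. 0.0 / 0, which is not a valid number.
with np.errstate(divide="ignore", invalid="ignore"):
    mean = np.divide(total, count)

# NumPy's own NaN-skipping sum follows the same convention for the sum.
nan_skipping_total = np.nansum(all_nan)
```

Here `total` and `nan_skipping_total` are both zero, while `mean` is NaN because it is a division of zero by zero.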

There was a long discussion/debate about this recently in pandas. See https://github.com/pandas-dev/pandas/issues/18678 and links therein. There are certainly use-cases where it is nicer for the sum of all-NaN inputs to be NaN (exactly as you mention here), but ultimately pandas decided that the answer for this operation should be zero. The decisive considerations were simplicity and consistency with other tools (including NumPy and R).

What pandas added to solve this use-case is an optional min_count argument (see pandas.DataFrame.sum for an example). We could definitely copy this behavior in xarray if someone is interested in implementing it.
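For reference, a short example of the pandas behaviour described above, using `Series.sum` (the same `min_count` argument is available on `DataFrame.sum`). The sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

all_nan = pd.Series([np.nan, np.nan])
partly_nan = pd.Series([1.0, np.nan])

# Default: NaN values are skipped and the empty sum is 0.
default_total = all_nan.sum()

# With min_count=1, at least one valid value is required,
# otherwise the result is NaN.
guarded_total = all_nan.sum(min_count=1)

# A series with at least one valid value is unaffected by min_count=1.
partly_total = partly_nan.sum(min_count=1)
```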

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Inconsistency between Sum of NA's and Mean of NA's: resampling gives 0 or 'NA' 331981984
396928537 https://github.com/pydata/xarray/issues/2230#issuecomment-396928537 https://api.github.com/repos/pydata/xarray/issues/2230 MDEyOklzc3VlQ29tbWVudDM5NjkyODUzNw== fujiisoup 6815844 2018-06-13T13:00:45Z 2018-06-13T13:01:13Z MEMBER

Thank you for raising an issue. Could you try using .sum(skipna=False) for resampled data?

Similar to pandas.DataFrame.sum, our .sum (and other reduction methods) assume skipna=True unless explicitly specified.
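A small illustration of the difference on a plain `DataArray` (the values are made up; the same `skipna` keyword is what the suggestion above passes to the resampled sum):

```python
import numpy as np
import xarray

da = xarray.DataArray([1.0, np.nan])

# Default (skipna=True for float data): NaN is ignored.
skipped = float(da.sum())

# skipna=False: any NaN in the input propagates to the result.
propagated = float(da.sum(skipna=False))
```

Note that `skipna=False` makes the result NaN whenever any input is NaN, not only when all inputs are NaN.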

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Inconsistency between Sum of NA's and Mean of NA's: resampling gives 0 or 'NA' 331981984


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette