issue_comments

2 rows where issue = 462424005 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
508295570 https://github.com/pydata/xarray/issues/3066#issuecomment-508295570 https://api.github.com/repos/pydata/xarray/issues/3066 MDEyOklzc3VlQ29tbWVudDUwODI5NTU3MA== mrezak 4903456 2019-07-04T00:23:45Z 2019-07-04T00:23:45Z NONE

@shoyer thanks for looking into this.

I also figured out later that I can just use np.nanmean (or np.nanmedian), but that function turns out to be much slower than the np.mean (or np.median) version. As NaNs only occur at the beginning and end of the sequence, is there an efficient way to use nanmean only for those segments and mean for the rest of the processing? My own thought is to check for NaN in the custom function and apply mean or nanmean depending on the result of that check, but I'm not sure whether this can be done more efficiently.
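One way the idea in this question could be sketched, assuming the custom function receives an array with at least two dimensions whose window axis is given by `axis` (as in the xarray output shown in this thread): run `np.mean` over every window first, then recompute only the windows whose result is NaN with `np.nanmean`. The name `mean_with_nan_fallback` is hypothetical, and this has not been benchmarked against plain `np.nanmean`.

```python
import numpy as np

def mean_with_nan_fallback(x, axis=-1):
    """np.mean for every window, np.nanmean only for windows containing NaN.

    Assumes x has at least two dimensions (hypothetical helper, not xarray API).
    """
    x = np.moveaxis(np.asarray(x, dtype=float), axis, -1)
    out = np.mean(x, axis=-1)            # NaN wherever a window contains any NaN
    needs_fix = np.isnan(out)
    if needs_fix.any():
        subset = x[needs_fix]            # affected windows only, shape (k, window)
        all_nan = np.isnan(subset).all(axis=-1)
        fixed = np.full(subset.shape[0], np.nan)
        # Skip windows that are entirely NaN to avoid the empty-slice warning
        fixed[~all_nan] = np.nanmean(subset[~all_nan], axis=-1)
        out[needs_fix] = fixed
    return out

# Three windows: NaN-padded, fully valid, and entirely NaN
windows = np.array([
    [np.nan, np.nan, 1.0, 3.0],
    [1.0, 2.0, 3.0, 4.0],
    [np.nan, np.nan, np.nan, np.nan],
])
res = mean_with_nan_fallback(windows, axis=-1)
print(res)
```

Because the NaN padding only touches windows near the edges of the sequence, the `np.nanmean` call here sees a small fraction of the data, while the bulk goes through `np.mean`.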

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray rolling does not match pandas when using min_periods and reduce 462424005
508158344 https://github.com/pydata/xarray/issues/3066#issuecomment-508158344 https://api.github.com/repos/pydata/xarray/issues/3066 MDEyOklzc3VlQ29tbWVudDUwODE1ODM0NA== shoyer 1217238 2019-07-03T16:10:39Z 2019-07-03T16:11:17Z MEMBER

@mrezak Thanks for the report and the clear example!

Certainly this is an annoying inconsistency. I'm trying to figure out whether this is also a bug or not.

I think the difference comes down to how pandas and xarray pass data into the custom function. Pandas passes individual slices, trimming out values outside the window. Xarray passes an N+1 dimensional view of the array data with extra dimension added for the "window offset", with values outside the window filled with NaN:

```python
import numpy as np
import pandas as pd
import xarray

def custom(x, axis=0):
    print(x)
    return np.mean(x, axis)

print('pandas example')
d = pd.DataFrame(np.random.rand(11,3))
r = d.rolling(10, min_periods=5).apply(custom)
print(r.iloc[0:10,:])

print('\nxarray example')
xd = d.to_xarray().to_array()
r = xd.rolling(index=10, min_periods=5).reduce(custom)
print(r[:,0:10])
```

Output:

```
pandas example
[0.06130714 0.86751339 0.06688379 0.45866121 0.88848511]
[0.06130714 0.86751339 0.06688379 0.45866121 0.88848511 0.22369799]
[0.06130714 0.86751339 0.06688379 0.45866121 0.88848511 0.22369799 0.23970828]
[0.06130714 0.86751339 0.06688379 0.45866121 0.88848511 0.22369799 0.23970828 0.94317625]
[0.06130714 0.86751339 0.06688379 0.45866121 0.88848511 0.22369799 0.23970828 0.94317625 0.22736209]
[0.06130714 0.86751339 0.06688379 0.45866121 0.88848511 0.22369799 0.23970828 0.94317625 0.22736209 0.08384912]
[0.86751339 0.06688379 0.45866121 0.88848511 0.22369799 0.23970828 0.94317625 0.22736209 0.08384912 0.23068875]
[0.87929068 0.81303738 0.62778023 0.34381748 0.55361603]
[0.87929068 0.81303738 0.62778023 0.34381748 0.55361603 0.39705802]
[0.87929068 0.81303738 0.62778023 0.34381748 0.55361603 0.39705802 0.2023665 ]
[0.87929068 0.81303738 0.62778023 0.34381748 0.55361603 0.39705802 0.2023665 0.20541754]
[0.87929068 0.81303738 0.62778023 0.34381748 0.55361603 0.39705802 0.2023665 0.20541754 0.37710566]
[0.87929068 0.81303738 0.62778023 0.34381748 0.55361603 0.39705802 0.2023665 0.20541754 0.37710566 0.18844817]
[0.81303738 0.62778023 0.34381748 0.55361603 0.39705802 0.2023665 0.20541754 0.37710566 0.18844817 0.51895952]
[0.33501081 0.67972562 0.08622488 0.89673242 0.94532091]
[0.33501081 0.67972562 0.08622488 0.89673242 0.94532091 0.84144888]
[0.33501081 0.67972562 0.08622488 0.89673242 0.94532091 0.84144888 0.43766841]
[0.33501081 0.67972562 0.08622488 0.89673242 0.94532091 0.84144888 0.43766841 0.88536995]
[0.33501081 0.67972562 0.08622488 0.89673242 0.94532091 0.84144888 0.43766841 0.88536995 0.7662462 ]
[0.33501081 0.67972562 0.08622488 0.89673242 0.94532091 0.84144888 0.43766841 0.88536995 0.7662462 0.4677236 ]
[0.67972562 0.08622488 0.89673242 0.94532091 0.84144888 0.43766841 0.88536995 0.7662462 0.4677236 0.7083373 ]
          0         1         2
0       NaN       NaN       NaN
1       NaN       NaN       NaN
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4  0.468570  0.643508  0.588603
5  0.427758  0.602433  0.630744
6  0.400894  0.545281  0.603162
7  0.468679  0.502798  0.638438
8  0.441866  0.488832  0.652639
9  0.406064  0.458794  0.634147

xarray example
[[[ nan nan nan nan nan nan nan nan nan 0.06130714]
  [ nan nan nan nan nan nan nan nan 0.06130714 0.86751339]
  [ nan nan nan nan nan nan nan 0.06130714 0.86751339 0.06688379]
  [ nan nan nan nan nan nan 0.06130714 0.86751339 0.06688379 0.45866121]
  [ nan nan nan nan nan 0.06130714 0.86751339 0.06688379 0.45866121 0.88848511]
  [ nan nan nan nan 0.06130714 0.86751339 0.06688379 0.45866121 0.88848511 0.22369799]
  [ nan nan nan 0.06130714 0.86751339 0.06688379 0.45866121 0.88848511 0.22369799 0.23970828]
  [ nan nan 0.06130714 0.86751339 0.06688379 0.45866121 0.88848511 0.22369799 0.23970828 0.94317625]
  [ nan 0.06130714 0.86751339 0.06688379 0.45866121 0.88848511 0.22369799 0.23970828 0.94317625 0.22736209]
  [0.06130714 0.86751339 0.06688379 0.45866121 0.88848511 0.22369799 0.23970828 0.94317625 0.22736209 0.08384912]
  [0.86751339 0.06688379 0.45866121 0.88848511 0.22369799 0.23970828 0.94317625 0.22736209 0.08384912 0.23068875]]

 [[ nan nan nan nan nan nan nan nan nan 0.87929068]
  [ nan nan nan nan nan nan nan nan 0.87929068 0.81303738]
  [ nan nan nan nan nan nan nan 0.87929068 0.81303738 0.62778023]
  [ nan nan nan nan nan nan 0.87929068 0.81303738 0.62778023 0.34381748]
  [ nan nan nan nan nan 0.87929068 0.81303738 0.62778023 0.34381748 0.55361603]
  [ nan nan nan nan 0.87929068 0.81303738 0.62778023 0.34381748 0.55361603 0.39705802]
  [ nan nan nan 0.87929068 0.81303738 0.62778023 0.34381748 0.55361603 0.39705802 0.2023665 ]
  [ nan nan 0.87929068 0.81303738 0.62778023 0.34381748 0.55361603 0.39705802 0.2023665 0.20541754]
  [ nan 0.87929068 0.81303738 0.62778023 0.34381748 0.55361603 0.39705802 0.2023665 0.20541754 0.37710566]
  [0.87929068 0.81303738 0.62778023 0.34381748 0.55361603 0.39705802 0.2023665 0.20541754 0.37710566 0.18844817]
  [0.81303738 0.62778023 0.34381748 0.55361603 0.39705802 0.2023665 0.20541754 0.37710566 0.18844817 0.51895952]]

 [[ nan nan nan nan nan nan nan nan nan 0.33501081]
  [ nan nan nan nan nan nan nan nan 0.33501081 0.67972562]
  [ nan nan nan nan nan nan nan 0.33501081 0.67972562 0.08622488]
  [ nan nan nan nan nan nan 0.33501081 0.67972562 0.08622488 0.89673242]
  [ nan nan nan nan nan 0.33501081 0.67972562 0.08622488 0.89673242 0.94532091]
  [ nan nan nan nan 0.33501081 0.67972562 0.08622488 0.89673242 0.94532091 0.84144888]
  [ nan nan nan 0.33501081 0.67972562 0.08622488 0.89673242 0.94532091 0.84144888 0.43766841]
  [ nan nan 0.33501081 0.67972562 0.08622488 0.89673242 0.94532091 0.84144888 0.43766841 0.88536995]
  [ nan 0.33501081 0.67972562 0.08622488 0.89673242 0.94532091 0.84144888 0.43766841 0.88536995 0.7662462 ]
  [0.33501081 0.67972562 0.08622488 0.89673242 0.94532091 0.84144888 0.43766841 0.88536995 0.7662462 0.4677236 ]
  [0.67972562 0.08622488 0.89673242 0.94532091 0.84144888 0.43766841 0.88536995 0.7662462 0.4677236 0.7083373 ]]]
<xarray.DataArray (variable: 3, index: 10)>
array([[     nan,      nan,      nan,      nan,      nan,      nan,      nan,      nan,      nan, 0.406064],
       [     nan,      nan,      nan,      nan,      nan,      nan,      nan,      nan,      nan, 0.458794],
       [     nan,      nan,      nan,      nan,      nan,      nan,      nan,      nan,      nan, 0.634147]])
Coordinates:
  * index     (index) int64 0 1 2 3 4 5 6 7 8 9
  * variable  (variable) int64 0 1 2
```

Xarray's version is certainly going to be way faster, but it has the downside of treating windows differently. One way to work around this would be to use np.nanmean inside custom instead of np.mean.
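A minimal sketch of that workaround, using plain NumPy to mimic the NaN-padded windows from the output above instead of calling xarray itself. The `padded_windows` helper and the `min_periods`-style masking are illustrative assumptions, not xarray internals.

```python
import numpy as np

def padded_windows(values, window):
    """Build NaN-padded trailing windows, one row per position (illustrative helper)."""
    padded = np.concatenate([np.full(window - 1, np.nan), values])
    return np.stack([padded[i:i + window] for i in range(len(values))])

def custom(x, axis=0):
    # nanmean skips the NaN padding, so partial windows still yield a value
    return np.nanmean(x, axis=axis)

values = np.arange(1.0, 7.0)           # 1.0, 2.0, ..., 6.0
windows = padded_windows(values, window=4)

plain = np.mean(windows, axis=-1)      # NaN wherever a window touches the padding
skipna = custom(windows, axis=-1)      # mean over the valid entries only

# Emulate min_periods=2: keep a result only where at least 2 values are valid
valid = np.sum(~np.isnan(windows), axis=-1)
result = np.where(valid >= 2, skipna, np.nan)

print(plain)
print(result)
```

With `np.mean`, the first three positions are NaN because each of those windows contains padding. With `np.nanmean` plus the count check, only the first position (a single valid value) stays NaN, which is the pandas-like behaviour.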

cc @jhamman @fujiisoup who worked on this and may have ideas

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray rolling does not match pandas when using min_periods and reduce 462424005

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);