
issue_comments


3 rows where issue = 218459353 and user = 691772 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
464338041 https://github.com/pydata/xarray/issues/1346#issuecomment-464338041 https://api.github.com/repos/pydata/xarray/issues/1346 MDEyOklzc3VlQ29tbWVudDQ2NDMzODA0MQ== lumbric 691772 2019-02-16T11:20:20Z 2019-02-16T11:20:20Z CONTRIBUTOR

Oh yes, of course! I've underestimated the low precision of float32 values above 2**24. Thanks for the hint.
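
The precision limit mentioned above can be checked directly. The following is a minimal sketch (the variable names are illustrative, not taken from the thread) showing that float32 can no longer represent every integer once a value reaches 2**24, so adding 1 has no effect:

```python
import numpy as np

limit = np.float32(2**24)   # 16777216.0, the last point where float32 spacing is 1
one = np.float32(1)

# 2**24 + 1 is not representable in float32; the result rounds back to 2**24.
stalled = (limit + one == limit)

# np.spacing gives the gap to the next representable value.
gap_below = np.spacing(np.float32(2**24 - 1))
gap_at_limit = np.spacing(limit)

print(stalled)        # True
print(gap_below)      # 1.0
print(gap_at_limit)   # 2.0
```

A running sum of ones kept in a single float32 accumulator therefore stops growing at 2**24.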

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  bottleneck : Wrong mean for float32 array 218459353
463324373 https://github.com/pydata/xarray/issues/1346#issuecomment-463324373 https://api.github.com/repos/pydata/xarray/issues/1346 MDEyOklzc3VlQ29tbWVudDQ2MzMyNDM3Mw== lumbric 691772 2019-02-13T19:02:52Z 2019-02-16T10:53:51Z CONTRIBUTOR

I think (!) xarray is not affected any longer, but pandas is. Bisecting the Git history leads to commit 0b9ab2d1, which means that xarray >= v0.10.9 should not be affected. Uninstalling bottleneck is also a valid workaround.

<s>Bottleneck's documentation explicitly mentions that no error is raised in case of an overflow. But it seems to be very evil behavior, so it might be worth reporting upstream.</s> What do you think? (I think kwgoodman/bottleneck#164 is something different, isn't it?) Edit: this is not an overflow. It's a numerical error caused by not applying pairwise summation.

A couple of minimal examples:

```python
>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> import bottleneck as bn

>>> bn.nanmean(np.ones(2**25, dtype=np.float32))
0.5

>>> pd.Series(np.ones(2**25, dtype=np.float32)).mean()
0.5

>>> xr.DataArray(np.ones(2**25, dtype=np.float32)).mean()  # not affected for this version
<xarray.DataArray ()>
array(1., dtype=float32)
```

Done with the following versions:

```bash
$ pip3 freeze
Bottleneck==1.2.1
numpy==1.16.1
pandas==0.24.1
xarray==0.11.3
...
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  bottleneck : Wrong mean for float32 array 218459353
464016154 https://github.com/pydata/xarray/issues/1346#issuecomment-464016154 https://api.github.com/repos/pydata/xarray/issues/1346 MDEyOklzc3VlQ29tbWVudDQ2NDAxNjE1NA== lumbric 691772 2019-02-15T11:41:36Z 2019-02-15T11:41:36Z CONTRIBUTOR

Oh hm, I think I didn't really understand what happens in bottleneck.nanmean(). I understand that integers can overflow and that float32 values have varying absolute precision. The maximum float32 value, 3.4E+38, is not hit here. So how can the mean of a list of ones be 0.5?

Isn't this what bottleneck is doing? Summing up a bunch of float32 values and then dividing by the length?

```
>>> d = np.ones(2**25, dtype=np.float32)
>>> d.sum()/np.float32(len(d))
1.0
```
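
The difference between the two results probably comes down to the order of the additions. NumPy's `sum` is documented to use (partial) pairwise summation, while a plain loop with one accumulator adds the elements one at a time. The sketch below is only an illustration, not bottleneck's or NumPy's actual code. The helpers `naive_sum` and `pairwise_sum` are hypothetical, and float16 is used as a scaled-down stand-in so the pure-Python loops finish quickly: float16 stalls at 2**11 the same way float32 stalls at 2**24.

```python
import numpy as np

def naive_sum(values, dtype):
    """Add the elements one by one into a single accumulator of `dtype`."""
    acc = dtype(0)
    for v in values:
        acc = dtype(acc + dtype(v))
    return acc

def pairwise_sum(values, dtype):
    """Textbook pairwise summation: sum each half recursively, then add the halves."""
    n = len(values)
    if n == 0:
        return dtype(0)
    if n == 1:
        return dtype(values[0])
    mid = n // 2
    return dtype(pairwise_sum(values[:mid], dtype) + pairwise_sum(values[mid:], dtype))

# Scaled-down analogue of np.ones(2**25, dtype=np.float32):
# float16 cannot represent 2**11 + 1, so the naive accumulator stalls at 2048.
ones = np.ones(2**12, dtype=np.float16)

naive_mean = naive_sum(ones, np.float16) / np.float16(len(ones))
pairwise_mean = pairwise_sum(ones, np.float16) / np.float16(len(ones))

print(naive_mean)     # 0.5
print(pairwise_mean)  # 1.0
```

With the naive loop the accumulator reaches 2048 and every further `+ 1` rounds back to 2048, so the "mean" is 2048 / 4096 = 0.5. In the pairwise version the two operands of each addition have similar magnitude, so nothing is lost.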

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  bottleneck : Wrong mean for float32 array 218459353


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette