issue_comments
19 rows where issue = 218459353 sorted by updated_at descending
issue 1
- bottleneck : Wrong mean for float32 array · 19 ✖
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1119787557 | https://github.com/pydata/xarray/issues/1346#issuecomment-1119787557 | https://api.github.com/repos/pydata/xarray/issues/1346 | IC_kwDOAMm_X85Cvpol | dcherian 2448579 | 2022-05-06T16:22:32Z | 2022-05-06T16:22:32Z | MEMBER | On second thought we should add this to a FAQ page. |
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
bottleneck : Wrong mean for float32 array 218459353 | |
1119786892 | https://github.com/pydata/xarray/issues/1346#issuecomment-1119786892 | https://api.github.com/repos/pydata/xarray/issues/1346 | IC_kwDOAMm_X85CvpeM | dcherian 2448579 | 2022-05-06T16:21:42Z | 2022-05-06T16:21:42Z | MEMBER | Yes that sounds right. Thanks! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
bottleneck : Wrong mean for float32 array 218459353 | |
1119770101 | https://github.com/pydata/xarray/issues/1346#issuecomment-1119770101 | https://api.github.com/repos/pydata/xarray/issues/1346 | IC_kwDOAMm_X85CvlX1 | andersy005 13301940 | 2022-05-06T16:01:44Z | 2022-05-06T16:01:44Z | MEMBER |
|
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
bottleneck : Wrong mean for float32 array 218459353 | |
464338041 | https://github.com/pydata/xarray/issues/1346#issuecomment-464338041 | https://api.github.com/repos/pydata/xarray/issues/1346 | MDEyOklzc3VlQ29tbWVudDQ2NDMzODA0MQ== | lumbric 691772 | 2019-02-16T11:20:20Z | 2019-02-16T11:20:20Z | CONTRIBUTOR | Oh yes, of course! I've underestimated the low precision of float32 values above 2**24. Thanks for the hint. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
bottleneck : Wrong mean for float32 array 218459353 | |
463324373 | https://github.com/pydata/xarray/issues/1346#issuecomment-463324373 | https://api.github.com/repos/pydata/xarray/issues/1346 | MDEyOklzc3VlQ29tbWVudDQ2MzMyNDM3Mw== | lumbric 691772 | 2019-02-13T19:02:52Z | 2019-02-16T10:53:51Z | CONTRIBUTOR | I think (!) xarray is not affected any longer, but pandas is. Bisecting the Git history leads to commit 0b9ab2d1, which means that xarray >= v0.10.9 should not be affected. Uninstalling bottleneck is also a valid workaround. <s>Bottleneck's documentation explicitly mentions that no error is raised in case of an overflow. But it seems to be very evil behavior, so it might be worth reporting upstream.</s> What do you think? (I think kwgoodman/bottleneck#164 is something different, isn't it?) Edit: this is not an overflow. It's a numerical error caused by not applying pairwise summation. A couple of minimal examples: ```python
Done with the following versions:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
bottleneck : Wrong mean for float32 array 218459353 | |
464115604 | https://github.com/pydata/xarray/issues/1346#issuecomment-464115604 | https://api.github.com/repos/pydata/xarray/issues/1346 | MDEyOklzc3VlQ29tbWVudDQ2NDExNTYwNA== | shoyer 1217238 | 2019-02-15T16:39:08Z | 2019-02-15T16:39:08Z | MEMBER | The difference is that Bottleneck does the sum in the naive way, whereas NumPy uses the more numerically stable pairwise summation. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
bottleneck : Wrong mean for float32 array 218459353 | |
464016154 | https://github.com/pydata/xarray/issues/1346#issuecomment-464016154 | https://api.github.com/repos/pydata/xarray/issues/1346 | MDEyOklzc3VlQ29tbWVudDQ2NDAxNjE1NA== | lumbric 691772 | 2019-02-15T11:41:36Z | 2019-02-15T11:41:36Z | CONTRIBUTOR | Oh hm, I think I didn't really understand what happens in Isn't this what bottleneck is doing? Summing up a bunch of float32 values and then dividing by the length? ```
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
bottleneck : Wrong mean for float32 array 218459353 | |
464002579 | https://github.com/pydata/xarray/issues/1346#issuecomment-464002579 | https://api.github.com/repos/pydata/xarray/issues/1346 | MDEyOklzc3VlQ29tbWVudDQ2NDAwMjU3OQ== | aquasync 5469 | 2019-02-15T11:06:06Z | 2019-02-15T11:06:06Z | NONE | Ah ok, I suppose bottleneck is indeed now avoided for float32 arrays in xarray. Yeah, that issue is for a different function, but the source of the problem and the solution proposed in the thread are the same: use higher-precision intermediates for float32 (double arithmetic), a small speed vs accuracy/precision trade-off. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
bottleneck : Wrong mean for float32 array 218459353 | |
458427512 | https://github.com/pydata/xarray/issues/1346#issuecomment-458427512 | https://api.github.com/repos/pydata/xarray/issues/1346 | MDEyOklzc3VlQ29tbWVudDQ1ODQyNzUxMg== | aquasync 5469 | 2019-01-29T06:52:01Z | 2019-01-29T06:52:01Z | NONE | Is it worth changing bottleneck to use double for single precision reductions? AFAICT this is a matter of changing |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
bottleneck : Wrong mean for float32 array 218459353 | |
456173428 | https://github.com/pydata/xarray/issues/1346#issuecomment-456173428 | https://api.github.com/repos/pydata/xarray/issues/1346 | MDEyOklzc3VlQ29tbWVudDQ1NjE3MzQyOA== | shoyer 1217238 | 2019-01-21T19:09:43Z | 2019-01-21T19:09:43Z | MEMBER |
I would rather pick option (1) above, that is, "Stop using bottleneck on float32 arrays" |
{ "total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
bottleneck : Wrong mean for float32 array 218459353 | |
456149964 | https://github.com/pydata/xarray/issues/1346#issuecomment-456149964 | https://api.github.com/repos/pydata/xarray/issues/1346 | MDEyOklzc3VlQ29tbWVudDQ1NjE0OTk2NA== | leifdenby 2405019 | 2019-01-21T17:33:31Z | 2019-01-21T17:33:31Z | CONTRIBUTOR | Sorry to unearth this issue again, but I just got bitten by this quite badly. I'm looking at absolute temperature perturbations and bottleneck's implementation together with my data being loaded as Example: ``` In [1]: import numpy as np ...: import bottleneck In [2]: a = 300*np.ones((800**2,), dtype=np.float32) In [3]: np.mean(a) Out[3]: 300.0 In [4]: bottleneck.nanmean(a) Out[4]: 302.6018981933594 ``` Would it be worth adding a warning (until the right solution is found) if someone is doing Based on a little experimentation (https://gist.github.com/leifdenby/8e874d3440a1ac96f96465a418f158ab) bottleneck's mean function builds up significant errors even with moderately sized arrays if they are |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
bottleneck : Wrong mean for float32 array 218459353 | |
290851733 | https://github.com/pydata/xarray/issues/1346#issuecomment-290851733 | https://api.github.com/repos/pydata/xarray/issues/1346 | MDEyOklzc3VlQ29tbWVudDI5MDg1MTczMw== | shoyer 1217238 | 2017-03-31T22:55:18Z | 2017-03-31T22:55:18Z | MEMBER | @matteodefelice you didn't decide on float32, but your data is stored that way. It's really hard to make choices about numerical precision for computations automatically: if we converted automatically to float64, somebody else would be complaining about unexpected memory usage :). Looking at our options, we could:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
bottleneck : Wrong mean for float32 array 218459353 | |
290822179 | https://github.com/pydata/xarray/issues/1346#issuecomment-290822179 | https://api.github.com/repos/pydata/xarray/issues/1346 | MDEyOklzc3VlQ29tbWVudDI5MDgyMjE3OQ== | matteodefelice 6360066 | 2017-03-31T20:31:56Z | 2017-03-31T20:31:56Z | NONE | Thanks, everyone, for the replies.
@Aegaeon I get the same results as you with bottleneck...
@shoyer The point is that I didn't choose to use float32 and, yes, using |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
bottleneck : Wrong mean for float32 array 218459353 | |
290760342 | https://github.com/pydata/xarray/issues/1346#issuecomment-290760342 | https://api.github.com/repos/pydata/xarray/issues/1346 | MDEyOklzc3VlQ29tbWVudDI5MDc2MDM0Mg== | shoyer 1217238 | 2017-03-31T16:24:04Z | 2017-03-31T16:24:04Z | MEMBER | Yes, this is probably related to the fact that The fact that the dtype is float32 is a sign that this is probably a numerical precision issue. Try casting with If you really cared about performance using float32, the other thing to do to improve conditioning is to subtract and add a number close to the mean, e.g., |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
bottleneck : Wrong mean for float32 array 218459353 | |
290755867 | https://github.com/pydata/xarray/issues/1346#issuecomment-290755867 | https://api.github.com/repos/pydata/xarray/issues/1346 | MDEyOklzc3VlQ29tbWVudDI5MDc1NTg2Nw== | andrew-c-ross 5852283 | 2017-03-31T16:07:56Z | 2017-03-31T16:07:56Z | CONTRIBUTOR | I think this might be a problem with bottleneck? My interpretation of _create_nan_agg_method in xarray/core/ops.py is that it may use bottleneck to get the mean unless you pass skipna=False or specify multiple axes. And, ```python In [2]: import bottleneck In [3]: bottleneck.__version__ Out[3]: '1.2.0' In [6]: bottleneck.nanmean(ds.var167.data) Out[6]: 261.6441345214844 ``` Forgive me if I'm wrong, I'm still a bit new. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
bottleneck : Wrong mean for float32 array 218459353 | |
290754443 | https://github.com/pydata/xarray/issues/1346#issuecomment-290754443 | https://api.github.com/repos/pydata/xarray/issues/1346 | MDEyOklzc3VlQ29tbWVudDI5MDc1NDQ0Mw== | fmaussion 10050469 | 2017-03-31T16:02:53Z | 2017-03-31T16:02:53Z | MEMBER | Does it make a difference if you load the data first? ( |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
bottleneck : Wrong mean for float32 array 218459353 | |
290747253 | https://github.com/pydata/xarray/issues/1346#issuecomment-290747253 | https://api.github.com/repos/pydata/xarray/issues/1346 | MDEyOklzc3VlQ29tbWVudDI5MDc0NzI1Mw== | andrew-c-ross 5852283 | 2017-03-31T15:38:12Z | 2017-03-31T15:53:07Z | CONTRIBUTOR | Also on macOS, and I can reproduce. Using python 2.7.11, xarray 0.9.1, dask 0.14.1 installed through Anaconda. I get the same results with xarray 0.9.1-38-gc0178b7 from GitHub. ```python In [3]: ds = xarray.open_dataset('ERAIN-t2m-1983-2012.seasmean.nc') In [4]: ds.var167.mean() Out[4]: <xarray.DataArray 'var167' ()> array(261.6441345214844, dtype=float32) ``` Curiously, I get the right results with skipna=False...
... or by specifying coordinates to average over:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
bottleneck : Wrong mean for float32 array 218459353 | |
290692479 | https://github.com/pydata/xarray/issues/1346#issuecomment-290692479 | https://api.github.com/repos/pydata/xarray/issues/1346 | MDEyOklzc3VlQ29tbWVudDI5MDY5MjQ3OQ== | matteodefelice 6360066 | 2017-03-31T11:53:12Z | 2017-03-31T11:53:12Z | NONE | Ok, I am on MacOS: - Python 2.7.13 from Macports - Dask 0.14.1 from Macports - xarray from GitHub |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
bottleneck : Wrong mean for float32 array 218459353 | |
290691941 | https://github.com/pydata/xarray/issues/1346#issuecomment-290691941 | https://api.github.com/repos/pydata/xarray/issues/1346 | MDEyOklzc3VlQ29tbWVudDI5MDY5MTk0MQ== | fmaussion 10050469 | 2017-03-31T11:50:05Z | 2017-03-31T11:50:05Z | MEMBER | I can't reproduce this: ```python In [6]: ds = xr.open_dataset('./Downloads/ERAIN-t2m-1983-2012.seasmean.nc') In [7]: ds.var167.mean() Out[7]: <xarray.DataArray 'var167' ()> array(278.6246643066406, dtype=float32) In [8]: ds.var167.data.mean() Out[8]: 278.62466 ``` which version of xarray, dask, python are you using? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
bottleneck : Wrong mean for float32 array 218459353 |
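
The diagnosis reached in the thread (a naive running sum kept in float32 versus NumPy's pairwise summation) can be reproduced with NumPy alone. The following is a minimal sketch, not Bottleneck's actual code: `naive_mean_float32` is a hypothetical helper written for illustration, and using `a[0]` as the offset is an assumption about what "a number close to the mean" could look like in practice.

```python
import numpy as np


def naive_mean_float32(values):
    """Hypothetical helper: left-to-right sum with a float32 running total."""
    total = np.float32(0.0)
    for v in values:
        total = total + v  # float32 + float32 stays float32, so rounding accumulates
    return float(total / np.float32(len(values)))


# Same data as the example in the thread: 640,000 float32 values of 300.0.
a = 300 * np.ones((800**2,), dtype=np.float32)

# Naive float32 accumulation drifts once the running total is large enough
# that adding 300 can no longer be represented exactly.
naive = naive_mean_float32(a)

# NumPy's default reduction on a contiguous float32 array (pairwise summation).
pairwise = float(np.mean(a))

# Workaround 1: accumulate in float64 (costs memory bandwidth, not a copy).
wide = float(np.mean(a, dtype=np.float64))

# Workaround 2: subtract a value close to the mean, average, then add it back.
# The choice of a[0] as the offset is an assumption for this illustration.
offset = a[0]
shifted = naive_mean_float32(a - offset) + float(offset)

print(f"naive float32 : {naive:.4f}")
print(f"numpy float32 : {pairwise:.4f}")
print(f"float64 accum : {wide:.4f}")
print(f"offset trick  : {shifted:.4f}")
```

The naive result lands near the 302.6 reported in the thread, while the other three stay at 300. The offset variant keeps everything in float32, so it trades a temporary array for the extra precision.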
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id]) ); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
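
The view at the top of this page (rows where issue = 218459353, sorted by updated_at descending) can be sketched against this schema with Python's built-in `sqlite3` module. This is an illustrative sketch only: it uses an in-memory database, loads four rows whose ids and timestamps are copied from the table above, and leaves the other columns NULL. The exact query the site runs is not shown in the export, so the `SELECT` below is an assumption.

```python
import sqlite3

SCHEMA = """
CREATE TABLE [issue_comments] (
    [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY,
    [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]),
    [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT,
    [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT,
    [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# (id, user, created_at, updated_at, issue), copied from the rows above.
rows = [
    (463324373, 691772, "2019-02-13T19:02:52Z", "2019-02-16T10:53:51Z", 218459353),
    (464115604, 1217238, "2019-02-15T16:39:08Z", "2019-02-15T16:39:08Z", 218459353),
    (464338041, 691772, "2019-02-16T11:20:20Z", "2019-02-16T11:20:20Z", 218459353),
    (1119787557, 2448579, "2022-05-06T16:22:32Z", "2022-05-06T16:22:32Z", 218459353),
]
conn.executemany(
    "INSERT INTO issue_comments ([id], [user], [created_at], [updated_at], [issue]) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# ISO 8601 timestamps in UTC sort correctly as plain text.
ordered_ids = [
    row[0]
    for row in conn.execute(
        "SELECT [id] FROM issue_comments WHERE [issue] = ? ORDER BY [updated_at] DESC",
        (218459353,),
    )
]
print(ordered_ids)
```

Sorting by `updated_at` rather than `created_at` is why comment 463324373, written on 2019-02-13 but edited on 2019-02-16, appears above comment 464115604 from 2019-02-15.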