html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1346#issuecomment-464338041,https://api.github.com/repos/pydata/xarray/issues/1346,464338041,MDEyOklzc3VlQ29tbWVudDQ2NDMzODA0MQ==,691772,2019-02-16T11:20:20Z,2019-02-16T11:20:20Z,CONTRIBUTOR,"Oh yes, of course! I've underestimated the low precision of float32 values above 2**24. Thanks for the hint.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-463324373,https://api.github.com/repos/pydata/xarray/issues/1346,463324373,MDEyOklzc3VlQ29tbWVudDQ2MzMyNDM3Mw==,691772,2019-02-13T19:02:52Z,2019-02-16T10:53:51Z,CONTRIBUTOR,"I think (!) xarray is not affected any longer, but pandas is. Bisecting the Git history leads to commit 0b9ab2d1, which means that xarray >= v0.10.9 should not be affected. Uninstalling bottleneck is also a valid workaround.
Bottleneck's documentation explicitly mentions that [no error is raised in case of an overflow](https://kwgoodman.github.io/bottleneck-doc/reference.html?highlight=overflow#bottleneck.nanmean). But it seems to be very evil behavior, so it might be worth reporting upstream. What do you think? (I think kwgoodman/bottleneck#164 is something different, isn't it?)
**Edit:** this is not an overflow. It's a numerical error caused by not applying [pairwise summation](https://en.wikipedia.org/wiki/Pairwise_summation).
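A rough sketch of what I think happens, using only NumPy: `naive_sum_f32` and `pairwise_sum_f32` are hypothetical helpers written for this comment, not bottleneck's code, and I'm assuming bottleneck accumulates sequentially in float32. Once the running total reaches 2**24, adding 1.0 no longer changes it, whereas summing in halves keeps the partial sums small enough to stay exact.

```python
import numpy as np


def naive_sum_f32(values, chunk=2**20):
    # Sequential float32 accumulation, processed in chunks to limit memory.
    # np.cumsum keeps a float32 running total, so seeding the first element
    # with the carry is equivalent to one long left-to-right loop.
    total = np.float32(0)
    for start in range(0, values.size, chunk):
        block = values[start:start + chunk].copy()
        block[0] += total
        total = np.cumsum(block, dtype=np.float32)[-1]
    return total


def pairwise_sum_f32(values, leaf=2**16):
    # Pairwise summation: split in halves, sum each half, add the results.
    if values.size <= leaf:
        return naive_sum_f32(values)
    mid = values.size // 2
    return pairwise_sum_f32(values[:mid], leaf) + pairwise_sum_f32(values[mid:], leaf)


d = np.ones(2**25, dtype=np.float32)
naive_mean = naive_sum_f32(d) / np.float32(d.size)
pairwise_mean = pairwise_sum_f32(d) / np.float32(d.size)
print(naive_mean)     # 0.5
print(pairwise_mean)  # 1.0
```

Both helpers stay in float32 throughout; only the order of the additions differs.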
A couple of minimal examples:
```python
>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> import bottleneck as bn
>>> bn.nanmean(np.ones(2**25, dtype=np.float32))
0.5
>>> pd.Series(np.ones(2**25, dtype=np.float32)).mean()
0.5
>>> xr.DataArray(np.ones(2**25, dtype=np.float32)).mean() # not affected for this version
array(1., dtype=float32)
```
Done with the following versions:
```bash
$ pip3 freeze
Bottleneck==1.2.1
numpy==1.16.1
pandas==0.24.1
xarray==0.11.3
...
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-464016154,https://api.github.com/repos/pydata/xarray/issues/1346,464016154,MDEyOklzc3VlQ29tbWVudDQ2NDAxNjE1NA==,691772,2019-02-15T11:41:36Z,2019-02-15T11:41:36Z,CONTRIBUTOR,"Oh hm, I think I didn't really understand what happens in `bottleneck.nanmean()`. I understand that integers can overflow and that float32 values have varying absolute precision. The max float32 3.4E+38 is not hit here. So how can the mean of a list of ones be 0.5?
Isn't this what bottleneck is doing? Summing up a bunch of float32 values and then dividing by the length?
```
>>> d = np.ones(2**25, dtype=np.float32)
>>> d.sum()/np.float32(len(d))
1.0
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-456149964,https://api.github.com/repos/pydata/xarray/issues/1346,456149964,MDEyOklzc3VlQ29tbWVudDQ1NjE0OTk2NA==,2405019,2019-01-21T17:33:31Z,2019-01-21T17:33:31Z,CONTRIBUTOR,"Sorry to unearth this issue again, but I just got bitten by this quite badly. I'm looking at absolute temperature perturbations and bottleneck's implementation together with my data being loaded as `float32` (correctly, as it's stored like that) causes an error on the size of the perturbations I'm looking for.
Example:
```
In [1]: import numpy as np
...: import bottleneck
In [2]: a = 300*np.ones((800**2,), dtype=np.float32)
In [3]: np.mean(a)
Out[3]: 300.0
In [4]: bottleneck.nanmean(a)
Out[4]: 302.6018981933594
```
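For comparison, a small NumPy-only sketch on the same array. Here `np.cumsum` is my assumed stand-in for a plain sequential float32 loop (it is not taken from bottleneck's source), and the second line accumulates in float64 instead:

```python
import numpy as np

a = 300 * np.ones((800**2,), dtype=np.float32)

# Sequential float32 accumulation: np.cumsum keeps a float32 running total,
# so rounding errors pile up once the total outgrows float32's integer range.
naive_mean = np.cumsum(a, dtype=np.float32)[-1] / np.float32(a.size)

# Same data, but accumulated in float64.
safe_mean = a.mean(dtype=np.float64)

print(naive_mean)  # noticeably above 300
print(safe_mean)   # 300.0
```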
Would it be worth adding a warning (until the right solution is found) if someone is doing `.mean()` on a `DataArray` which is `float32`?
Based on a little experimentation (https://gist.github.com/leifdenby/8e874d3440a1ac96f96465a418f158ab) bottleneck's mean function builds up significant errors even with moderately sized arrays if they are `float32`, so I'm going to stop using `.mean()` as-is from now and always pass in `dtype=np.float64`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-290755867,https://api.github.com/repos/pydata/xarray/issues/1346,290755867,MDEyOklzc3VlQ29tbWVudDI5MDc1NTg2Nw==,5852283,2017-03-31T16:07:56Z,2017-03-31T16:07:56Z,CONTRIBUTOR,"I think this might be a problem with bottleneck? My interpretation of _create_nan_agg_method in xarray/core/ops.py is that it may use bottleneck to get the mean unless you pass skipna=False or specify multiple axes. And,
```python
In [2]: import bottleneck
In [3]: bottleneck.__version__
Out[3]: '1.2.0'
In [6]: bottleneck.nanmean(ds.var167.data)
Out[6]: 261.6441345214844
```
Forgive me if I'm wrong, I'm still a bit new.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-290747253,https://api.github.com/repos/pydata/xarray/issues/1346,290747253,MDEyOklzc3VlQ29tbWVudDI5MDc0NzI1Mw==,5852283,2017-03-31T15:38:12Z,2017-03-31T15:53:07Z,CONTRIBUTOR,"Also on macOS, and I can reproduce.
Using python 2.7.11, xarray 0.9.1, dask 0.14.1 installed through Anaconda. I get the same results with xarray 0.9.1-38-gc0178b7 from GitHub.
```python
In [3]: ds = xarray.open_dataset('ERAIN-t2m-1983-2012.seasmean.nc')
In [4]: ds.var167.mean()
Out[4]:
array(261.6441345214844, dtype=float32)
```
Curiously, I get the right results with skipna=False...
```python
In [10]: ds.var167.mean(skipna=False)
Out[10]:
array(278.6246643066406, dtype=float32)
```
... or by specifying coordinates to average over:
```python
In [5]: ds.var167.mean(('time', 'lat', 'lon'))
Out[5]:
array(278.6246643066406, dtype=float32)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353