html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1346#issuecomment-1119787557,https://api.github.com/repos/pydata/xarray/issues/1346,1119787557,IC_kwDOAMm_X85Cvpol,2448579,2022-05-06T16:22:32Z,2022-05-06T16:22:32Z,MEMBER,On second thought we should add this to a FAQ page.,"{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-1119786892,https://api.github.com/repos/pydata/xarray/issues/1346,1119786892,IC_kwDOAMm_X85CvpeM,2448579,2022-05-06T16:21:42Z,2022-05-06T16:21:42Z,MEMBER,Yes that sounds right. Thanks!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-1119770101,https://api.github.com/repos/pydata/xarray/issues/1346,1119770101,IC_kwDOAMm_X85CvlX1,13301940,2022-05-06T16:01:44Z,2022-05-06T16:01:44Z,MEMBER,"- https://github.com/pydata/xarray/pull/5560 introduced a ""use_bottleneck"" option to enable or disable bottleneck. Can we close this issue, or should we keep it open?","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-464338041,https://api.github.com/repos/pydata/xarray/issues/1346,464338041,MDEyOklzc3VlQ29tbWVudDQ2NDMzODA0MQ==,691772,2019-02-16T11:20:20Z,2019-02-16T11:20:20Z,CONTRIBUTOR,"Oh yes, of course! I've underestimated the low precision of float32 values above 2**24. Thanks for the hint.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-463324373,https://api.github.com/repos/pydata/xarray/issues/1346,463324373,MDEyOklzc3VlQ29tbWVudDQ2MzMyNDM3Mw==,691772,2019-02-13T19:02:52Z,2019-02-16T10:53:51Z,CONTRIBUTOR,"I think (!) xarray is not affected any longer, but pandas is. Bisecting the Git history leads to commit 0b9ab2d1, which means that xarray >= v0.10.9 should not be affected. Uninstalling bottleneck is also a valid workaround.
Bottleneck's documentation explicitly mentions that [no error is raised in case of an overflow](https://kwgoodman.github.io/bottleneck-doc/reference.html?highlight=overflow#bottleneck.nanmean). But it seems to be very evil behavior, so it might be worth reporting upstream. What do you think? (I think kwgoodman/bottleneck#164 is something different, isn't it?)
**Edit:** this is not an overflow. It's a numerical error caused by not applying [pairwise summation](https://en.wikipedia.org/wiki/Pairwise_summation).
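A rough sketch of why a running float32 sum of ones stalls, using only NumPy. `np.cumsum` is used here as a stand-in for a naive sequential sum; that bottleneck accumulates this way is an assumption on my part, not something I checked in its source.

```python
import numpy as np

# Above 2**24, float32 can no longer represent every integer,
# so adding 1 to 2**24 rounds straight back to 2**24.
big = np.float32(2**24)
lost = bool(big + np.float32(1) == big)
print(lost)  # True

ones = np.ones(2**25, dtype=np.float32)

# np.cumsum adds element by element in float32 (stand-in for a naive sum).
naive_mean = float(np.cumsum(ones)[-1]) / ones.size

# ndarray.sum uses pairwise summation, which keeps the partial sums small.
pairwise_mean = float(ones.sum()) / ones.size

print(naive_mean)     # 0.5, the running sum is stuck at 2**24
print(pairwise_mean)  # 1.0
```

The naive sum stops growing once it reaches 2**24, so dividing by 2**25 gives exactly 0.5.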
A couple of minimal examples:
```python
>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> import bottleneck as bn
>>> bn.nanmean(np.ones(2**25, dtype=np.float32))
0.5
>>> pd.Series(np.ones(2**25, dtype=np.float32)).mean()
0.5
>>> xr.DataArray(np.ones(2**25, dtype=np.float32)).mean() # not affected for this version
array(1., dtype=float32)
```
Done with the following versions:
```bash
$ pip3 freeze
Bottleneck==1.2.1
numpy==1.16.1
pandas==0.24.1
xarray==0.11.3
...
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-464115604,https://api.github.com/repos/pydata/xarray/issues/1346,464115604,MDEyOklzc3VlQ29tbWVudDQ2NDExNTYwNA==,1217238,2019-02-15T16:39:08Z,2019-02-15T16:39:08Z,MEMBER,"The difference is that Bottleneck does the sum in the naive way, whereas NumPy uses the more numerically stable [pairwise summation](https://en.wikipedia.org/wiki/Pairwise_summation).","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-464016154,https://api.github.com/repos/pydata/xarray/issues/1346,464016154,MDEyOklzc3VlQ29tbWVudDQ2NDAxNjE1NA==,691772,2019-02-15T11:41:36Z,2019-02-15T11:41:36Z,CONTRIBUTOR,"Oh hm, I think I didn't really understand what happens in `bottleneck.nanmean()`. I understand that integers can overflow and that float32 have varying absolute precision. The max float32 3.4E+38 is not hit here. So how can the mean of a list of ones be 0.5?
Isn't this what bottleneck is doing? Summing up a bunch of float32 values and then dividing by the length?
```
>>> d = np.ones(2**25, dtype=np.float32)
>>> d.sum()/np.float32(len(d))
1.0
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-464002579,https://api.github.com/repos/pydata/xarray/issues/1346,464002579,MDEyOklzc3VlQ29tbWVudDQ2NDAwMjU3OQ==,5469,2019-02-15T11:06:06Z,2019-02-15T11:06:06Z,NONE,"Ah ok, I suppose bottleneck is indeed now avoided for float32 xarray. Yeah that issue is for a different function, but the source of the problem and proposed solution in the thread is the same - use higher precision intermediates for float32 (double arithmetic); a small speed vs accuracy/precision trade off.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-458427512,https://api.github.com/repos/pydata/xarray/issues/1346,458427512,MDEyOklzc3VlQ29tbWVudDQ1ODQyNzUxMg==,5469,2019-01-29T06:52:01Z,2019-01-29T06:52:01Z,NONE,"Is it worth changing bottleneck to use double for single precision reductions? AFAICT this is a matter of changing `npy_DTYPE0` to double in the `float{64,32}` versions of functions in `reduce_template.c`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-456173428,https://api.github.com/repos/pydata/xarray/issues/1346,456173428,MDEyOklzc3VlQ29tbWVudDQ1NjE3MzQyOA==,1217238,2019-01-21T19:09:43Z,2019-01-21T19:09:43Z,MEMBER,"> Would it be worth adding a warning (until the right solution is found) if someone is doing `.mean()` on a `DataArray` which is `float32`?
I would rather pick option (1) above, that is, ""Stop using bottleneck on float32 arrays""","{""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-456149964,https://api.github.com/repos/pydata/xarray/issues/1346,456149964,MDEyOklzc3VlQ29tbWVudDQ1NjE0OTk2NA==,2405019,2019-01-21T17:33:31Z,2019-01-21T17:33:31Z,CONTRIBUTOR,"Sorry to unearth this issue again, but I just got bitten by this quite badly. I'm looking at absolute temperature perturbations, and bottleneck's implementation, together with my data being loaded as `float32` (correctly, as it's stored like that), causes an error of the same size as the perturbations I'm looking for.
Example:
```
In [1]: import numpy as np
...: import bottleneck
In [2]: a = 300*np.ones((800**2,), dtype=np.float32)
In [3]: np.mean(a)
Out[3]: 300.0
In [4]: bottleneck.nanmean(a)
Out[4]: 302.6018981933594
```
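The 302.6 figure can be reproduced without bottleneck, assuming its mean is a plain sequential float32 sum followed by a division (an assumption; `np.cumsum` is only a stand-in for that running sum). Once the running sum passes 2**26, the gap between neighbouring float32 values is 8, and past 2**27 it is 16, so each added 300 ends up rounded to 304.

```python
import numpy as np

a = 300 * np.ones((800**2,), dtype=np.float32)

# np.cumsum adds the elements one after another in float32, so its last
# entry serves as a stand-in for a naive running sum.
naive_mean = float(np.cumsum(a)[-1]) / a.size

# Accumulating in float64 keeps every partial sum exact for this input.
exact_mean = float(np.cumsum(a, dtype=np.float64)[-1]) / a.size

# Gap between neighbouring float32 values at a few running-sum magnitudes.
for magnitude in (2**25, 2**26, 2**27):
    print(magnitude, np.spacing(np.float32(magnitude)))

print(naive_mean)  # roughly 302.6 rather than 300
print(exact_mean)  # 300.0
```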
Would it be worth adding a warning (until the right solution is found) if someone is doing `.mean()` on a `DataArray` which is `float32`?
Based on a little experimentation (https://gist.github.com/leifdenby/8e874d3440a1ac96f96465a418f158ab), bottleneck's mean function builds up significant errors even with moderately sized arrays if they are `float32`, so I'm going to stop using `.mean()` as-is from now on and always pass in `dtype=np.float64`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-290851733,https://api.github.com/repos/pydata/xarray/issues/1346,290851733,MDEyOklzc3VlQ29tbWVudDI5MDg1MTczMw==,1217238,2017-03-31T22:55:18Z,2017-03-31T22:55:18Z,MEMBER,"@matteodefelice you didn't decide on float32, but your data is stored that way. It's really hard to make choices about numerical precision for computations automatically: if we converted automatically to float64, somebody else would be complaining about unexpected memory usage :).
Looking at our options, we could:
1. Stop using bottleneck on float32 arrays, or provide a flag or option to disable using bottleneck. This is not ideal, because bottleneck is much faster.
2. Automatically convert float32 arrays to float64 before doing aggregations. This is not ideal, because it could significantly increase memory requirements.
3. Add a `dtype` option for aggregations (like NumPy) and consider defaulting to `dtype=np.float64` when doing aggregations on float32 arrays. I would generally be happy with this, but bottleneck currently doesn't provide the option.
4. Write a higher precision algorithm for bottleneck's `mean`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-290822179,https://api.github.com/repos/pydata/xarray/issues/1346,290822179,MDEyOklzc3VlQ29tbWVudDI5MDgyMjE3OQ==,6360066,2017-03-31T20:31:56Z,2017-03-31T20:31:56Z,NONE,"Thanks, everyone, for the replies.
@Aegaeon I get the same results as you with bottleneck...
@shoyer The point is that I didn't choose float32, and yes, using `.astype(np.float64)` solves the issue. But this is not expected behaviour: with such a standard dataset I would not expect any problem related to numerical precision...","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-290760342,https://api.github.com/repos/pydata/xarray/issues/1346,290760342,MDEyOklzc3VlQ29tbWVudDI5MDc2MDM0Mg==,1217238,2017-03-31T16:24:04Z,2017-03-31T16:24:04Z,MEMBER,"Yes, this is probably related to the fact that `.mean()` in xarray uses bottleneck if available, and bottleneck has a slightly different mean implementation, quite possibly with a less numerically stable algorithm.
The fact that the dtype is float32 is a sign that this is probably a numerical precision issue. Try casting with `.astype(np.float64)` and see if the problem goes away.
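A minimal sketch of what the cast buys you, on synthetic data rather than the dataset from this issue. The array below is a made-up stand-in for a float32 temperature field, and `np.cumsum` stands in for a non-pairwise running sum (an assumption about how a less stable mean might accumulate, not bottleneck's actual code).

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical float32 field in kelvin, only for illustration.
temps = rng.uniform(230.0, 310.0, size=2**22).astype(np.float32)

# Reference value: NumPy's mean with a float64 accumulator.
reference = float(temps.mean(dtype=np.float64))

# Sequential running sum kept in float32.
naive_mean = float(np.cumsum(temps)[-1]) / temps.size

# The same sequential sum after casting to float64.
cast_mean = float(np.cumsum(temps.astype(np.float64))[-1]) / temps.size

naive_err = abs(naive_mean - reference)
cast_err = abs(cast_mean - reference)
print('float32 running sum error:', naive_err)
print('float64 running sum error:', cast_err)
```

The cast doubles the memory held by the array while it exists, which is the trade-off to weigh against the extra precision.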
If you really cared about performance using float32, the other thing to do to improve conditioning is to subtract and add a number close to the mean, e.g., `(ds.var167 - 270).mean() + 270`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-290755867,https://api.github.com/repos/pydata/xarray/issues/1346,290755867,MDEyOklzc3VlQ29tbWVudDI5MDc1NTg2Nw==,5852283,2017-03-31T16:07:56Z,2017-03-31T16:07:56Z,CONTRIBUTOR,"I think this might be a problem with bottleneck? My interpretation of _create_nan_agg_method in xarray/core/ops.py is that it may use bottleneck to get the mean unless you pass skipna=False or specify multiple axes. And,
```python
In [2]: import bottleneck
In [3]: bottleneck.__version__
Out[3]: '1.2.0'
In [6]: bottleneck.nanmean(ds.var167.data)
Out[6]: 261.6441345214844
```
Forgive me if I'm wrong, I'm still a bit new.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-290754443,https://api.github.com/repos/pydata/xarray/issues/1346,290754443,MDEyOklzc3VlQ29tbWVudDI5MDc1NDQ0Mw==,10050469,2017-03-31T16:02:53Z,2017-03-31T16:02:53Z,MEMBER,Does it make a difference if you load the data first? (``ds.var167.load().mean()``) Or use python 3?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-290747253,https://api.github.com/repos/pydata/xarray/issues/1346,290747253,MDEyOklzc3VlQ29tbWVudDI5MDc0NzI1Mw==,5852283,2017-03-31T15:38:12Z,2017-03-31T15:53:07Z,CONTRIBUTOR,"Also on macOS, and I can reproduce.
Using python 2.7.11, xarray 0.9.1, dask 0.14.1 installed through Anaconda. I get the same results with xarray 0.9.1-38-gc0178b7 from GitHub.
```python
In [3]: ds = xarray.open_dataset('ERAIN-t2m-1983-2012.seasmean.nc')
In [4]: ds.var167.mean()
Out[4]:
array(261.6441345214844, dtype=float32)
```
Curiously, I get the right results with skipna=False...
```python
In [10]: ds.var167.mean(skipna=False)
Out[10]:
array(278.6246643066406, dtype=float32)
```
... or by specifying coordinates to average over:
```python
In [5]: ds.var167.mean(('time', 'lat', 'lon'))
Out[5]:
array(278.6246643066406, dtype=float32)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-290692479,https://api.github.com/repos/pydata/xarray/issues/1346,290692479,MDEyOklzc3VlQ29tbWVudDI5MDY5MjQ3OQ==,6360066,2017-03-31T11:53:12Z,2017-03-31T11:53:12Z,NONE,"Ok, I am on MacOS:
- Python 2.7.13 from Macports
- Dask 0.14.1 from Macports
- xarray from GitHub ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353
https://github.com/pydata/xarray/issues/1346#issuecomment-290691941,https://api.github.com/repos/pydata/xarray/issues/1346,290691941,MDEyOklzc3VlQ29tbWVudDI5MDY5MTk0MQ==,10050469,2017-03-31T11:50:05Z,2017-03-31T11:50:05Z,MEMBER,"I can't reproduce this:
```python
In [6]: ds = xr.open_dataset('./Downloads/ERAIN-t2m-1983-2012.seasmean.nc')
In [7]: ds.var167.mean()
Out[7]:
array(278.6246643066406, dtype=float32)
In [8]: ds.var167.data.mean()
Out[8]: 278.62466
```
which version of xarray, dask, python are you using?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,218459353