issues: 2024104632
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2024104632 | I_kwDOAMm_X854pWK4 | 8515 | Inconsistant behaviour of groupby_bins mean when using flox and numbagg | 21100296 | closed | 0 | 5 | 2023-12-04T15:17:51Z | 2023-12-05T08:21:44Z | 2023-12-04T18:57:30Z | NONE | What happened?
When I group an xarray.DataArray into a single group and calculate the mean, I expect the mean of that group to equal the mean of the input data. With flox and numbagg installed alongside xarray, I get inconsistent behaviour. The behaviour is consistent again when the option "use_flox" is set to False.
What did you expect to happen?
I expected xarray to give the mean of the values in the group, and I expected this mean to be the same with flox as without flox. More specifically, I expected it to be (almost) equal to the result of numpy.mean.
Minimal Complete Verifiable Example
```Python
# In a clean python.org environment:
# pip install xarray numbagg flox
import numpy as np
import xarray as xr


def grouped_mean(number):
    # Generate a set of random values
    np.random.seed(0)
    values = np.random.rand(number)
    # Use numpy to calculate the expected mean
    expected = np.mean(values)
    # Create an xarray DataArray with coordinates
    data = xr.DataArray(values, [("dim_0", np.arange(number, dtype=float))])
    # Group the coordinates so that all values fall in a single bin
    grouped = data.groupby_bins("dim_0", [-1.0, number + 1.0])


for number in [127, 128, 255, 256, 1000]:
    grouped_mean(number)
```
MVCE confirmation
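As captured here, the example builds the grouped object but stops before computing or comparing any means. A minimal sketch of the comparison the report describes might look like the following. The function name `compare_grouped_mean` and its return shape are assumptions for illustration, not the reporter's code. It relies only on `xr.set_options(use_flox=...)`, the option named in the report, and runs whether or not flox is installed.

```python
import numpy as np
import xarray as xr


def compare_grouped_mean(number):
    """Return (numpy mean, grouped mean with use_flox=True, grouped mean with use_flox=False).

    Hypothetical helper: the name and return shape are illustrative only.
    """
    # Same data generation as in the report
    np.random.seed(0)
    values = np.random.rand(number)
    expected = float(np.mean(values))

    data = xr.DataArray(
        values,
        coords={"dim_0": np.arange(number, dtype=float)},
        dims="dim_0",
    )
    # A single bin that contains every coordinate value
    bins = [-1.0, number + 1.0]

    results = {}
    for use_flox in (True, False):
        # use_flox=True only has an effect when flox is actually installed
        with xr.set_options(use_flox=use_flox):
            grouped = data.groupby_bins("dim_0", bins)
            results[use_flox] = float(grouped.mean().values[0])

    return expected, results[True], results[False]


if __name__ == "__main__":
    for number in [127, 128, 255, 256, 1000]:
        expected, with_flox, without_flox = compare_grouped_mean(number)
        print(number, expected, with_flox, without_flox)
```

If the three printed values agree for every `number`, the installed combination of xarray, flox and numbagg does not show the discrepancy described above.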
Relevant log output
Anything else we need to know?
This behaviour only occurs when numbagg and flox are installed alongside xarray (pip install xarray flox numbagg). The above-mentioned output is from a GitHub Action, using the latest Linux and Windows runners with Python 3.11.
Environment
Run python -c "import xarray as xr;print(xr.show_versions())"
/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 (main, Oct 3 2023, 04:42:57) [GCC 11.4.0]
python-bits: 64
OS: Linux
OS-release: 6.2.0-1016-azure
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2023.11.0
pandas: 2.1.3
numpy: 1.26.2
scipy: 1.11.4
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: 0.6.4
fsspec: None
cupy: None
pint: None
sparse: None
flox: 0.8.5
numpy_groupies: 0.10.2
setuptools: 65.5.0
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None
None
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8515/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |