issues: 2024104632

id: 2024104632
node_id: I_kwDOAMm_X854pWK4
number: 8515
title: Inconsistant behaviour of groupby_bins mean when using flox and numbagg
user: 21100296
state: closed
locked: 0
assignee: (empty)
milestone: (empty)
comments: 5
created_at: 2023-12-04T15:17:51Z
updated_at: 2023-12-05T08:21:44Z
closed_at: 2023-12-04T18:57:30Z
author_association: NONE
active_lock_reason: (empty)
draft: (empty)
pull_request: (empty)

Remaining columns, in order: body, reactions, performed_via_github_app, state_reason, repo, type

What happened?

When I group an xarray.DataArray into a single group and calculate the mean, I expect the mean of this group to be the same as the mean of the input data.

When I have flox and numbagg installed alongside xarray, I get inconsistent behaviour. The behaviour is consistent again when the option "use_flox" is set to False.

What did you expect to happen?

I expected xarray to give the mean of the values in the group. I expected this mean to be the same with flox as without flox. More specifically, I expected it to be (almost) equal to the numpy.mean.
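To make the expectation concrete, here is a minimal sketch in plain Python (standard library only, so it does not exercise xarray, flox, or numbagg). The helper `binned_means` is a hypothetical name introduced for illustration. It assumes right-closed bins `(edges[i], edges[i+1]]`, which is my understanding of the default binning used by `groupby_bins`. With one bin covering every coordinate, the binned mean should match the overall mean up to floating-point rounding.

```python
import bisect
import math
import random


def binned_means(coords, values, edges):
    """Return the mean of `values` per bin, for right-closed bins (edges[i], edges[i+1]]."""
    sums = [0.0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for c, v in zip(coords, values):
        # bisect_left finds the first edge >= c, so bin i covers (edges[i], edges[i+1]].
        i = bisect.bisect_left(edges, c) - 1
        if 0 <= i < len(sums):
            sums[i] += v
            counts[i] += 1
    # Empty bins have no mean; report NaN for them.
    return [s / n if n else math.nan for s, n in zip(sums, counts)]


random.seed(0)
number = 256
values = [random.random() for _ in range(number)]
coords = [float(i) for i in range(number)]

overall = math.fsum(values) / number
(single_bin,) = binned_means(coords, values, [-1.0, number + 1.0])

# The two means may differ in the last few bits, but not in sign or magnitude.
print(single_bin, overall, math.isclose(single_bin, overall, rel_tol=1e-12))
```

Any correct implementation of a single-bin grouped mean should satisfy the same closeness check, whichever backend computes it.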

Minimal Complete Verifiable Example

```Python
# In a clean python.org environment:
# pip install xarray numbagg flox

import numpy as np
import xarray as xr


def grouped_mean(number):
    # Generate a set of random values
    np.random.seed(0)
    values = np.random.rand(number)
    # Use numpy to calculate the expected mean
    expected = np.mean(values)
    # Create an xarray DataArray with coordinates
    data = xr.DataArray(values, [("dim_0", np.arange(number, dtype=float))])
    # Group the coordinates so that all values fall in a single bin
    grouped = data.groupby_bins("dim_0", [-1.0, number + 1.0])

    # Calculate the grouped mean without flox
    xr.core.options.OPTIONS["use_flox"] = False
    result_no_flox = grouped.mean().values[0]

    # Calculate the grouped mean with flox
    xr.core.options.OPTIONS["use_flox"] = True
    result_flox = grouped.mean().values[0]

    # Print the results
    print(f"Try with number = {number}")
    print(expected, "using numpy.mean")
    print(result_no_flox, "grouped.mean no flox")
    print(result_flox, "grouped.mean with flox")


for number in [127, 128, 255, 256, 1000]:
    grouped_mean(number)
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Run python test.py
Try with number = 127
0.5000245417623892 using numpy.mean
0.5000245417623892 grouped.mean no flox
0.5000245417623891 grouped.mean with flox
Try with number = 128
0.49847415328514055 using numpy.mean
0.49847415328514055 grouped.mean no flox
-0.49847415328514033 grouped.mean with flox
Try with number = 255
0.4973500025365464 using numpy.mean
0.4973500025365464 grouped.mean no flox
-126.82425064681932 grouped.mean with flox
Try with number = 256
0.4957330979775834 using numpy.mean
0.4957330979775834 grouped.mean no flox
nan grouped.mean with flox
Try with number = 1000
0.49592153437178277 using numpy.mean
0.49592153437178277 grouped.mean no flox
-20.663397265490953 grouped.mean with flox
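The log can be summarised with a tolerance check. The sketch below copies the `numpy.mean` and "with flox" values from the output above and flags every size where they disagree. The helper name `agrees` and the relative tolerance of `1e-9` are assumptions chosen for illustration, not anything prescribed by xarray or flox.

```python
import math

# (number, numpy.mean, grouped.mean with flox), copied from the log output.
reported = [
    (127, 0.5000245417623892, 0.5000245417623891),
    (128, 0.49847415328514055, -0.49847415328514033),
    (255, 0.4973500025365464, -126.82425064681932),
    (256, 0.4957330979775834, math.nan),
    (1000, 0.49592153437178277, -20.663397265490953),
]


def agrees(expected, actual, rel_tol=1e-9):
    """True when the two means match within a relative tolerance (NaN never matches)."""
    return math.isclose(expected, actual, rel_tol=rel_tol)


mismatches = [n for n, expected, actual in reported if not agrees(expected, actual)]
print(mismatches)  # [128, 255, 256, 1000]
```

Only the 127-element case agrees. Every reported case with 128 or more elements is off by far more than rounding error, including a sign flip at 128 and a NaN at 256.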

Anything else we need to know?

This behaviour only occurs when numbagg and flox are installed alongside xarray (pip install xarray flox numbagg). The output above is from a GitHub Actions run, using the latest Linux and Windows runners with Python 3.11.

Environment

Run python -c "import xarray as xr;print(xr.show_versions())"
/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 (main, Oct 3 2023, 04:42:57) [GCC 11.4.0]
python-bits: 64
OS: Linux
OS-release: 6.2.0-1016-azure
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2023.11.0
pandas: 2.1.3
numpy: 1.26.2
scipy: 1.11.4
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: 0.6.4
fsspec: None
cupy: None
pint: None
sparse: None
flox: 0.8.5
numpy_groupies: 0.10.2
setuptools: 65.5.0
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None
None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8515/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
