issues: 1670415238

id: 1670415238
node_id: I_kwDOAMm_X85jkIOG
number: 7759
title: groupby_bins returns data in reversed order
user: 9074527
state: closed
locked: 0
comments: 2
created_at: 2023-04-17T05:03:35Z
updated_at: 2023-04-18T14:48:25Z
closed_at: 2023-04-18T14:48:25Z
author_association: NONE

What happened?

I have previously used DataArray.groupby_bins to great effect for some complex binning tasks. I recently upgraded my base Python to 3.10 and found that code which previously worked fine now gives badly wrong results. At least with non-linear bins, groupby_bins now returns the bin counts in reversed order, so the data and the bin coordinates are misaligned. I think this is a bug.

For reference, I can reproduce the error with xarray 2023.4.0, whereas xarray 2023.3.0 gives the correct result.

What did you expect to happen?

I expect groupby_bins to produce the same counts as the equivalent methods in numpy or pandas.
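To make "the same counts" concrete, here is a minimal NumPy-only sketch of the reference result. It assumes the same seed and bin edges as the example in this report; the names `counts_hist` and `counts_right` are illustrative only. It counts the data twice, once with `np.histogram` (bins closed on the left) and once with right-closed bins, which is the default convention of `pandas.cut`. For continuous random data the two conventions should give identical counts, in order of increasing bin edge.

```python
import numpy as np

# Same synthetic data and bin edges as the example in this report.
np.random.seed(42)
coords = np.random.normal(5, 5, 1000)
bins = np.logspace(-4, 1, 10)  # 10 edges define 9 bins

# Reference 1: np.histogram uses half-open bins [a, b), with the last bin closed.
counts_hist, _ = np.histogram(coords, bins=bins)

# Reference 2: right-closed bins (a, b], the default convention of pandas.cut.
# searchsorted(side="left") returns i such that bins[i-1] < x <= bins[i].
idx = np.searchsorted(bins, coords, side="left")
in_range = (idx >= 1) & (idx <= len(bins) - 1)  # drop values outside the edges
counts_right = np.bincount(idx[in_range] - 1, minlength=len(bins) - 1)

print(f"{counts_hist=}")
print(f"{counts_right=}")
```

Either array can serve as the baseline that the groupby_bins counts are compared against, bin by bin.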

Minimal Complete Verifiable Example

```Python
import numpy as np
import xarray as xr
import pandas as pd
import matplotlib.pyplot as plt
import sys

print(f"numpy version: {np.__version__}")
print(f"xarray version: {xr.__version__}")
print(f"pandas version: {pd.__version__}")
print(f"python version: {sys.version}")

# Generate random data
# Make the coordinates follow a normal distribution
np.random.seed(42)
coords = np.random.normal(5, 5, 1000)
bins = np.logspace(-4, 1, 10)

# xArray
# Make a mock dataarray
darr = xr.DataArray(coords, coords=[coords], dims=["coords"])
counts_xr = darr.groupby_bins("coords", bins).count()
c_bin_xr = np.array([i.mid for i in counts_xr.coords_bins.values])

# Numpy
counts_np, edges = np.histogram(coords, bins=bins)
c_bin_np = (edges[1:] + edges[:-1]) / 2

# Pandas
df = pd.DataFrame(coords, columns=["coords"])
counts_pd = df.groupby(pd.cut(df.coords, bins)).count()
c_bin_pd = np.array([i.mid for i in counts_pd.index.values])

print(f"{counts_xr.data=}")
print(f"{counts_np=}")
print(f"{counts_pd.values=}")

_ = plt.figure()
_ = plt.plot(c_bin_np, counts_np, 'o', label='numpy')
_ = plt.plot(c_bin_xr, counts_xr, 'x', label='xarray')
_ = plt.plot(c_bin_pd, counts_pd, 's', label='pandas', markerfacecolor='none')
_ = plt.xscale('log')
_ = plt.yscale('log')
_ = plt.legend()
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
# Run 1
numpy version: 1.23.5
xarray version: 2023.4.0
pandas version: 2.0.0
python version: 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:17:34) [Clang 14.0.6 ]
counts_xr.data=array([ nan, nan, nan, 506., 27., 153., 9., 2., 1.])
counts_np=array([ 0, 0, 0, 1, 2, 9, 27, 153, 506])
counts_pd.values=array([[ 0], [ 0], [ 0], [ 1], [ 2], [ 9], [ 27], [153], [506]])

# Run 2
numpy version: 1.24.2
xarray version: 2023.3.0
pandas version: 1.5.3
python version: 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:42:20) [Clang 14.0.6 ]
counts_xr.data=array([ nan, nan, nan, 1., 2., 9., 27., 153., 506.])
counts_np=array([ 0, 0, 0, 1, 2, 9, 27, 153, 506])
counts_pd.values=array([[ 0], [ 0], [ 0], [ 1], [ 2], [ 9], [ 27], [153], [506]])
```
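The mismatch in Run 1 can be checked directly from the logged values. The sketch below copies the Run 1 arrays from the log, treats xarray's NaN (empty bin) as zero, and reports which bins disagree with NumPy. The variable names are illustrative, and the NaN-to-zero step is an assumption made only so the arrays can be compared.

```python
import numpy as np

# Values copied from "Run 1" (xarray 2023.4.0) in the log output.
counts_xr = np.array([np.nan, np.nan, np.nan, 506., 27., 153., 9., 2., 1.])
counts_np = np.array([0, 0, 0, 1, 2, 9, 27, 153, 506])

# xarray reports empty bins as NaN; treat them as zero for the comparison.
xr_filled = np.nan_to_num(counts_xr, nan=0.0).astype(int)

# Same total and same set of counts means nothing was lost, only reordered.
same_total = bool(xr_filled.sum() == counts_np.sum())
same_values = np.array_equal(np.sort(xr_filled), np.sort(counts_np))

# Indices of bins whose count differs from the NumPy reference.
mismatched_bins = np.flatnonzero(xr_filled != counts_np)

# Check whether the populated bins are an exact reversal of the reference.
populated = counts_np > 0
exact_reversal = np.array_equal(xr_filled[populated], counts_np[populated][::-1])

print(f"{same_total=}, {same_values=}")
print(f"{mismatched_bins=}")
print(f"{exact_reversal=}")
```

For the Run 1 values, the totals and the set of counts agree with NumPy, but all six populated bins hold a different count than the reference. The order is close to a reversal but not exactly one: the 27 and 153 entries would have to trade places.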

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:17:34) [Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 20.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.14.0
libnetcdf: None
xarray: 2023.4.0
pandas: 2.0.0
numpy: 1.23.5
scipy: 1.10.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.8.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.3.2
distributed: 2023.3.2.1
matplotlib: 3.7.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.4.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.6.3
pip: 23.0.1
conda: 23.1.0
pytest: 7.3.1
mypy: None
IPython: 8.12.0
sphinx: None
/Users/tho822/mambaforge/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:42:20) [Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 20.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2023.3.0
pandas: 1.5.3
numpy: 1.24.2
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.7.1
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.6.1
pip: 23.1
conda: None
pytest: None
mypy: None
IPython: 8.12.0
sphinx: None
/Users/tho822/mambaforge/envs/py39/lib/python3.9/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7759/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue