
`groupby_bins` returns conflicting sizes error when DataArray to group is lazy

pydata/xarray issue #8850 (id 2193178037) · opened 2024-03-18T20:04:19Z by user 43613877 (CONTRIBUTOR) · closed 2024-03-18T22:14:19Z · 2 comments

What happened?

`xr.DataArray.groupby_bins` seems to have an issue with lazy DataArrays: it throws a misleading error message about conflicting sizes for a dimension when the aggregator function, e.g. `mean`, is called.

What did you expect to happen?

I expected the arrays to be handled without any issue, or an error message stating that the arrays have to be loaded before `groupby_bins` can be applied.

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np

x = np.arange(10)
y = np.arange(5)

var1 = np.random.rand(len(y), len(x))
var2 = np.random.rand(len(y), len(x))

ds = xr.Dataset(
    {
        'var1': (['y', 'x'], var1),
        'var2': (['y', 'x'], 10 + var2 * 10),
    },
    coords={
        'x': x,
        'y': y,
    },
)

ds['var1'] = ds.var1.chunk(x=3)

ds.var1.groupby_bins(ds.var2, bins=np.linspace(10, 20, 100)).mean()
```

fails with

```
ValueError: conflicting sizes for dimension 'var2_bins': length 99 on 'var2_bins' and length 41 on {'var2_bins': <this-array>}
```

whereas

```Python
ds.var1.compute().groupby_bins(ds.var2, bins=np.linspace(10, 20, 100)).mean()
```

returns the expected output:

```
<xarray.DataArray 'var1' (var2_bins: 99)> Size: 792B
array([0.90665731, 0.39259895, 0.09858736, 0.94222699, 0.83785883,
              nan, 0.46287129,        nan,        nan, 0.02260558,
       0.06989385,        nan,        nan,        nan, 0.41192196,
              nan,        nan,        nan, 0.90680258, 0.74418783,
       0.84559937, 0.43462018,        nan,        nan, 0.00244231,
       0.65950057,        nan,        nan, 0.00515549,        nan,
       0.41554394, 0.74563456,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan, 0.48631902,        nan,        nan, 0.86050492,
       0.05572065,        nan, 0.7567633 ,        nan, 0.70537106,
              nan,        nan,        nan,        nan,        nan,
              nan, 0.65957427, 0.39201731, 0.3159046 ,        nan,
       0.71012231,        nan,        nan,        nan,        nan,
              nan,        nan,        nan, 0.7104425 ,        nan,
              nan, 0.94564132, 0.81052373,        nan,        nan,
       0.94000787,        nan, 0.88280569,        nan,        nan,
       0.33939775, 0.50393615,        nan, 0.84943353,        nan,
              nan,        nan,        nan, 0.28231671, 0.35149525,
              nan,        nan, 0.18657728,        nan, 0.23287227,
       0.34968875,        nan,        nan, 0.3135791 ])
Coordinates:
  * var2_bins  (var2_bins) object 792B (10.0, 10.101] ... (19.899, 20.0]
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```Python
ValueError                                Traceback (most recent call last)
Input In [11], in <cell line: 1>()
----> 1 ds.var1.groupby_bins(ds.var2, bins=np.linspace(10,20,100)).mean()

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/xarray/core/_aggregations.py:5984, in DataArrayGroupByAggregations.mean(self, dim, skipna, keep_attrs, **kwargs)
   5900 """
   5901 Reduce this DataArray's data by applying mean along some dimension(s).
   5902 (...)
   5977 * labels   (labels) object 24B 'a' 'b' 'c'
   5978 """
   5979 if (
   5980     flox_available
   5981     and OPTIONS["use_flox"]
   5982     and contains_only_chunked_or_numpy(self._obj)
   5983 ):
-> 5984     return self._flox_reduce(
   5985         func="mean",
   5986         dim=dim,
   5987         skipna=skipna,
   5988         # fill_value=fill_value,
   5989         keep_attrs=keep_attrs,
   5990         **kwargs,
   5991     )
   5992 else:
   5993     return self._reduce_without_squeeze_warn(
   5994         duck_array_ops.mean,
   5995         dim=dim,
   (...)
   5998         **kwargs,
   5999     )

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/xarray/core/groupby.py:1079, in GroupBy._flox_reduce(self, dim, keep_attrs, **kwargs)
   1076 kwargs.setdefault("min_count", 1)
   1078 output_index = grouper.full_index
-> 1079 result = xarray_reduce(
   1080     obj.drop_vars(non_numeric.keys()),
   1081     self._codes,
   1082     dim=parsed_dim,
   1083     # pass RangeIndex as a hint to flox that by is already factorized
   1084     expected_groups=(pd.RangeIndex(len(output_index)),),
   1085     isbin=False,
   1086     keep_attrs=keep_attrs,
   1087     **kwargs,
   1088 )
   1090 # we did end up reducing over dimension(s) that are
   1091 # in the grouped variable
   1092 group_dims = grouper.group.dims

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/flox/xarray.py:384, in xarray_reduce(obj, func, expected_groups, isbin, sort, dim, split_out, fill_value, method, engine, keep_attrs, skipna, min_count, reindex, *by, **finalize_kwargs)
    382     actual = actual.set_coords(levelnames)
    383 else:
--> 384     actual[name] = expect
    385 if keep_attrs:
    386     actual[name].attrs = by_.attrs

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/xarray/core/dataset.py:1603, in Dataset.__setitem__(self, key, value)
   1598 if isinstance(value, Dataset):
   1599     raise TypeError(
   1600         "Cannot assign a Dataset to a single key - only a DataArray or Variable "
   1601         "object can be stored under a single key."
   1602     )
-> 1603 self.update({key: value})
   1605 elif utils.iterable_of_hashable(key):
   1606     keylist = list(key)

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/xarray/core/dataset.py:5617, in Dataset.update(self, other)
   5581 def update(self, other: CoercibleMapping) -> Self:
   5582     """Update this dataset's variables with those from another dataset.
   5583
   5584     Just like :py:meth:`dict.update` this is a in-place operation.
   (...)
   5615     Dataset.merge
   5616     """
-> 5617 merge_result = dataset_update_method(self, other)
   5618 return self._replace(inplace=True, **merge_result._asdict())

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/xarray/core/merge.py:1075, in dataset_update_method(dataset, other)
   1072 if coord_names:
   1073     other[key] = value.drop_vars(coord_names)
-> 1075 return merge_core(
   1076     [dataset, other],
   1077     priority_arg=1,
   1078     indexes=dataset.xindexes,
   1079     combine_attrs="override",
   1080 )

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/xarray/core/merge.py:724, in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value, skip_align_args)
    719 prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)
    720 variables, out_indexes = merge_collected(
    721     collected, prioritized, compat=compat, combine_attrs=combine_attrs
    722 )
--> 724 dims = calculate_dimensions(variables)
    726 coord_names, noncoord_names = determine_coords(coerced)
    727 if compat == "minimal":
    728     # coordinates may be dropped in merged results

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/xarray/core/variable.py:2947, in calculate_dimensions(variables)
   2945     last_used[dim] = k
   2946 elif dims[dim] != size:
-> 2947     raise ValueError(
   2948         f"conflicting sizes for dimension {dim!r}: "
   2949         f"length {size} on {k!r} and length {dims[dim]} on {last_used!r}"
   2950     )
   2951 return dims

ValueError: conflicting sizes for dimension 'var2_bins': length 99 on 'var2_bins' and length 41 on {'var2_bins': <this-array>}
```
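The last frame is where the exception originates: `calculate_dimensions` walks every variable being merged and rejects two variables that claim different lengths for the same dimension. A minimal, self-contained stand-in for that check (a simplified illustration, not xarray's actual code) reproduces the shape of the error:

```python
def calculate_dimensions(variables):
    """Map each dimension name to its size; raise on conflicts.

    `variables` maps a variable name to a (dims, shape) pair.
    Simplified stand-in for xarray.core.variable.calculate_dimensions.
    """
    dims = {}
    last_used = {}
    for name, (var_dims, shape) in variables.items():
        for dim, size in zip(var_dims, shape):
            if dim not in dims:
                dims[dim] = size
                last_used[dim] = name
            elif dims[dim] != size:
                raise ValueError(
                    f"conflicting sizes for dimension {dim!r}: "
                    f"length {size} on {name!r} and "
                    f"length {dims[dim]} on {last_used[dim]!r}"
                )
    return dims

# The situation in this report, in miniature: the expected bin index has
# length 99, but the grouped result claims 41 entries for the same dimension.
try:
    calculate_dimensions({
        "var2_bins": (("var2_bins",), (99,)),
        "<this-array>": (("var2_bins",), (41,)),
    })
except ValueError as e:
    print(e)
```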

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.10 | packaged by conda-forge | (main, Feb 1 2022, 21:24:11) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-425.10.1.el8_7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2024.2.0
pandas: 1.4.3
numpy: 1.24.4
scipy: 1.9.0
netCDF4: 1.6.0
pydap: installed
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.12.0
cftime: 1.6.0
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.3.5
dask: 2022.9.2
distributed: 2022.9.2
matplotlib: 3.5.1
cartopy: 0.20.1
seaborn: 0.11.2
numbagg: None
fsspec: 2023.9.2
cupy: None
pint: 0.17
sparse: None
flox: 0.5.9
numpy_groupies: 0.9.19
setuptools: 68.0.0
pip: 22.0.4
conda: 4.11.0
pytest: 7.1.3
mypy: 0.971
IPython: 8.1.1
sphinx: None
