issues: 2193178037
| id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2193178037 | I_kwDOAMm_X86CuT21 | 8850 | `groupy_bins` returns conflicting sizes error when DataArray to group is lazy | 43613877 | closed | 0 | 2 | 2024-03-18T20:04:19Z | 2024-03-18T22:14:19Z | 2024-03-18T22:14:19Z | CONTRIBUTOR | What happened?
The 
What did you expect to happen?
I expected that the arrays are handled without any issue or an error-message that would mention that the arrays have to be loaded before 
Minimal Complete Verifiable Example
```Python
import xarray as xr
import numpy as np
x = np.arange(10)
y = np.arange(5)
var1 = np.random.rand(len(y), len(x))
var2 = np.random.rand(len(y), len(x))
ds = xr.Dataset(
    {
        'var1': (['y', 'x'], var1),
        'var2': (['y', 'x'], 10+var2*10),
    },
    coords={
        'x': x,
        'y': y,
    }
)
ds['var1'] = ds.var1.chunk(x=3)
ds.var1.groupby_bins(ds.var2, bins=np.linspace(10,20,100)).mean()
# fails with
# ValueError: conflicting sizes for dimension 'var2_bins': length 99 on 'var2_bins' and length 41 on {'var2_bins': <this-array>}
ds.var1.compute().groupby_bins(ds.var2, bins=np.linspace(10,20,100)).mean()
# returns the expected output
"""
<xarray.DataArray 'var1' (var2_bins: 99)> Size: 792B
array([0.90665731, 0.39259895, 0.09858736, 0.94222699, 0.83785883,
              nan, 0.46287129,        nan,        nan, 0.02260558,
       0.06989385,        nan,        nan,        nan, 0.41192196,
              nan,        nan,        nan, 0.90680258, 0.74418783,
       0.84559937, 0.43462018,        nan,        nan, 0.00244231,
       0.65950057,        nan,        nan, 0.00515549,        nan,
       0.41554394, 0.74563456,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan, 0.48631902,        nan,        nan, 0.86050492,
       0.05572065,        nan, 0.7567633 ,        nan, 0.70537106,
              nan,        nan,        nan,        nan,        nan,
              nan, 0.65957427, 0.39201731, 0.3159046 ,        nan,
       0.71012231,        nan,        nan,        nan,        nan,
              nan,        nan,        nan, 0.7104425 ,        nan,
              nan, 0.94564132, 0.81052373,        nan,        nan,
       0.94000787,        nan, 0.88280569,        nan,        nan,
       0.33939775, 0.50393615,        nan, 0.84943353,        nan,
              nan,        nan,        nan, 0.28231671, 0.35149525,
              nan,        nan, 0.18657728,        nan, 0.23287227,
       0.34968875,        nan,        nan, 0.3135791 ])
Coordinates:
  * var2_bins  (var2_bins) object 792B (10.0, 10.101] ... (19.899, 20.0]
"""
```
MVCE confirmation
Relevant log output
```Python
ValueError Traceback (most recent call last)
Input In [11], in <cell line: 1>()
----> 1 ds.var1.groupby_bins(ds.var2, bins=np.linspace(10,20,100)).mean()
File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/xarray/core/_aggregations.py:5984, in DataArrayGroupByAggregations.mean(self, dim, skipna, keep_attrs, **kwargs)
   5900 """
   5901 Reduce this DataArray's data by applying 
File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/xarray/core/groupby.py:1079, in GroupBy._flox_reduce(self, dim, keep_attrs, **kwargs)
   1076     kwargs.setdefault("min_count", 1)
   1078 output_index = grouper.full_index
-> 1079 result = xarray_reduce(
   1080     obj.drop_vars(non_numeric.keys()),
   1081     self._codes,
   1082     dim=parsed_dim,
   1083     # pass RangeIndex as a hint to flox that 
File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/flox/xarray.py:384, in xarray_reduce(obj, func, expected_groups, isbin, sort, dim, split_out, fill_value, method, engine, keep_attrs, skipna, min_count, reindex, *by, **finalize_kwargs)
    382 actual = actual.set_coords(levelnames)
    383 else:
--> 384 actual[name] = expect
    385 if keep_attrs:
    386 actual[name].attrs = by_.attrs
File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/xarray/core/dataset.py:1603, in Dataset.__setitem__(self, key, value)
   1598 if isinstance(value, Dataset):
   1599 raise TypeError(
   1600 "Cannot assign a Dataset to a single key - only a DataArray or Variable "
   1601 "object can be stored under a single key."
   1602 )
-> 1603 self.update({key: value})
   1605 elif utils.iterable_of_hashable(key):
   1606 keylist = list(key)
File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/xarray/core/dataset.py:5617, in Dataset.update(self, other)
   5581 def update(self, other: CoercibleMapping) -> Self:
   5582     """Update this dataset's variables with those from another dataset.
   5583
   5584     Just like :py:meth: 
File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/xarray/core/merge.py:1075, in dataset_update_method(dataset, other)
   1072 if coord_names:
   1073 other[key] = value.drop_vars(coord_names)
-> 1075 return merge_core(
   1076 [dataset, other],
   1077 priority_arg=1,
   1078 indexes=dataset.xindexes,
   1079 combine_attrs="override",
   1080 )
File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/xarray/core/merge.py:724, in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value, skip_align_args)
    719 prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)
    720 variables, out_indexes = merge_collected(
    721 collected, prioritized, compat=compat, combine_attrs=combine_attrs
    722 )
--> 724 dims = calculate_dimensions(variables)
    726 coord_names, noncoord_names = determine_coords(coerced)
    727 if compat == "minimal":
    728 # coordinates may be dropped in merged results
File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/xarray/core/variable.py:2947, in calculate_dimensions(variables)
   2945 last_used[dim] = k
   2946 elif dims[dim] != size:
-> 2947 raise ValueError(
   2948 f"conflicting sizes for dimension {dim!r}: "
   2949 f"length {size} on {k!r} and length {dims[dim]} on {last_used!r}"
   2950 )
   2951 return dims
ValueError: conflicting sizes for dimension 'var2_bins': length 99 on 'var2_bins' and length 41 on {'var2_bins': <this-array>}
```
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.10 | packaged by conda-forge | (main, Feb  1 2022, 21:24:11) 
[GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-425.10.1.el8_7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2024.2.0
pandas: 1.4.3
numpy: 1.24.4
scipy: 1.9.0
netCDF4: 1.6.0
pydap: installed
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.12.0
cftime: 1.6.0
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.3.5
dask: 2022.9.2
distributed: 2022.9.2
matplotlib: 3.5.1
cartopy: 0.20.1
seaborn: 0.11.2
numbagg: None
fsspec: 2023.9.2
cupy: None
pint: 0.17
sparse: None
flox: 0.5.9
numpy_groupies: 0.9.19
setuptools: 68.0.0
pip: 22.0.4
conda: 4.11.0
pytest: 7.1.3
mypy: 0.971
IPython: 8.1.1
sphinx: None
 | {
    "url": "https://api.github.com/repos/pydata/xarray/issues/8850/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | completed | 13221727 | issue |
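
The two lengths in the error message can be illustrated with a NumPy-only sketch of a binned mean. This is not how xarray or flox implement `groupby_bins`; the helper name `binned_mean` and the seeded generator are assumptions made for illustration. The sketch only shows that 100 edges from `np.linspace(10, 20, 100)` define 99 right-closed bins, while 50 values can occupy far fewer of them. The expected output in the report has 99 entries, 41 of them non-NaN, which matches the 41 in the error message. Whether occupied versus total bins is the actual cause of the conflict is not confirmed by the report.

```python
import numpy as np


def binned_mean(values, by, bin_edges):
    """Mean of `values` per right-closed bin (lo, hi] of `by`.

    Hypothetical helper for illustration only. Returns (means, counts),
    both of length len(bin_edges) - 1; empty bins get NaN in `means`.
    """
    values = np.asarray(values, dtype=float).ravel()
    by = np.asarray(by, dtype=float).ravel()
    n_bins = len(bin_edges) - 1

    # With right=True, digitize returns i where edges[i-1] < x <= edges[i],
    # so subtracting 1 gives a zero-based bin index.
    idx = np.digitize(by, bin_edges, right=True) - 1
    inside = (idx >= 0) & (idx < n_bins)  # drop values outside all bins

    sums = np.bincount(idx[inside], weights=values[inside], minlength=n_bins)
    counts = np.bincount(idx[inside], minlength=n_bins)

    means = np.full(n_bins, np.nan)
    occupied = counts > 0
    means[occupied] = sums[occupied] / counts[occupied]
    return means, counts


# Same shapes and bin edges as the report, with a seeded generator (assumption).
rng = np.random.default_rng(0)
var1 = rng.random((5, 10))
var2 = 10 + rng.random((5, 10)) * 10
edges = np.linspace(10, 20, 100)

means, counts = binned_mean(var1, var2, edges)
print(means.shape)               # (99,)
print(int((counts > 0).sum()))   # number of occupied bins, at most 50
```

The full-length result keeps one slot per bin regardless of occupancy, which is the shape that the eager `ds.var1.compute()` call in the report returns.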