#8090: DataArrayResampleAggregations break with _flox_reduce where source DataArray has a discontinuous time dimension

Repo: pydata/xarray · State: open · Comments: 4 · Created: 2023-08-20T09:48:42Z · Updated: 2023-08-24T04:20:32Z · Author association: NONE

What happened?

When resampling a DataArray with a discontinuity in the time dimension, the resample object contains placeholder groups for the missing times between the times that are present.

This appears to break flox reductions (any, count, and all): they fail with a complaint about a fill_value of None. See the example provided below.
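For intuition (a sketch added here, not part of the original report), a plain pandas index shows the same effect: an annual resample over an index with a decade-long gap still produces a bin for every year in between, so some bins are empty placeholders.

```Python
# Hypothetical illustration with a plain pandas DatetimeIndex (the report
# itself uses a cftime 360_day calendar): "AS-DEC" bins are generated for
# every year between the first and last timestamps, so the 1990s bins are
# empty.
import pandas as pd

idx = pd.date_range("1980-12-01", "1990-11-30", freq="D").append(
    pd.date_range("2000-12-01", "2010-11-30", freq="D")
)
bin_sizes = pd.Series(1, index=idx).resample("AS-DEC").size()
print(bin_sizes[bin_sizes == 0])  # the empty annual bins spanning the gap
```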

What did you expect to happen?

The result should be computed successfully in the same way that it is without using flox.
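Concretely, a sketch of the expectation (added for clarity; it assumes the `da` defined in the example below): the flox and non-flox paths should produce identical results.

```Python
# Sketch (assumes `da` from the MVCE below): once fixed, the flox path should
# match the result produced with flox disabled.
with xr.set_options(use_flox=False):
    expected = (da > 0.5).resample(time="AS-DEC").any(dim="time")
with xr.set_options(use_flox=True):
    actual = (da > 0.5).resample(time="AS-DEC").any(dim="time")  # currently raises ValueError
xr.testing.assert_identical(expected, actual)
```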

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np

dates = (("1980-12-01", "1990-11-30"), ("2000-12-01", "2010-11-30"))
times = [xr.cftime_range(*d, freq="D", calendar="360_day") for d in dates]

da = xr.concat(
    [xr.DataArray(np.random.rand(len(t)), coords={"time": t}, dims="time") for t in times],
    dim="time",
)

da = da.chunk(time=360)

with xr.set_options(use_flox=True):
    # FAILS - discontinuous time dimension before resample
    (da > 0.5).resample(time="AS-DEC").any(dim="time")

with xr.set_options(use_flox=True):
    # SUCCEEDS - continuous time dimension before resample
    (da.sel(time=slice(*dates[0])) > 0.5).resample(time="AS-DEC").any(dim="time")

with xr.set_options(use_flox=True):
    # SUCCEEDS - compute chunks before resample
    (da > 0.5).compute().resample(time="AS-DEC").any(dim="time")

with xr.set_options(use_flox=False):
    # SUCCEEDS - don't use flox
    (da > 0.5).resample(time="AS-DEC").any(dim="time")
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python

ValueError                                Traceback (most recent call last)
Cell In[60], line 1
----> 1 (da > 0.5).resample(time="AS-DEC").any(dim="time")

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/_aggregations.py:7029, in DataArrayResampleAggregations.any(self, dim, keep_attrs, **kwargs)
   6960 """
   6961 Reduce this DataArray's data by applying any along some dimension(s).
   6962 (...)
   7022   * time     (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
   7023 """
   7024 if (
   7025     flox_available
   7026     and OPTIONS["use_flox"]
   7027     and contains_only_chunked_or_numpy(self._obj)
   7028 ):
-> 7029     return self._flox_reduce(
   7030         func="any",
   7031         dim=dim,
   7032         # fill_value=fill_value,
   7033         keep_attrs=keep_attrs,
   7034         **kwargs,
   7035     )
   7036 else:
   7037     return self.reduce(
   7038         duck_array_ops.array_any,
   7039         dim=dim,
   7040         keep_attrs=keep_attrs,
   7041         **kwargs,
   7042     )

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/resample.py:57, in Resample._flox_reduce(self, dim, keep_attrs, **kwargs)
     51 def _flox_reduce(
     52     self,
     53     dim: Dims,
     54     keep_attrs: bool | None = None,
     55     **kwargs,
     56 ) -> T_Xarray:
---> 57     result = super()._flox_reduce(dim=dim, keep_attrs=keep_attrs, **kwargs)
     58     result = result.rename({RESAMPLE_DIM: self._group_dim})
     59     return result

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/groupby.py:1018, in GroupBy._flox_reduce(self, dim, keep_attrs, **kwargs)
   1015     kwargs.setdefault("min_count", 1)
   1017 output_index = grouper.full_index
-> 1018 result = xarray_reduce(
   1019     obj.drop_vars(non_numeric.keys()),
   1020     self._codes,
   1021     dim=parsed_dim,
   1022     # pass RangeIndex as a hint to flox that by is already factorized
   1023     expected_groups=(pd.RangeIndex(len(output_index)),),
   1024     isbin=False,
   1025     keep_attrs=keep_attrs,
   1026     **kwargs,
   1027 )
   1029 # we did end up reducing over dimension(s) that are
   1030 # in the grouped variable
   1031 group_dims = grouper.group.dims

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/flox/xarray.py:408, in xarray_reduce(obj, func, expected_groups, isbin, sort, dim, fill_value, dtype, method, engine, keep_attrs, skipna, min_count, reindex, *by, **finalize_kwargs)
    406 output_core_dims = [d for d in input_core_dims[0] if d not in dim_tuple]
    407 output_core_dims.extend(group_names)
--> 408 actual = xr.apply_ufunc(
    409     wrapper,
    410     ds_broad.drop_vars(tuple(missing_dim)).transpose(..., *grouper_dims),
    411     *by_da,
    412     input_core_dims=input_core_dims,
    413     # for xarray's test_groupby_duplicate_coordinate_labels
    414     exclude_dims=set(dim_tuple),
    415     output_core_dims=[output_core_dims],
    416     dask="allowed",
    417     dask_gufunc_kwargs=dict(
    418         output_sizes=group_sizes, output_dtypes=[dtype] if dtype is not None else None
    419     ),
    420     keep_attrs=keep_attrs,
    421     kwargs={
    422         "func": func,
    423         "axis": axis,
    424         "sort": sort,
    425         "fill_value": fill_value,
    426         "method": method,
    427         "min_count": min_count,
    428         "skipna": skipna,
    429         "engine": engine,
    430         "reindex": reindex,
    431         "expected_groups": tuple(expected_groups),
    432         "isbin": isbins,
    433         "finalize_kwargs": finalize_kwargs,
    434         "dtype": dtype,
    435         "core_dims": input_core_dims,
    436     },
    437 )
    439 # restore non-dim coord variables without the core dimension
    440 # TODO: shouldn't apply_ufunc handle this?
    441 for var in set(ds_broad._coord_names) - set(ds_broad._indexes) - set(ds_broad.dims):

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/computation.py:1185, in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, meta, dask_gufunc_kwargs, *args)
   1183 # feed datasets apply_variable_ufunc through apply_dataset_vfunc
   1184 elif any(is_dict_like(a) for a in args):
-> 1185     return apply_dataset_vfunc(
   1186         variables_vfunc,
   1187         *args,
   1188         signature=signature,
   1189         join=join,
   1190         exclude_dims=exclude_dims,
   1191         dataset_join=dataset_join,
   1192         fill_value=dataset_fill_value,
   1193         keep_attrs=keep_attrs,
   1194     )
   1195 # feed DataArray apply_variable_ufunc through apply_dataarray_vfunc
   1196 elif any(isinstance(a, DataArray) for a in args):

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/computation.py:469, in apply_dataset_vfunc(func, signature, join, dataset_join, fill_value, exclude_dims, keep_attrs, *args)
    464 list_of_coords, list_of_indexes = build_output_coords_and_indexes(
    465     args, signature, exclude_dims, combine_attrs=keep_attrs
    466 )
    467 args = tuple(getattr(arg, "data_vars", arg) for arg in args)
--> 469 result_vars = apply_dict_of_variables_vfunc(
    470     func, *args, signature=signature, join=dataset_join, fill_value=fill_value
    471 )
    473 out: Dataset | tuple[Dataset, ...]
    474 if signature.num_outputs > 1:

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/computation.py:411, in apply_dict_of_variables_vfunc(func, signature, join, fill_value, *args)
    409 result_vars = {}
    410 for name, variable_args in zip(names, grouped_by_name):
--> 411     result_vars[name] = func(*variable_args)
    413 if signature.num_outputs > 1:
    414     return _unpack_dict_tuples(result_vars, signature.num_outputs)

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/computation.py:761, in apply_variable_ufunc(func, signature, exclude_dims, dask, output_dtypes, vectorize, keep_attrs, dask_gufunc_kwargs, *args)
    756 if vectorize:
    757     func = _vectorize(
    758         func, signature, output_dtypes=output_dtypes, exclude_dims=exclude_dims
    759     )
--> 761 result_data = func(*input_data)
    763 if signature.num_outputs == 1:
    764     result_data = (result_data,)

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/flox/xarray.py:379, in xarray_reduce.<locals>.wrapper(array, func, skipna, core_dims, *by, **kwargs)
    376     offset = min(array)
    377     array = datetime_to_numeric(array, offset, datetime_unit="us")
--> 379 result, *groups = groupby_reduce(array, *by, func=func, **kwargs)
    381 # Output of count has an int dtype.
    382 if requires_numeric and func != "count":

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/flox/core.py:2011, in groupby_reduce(array, func, expected_groups, sort, isbin, axis, fill_value, dtype, min_count, method, engine, reindex, finalize_kwargs, *by)
   2005     groups = (groups[0][sorted_idx],)
   2007 if factorize_early:
   2008     # nan group labels are factorized to -1, and preserved
   2009     # now we get rid of them by reindexing
   2010     # This also handles bins with no data
-> 2011     result = reindex_(
   2012         result, from_=groups[0], to=expected_groups, fill_value=fill_value
   2013     ).reshape(result.shape[:-1] + grp_shape)
   2014     groups = final_groups
   2016 if is_bool_array and (_is_minmax_reduction(func) or _is_first_last_reduction(func)):

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/flox/core.py:428, in reindex_(array, from_, to, fill_value, axis, promote)
    426 if any(idx == -1):
    427     if fill_value is None:
--> 428         raise ValueError("Filling is required. fill_value cannot be None.")
    429     indexer[axis] = idx == -1
    430     # This allows us to match xarray's type promotion rules

ValueError: Filling is required. fill_value cannot be None.
```
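The last frame shows the immediate cause: flox's reindex_ maps groups that are absent from the data to -1 and then refuses to fill them because fill_value is None. A simplified, hypothetical sketch of that check (not flox's actual implementation):

```Python
# Simplified sketch of the condition raised in flox.core.reindex_ above:
# bins present in `to` but absent from `from_` get index -1, and with
# fill_value=None there is nothing to put in those empty bins.
import numpy as np

def reindex_check(from_, to, fill_value=None):
    from_list = list(from_)
    idx = np.array([from_list.index(g) if g in from_list else -1 for g in to])
    if (idx == -1).any() and fill_value is None:
        raise ValueError("Filling is required. fill_value cannot be None.")
    return idx

reindex_check(from_=[0, 1, 2], to=[0, 1, 2, 3])  # raises: bin 3 is empty
```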

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:41:52) [Clang 15.0.7 ]
python-bits: 64
OS: Darwin
OS-release: 22.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: ('en_GB', 'UTF-8')
libhdf5: 1.14.1
libnetcdf: 4.9.2

xarray: 2023.7.0
pandas: 1.5.3
numpy: 1.24.4
scipy: 1.11.1
netCDF4: 1.6.4
pydap: installed
h5netcdf: 1.2.0
h5py: 3.9.0
Nio: None
zarr: 2.16.0
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: None
iris: 3.6.1
bottleneck: 1.3.7
dask: 2023.8.1
distributed: 2023.8.1
matplotlib: 3.7.2
cartopy: 0.22.0
seaborn: 0.12.2
numbagg: 0.2.2
fsspec: 2023.6.0
cupy: None
pint: 0.22
sparse: 0.14.0
flox: 0.7.2
numpy_groupies: 0.9.22
setuptools: 68.1.2
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: 8.14.0
sphinx: None
