id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1090229430,I_kwDOAMm_X85A-5S2,6124,"bool(ds) should raise a ""the truth value of a Dataset is ambiguous"" error",3698640,open,0,,,15,2021-12-29T02:35:39Z,2023-03-12T15:47:01Z,,CONTRIBUTOR,,,,"Throwing this out there - happy to be shot down if people are opposed.

### Current behavior / griping

Currently, coercing a dataset to a boolean invokes [`ds.__bool__`](https://github.com/pydata/xarray/blob/main/xarray/core/dataset.py#L1451-L1452), which in turn calls `bool(ds.data_vars)`:

```python
class Dataset(DataWithCoords, DatasetArithmetic, Mapping):
    ...
    def __bool__(self) -> bool:
        return bool(self.data_vars)
```

This has the unfortunate property of returning True as long as there is at least one data_variable, regardless of the contents.

Currently, the behavior of `Dataset.__bool__` is, at least as far as I've seen, never helpful but frequently unhelpful. I've seen (and written) tests written for DataArrays being passed a Dataset and suddenly the tests are meaningless _so many times_. Conversely, I've never found a legitimate use case for `bool(ds)`. As far as I can tell, this is essentially the same as `len(ds.data_vars) > 0`.

In fact, while testing out my proposed changes below on a fork, I found two tests in the xarray test suite that had succumbed to this issue: see https://github.com/pydata/xarray/pull/6122 and https://github.com/pydata/xarray/pull/6123.

This *has* been discussed before - see https://github.com/pydata/xarray/issues/4290. This discussion focused on the question ""should `bool(xr.Dataset({'a': False}))` return False?"". I agree that it's not clear when it should be false and picking a behavior which deviates from Mapping feels arbitrary and gross.

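To make the pitfall concrete, here is a minimal, self-contained sketch. `ToyDataset` is a hypothetical stand-in (not xarray's class) that copies only the `__bool__` logic quoted above, so it runs without xarray installed:

```python
from collections.abc import Mapping


class ToyDataset(Mapping):
    '''Hypothetical stand-in for xr.Dataset; only the truthiness logic is copied.'''

    def __init__(self, data_vars):
        self.data_vars = dict(data_vars)

    def __getitem__(self, key):
        return self.data_vars[key]

    def __iter__(self):
        return iter(self.data_vars)

    def __len__(self):
        return len(self.data_vars)

    def __bool__(self) -> bool:
        # Same body as the Dataset.__bool__ quoted above.
        return bool(self.data_vars)


all_false = ToyDataset({'a': [False, False]})
empty = ToyDataset({})

# Truthiness only reflects whether any variable exists, never the values.
print(bool(all_false))  # True
print(bool(empty))      # False
print(bool(all_false) == (len(all_false.data_vars) > 0))  # True
```

Assuming the real `Dataset.__bool__` is as quoted above, a bare `assert ds` in a test behaves the same way: it passes as long as any data variable exists, whatever the values are.
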
### Proposed behavior

I'm proposing that the API be changed, so that `bool(xr.Dataset({'a': False}))` raises an error, similar to the implementation in [`pd.Series`](https://github.com/pandas-dev/pandas/blob/v1.3.5/pandas/core/generic.py#L1535-L1542).

In this implementation in pandas, attempting to evaluate even a single-element series as a boolean raises an error:

```python
In [14]: bool(pd.Series([False]))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in
----> 1 bool(pd.Series([False]))

~/miniconda3/envs/rhodium-env/lib/python3.9/site-packages/pandas/core/generic.py in __nonzero__(self)
   1532     @final
   1533     def __nonzero__(self):
-> 1534         raise ValueError(
   1535             f""The truth value of a {type(self).__name__} is ambiguous. ""
   1536             ""Use a.empty, a.bool(), a.item(), a.any() or a.all().""

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```

I understand hesitancy around changing the core API. That said, if anyone can find an important, correct use of `bool(ds)` in the wild I'll eat my hat :)

### Implementation

This could be as simple as raising an error on `ds.__bool__`, something like:

```python
class Dataset(DataWithCoords, DatasetArithmetic, Mapping):
    ...
    def __bool__(self) -> bool:
        raise ValueError(
            ""The truth value of a Dataset is ambiguous. Reduce the data ""
            ""variables to a scalar value with any(ds.values()) or ""
            ""all(ds.values()).""
        )
```

The only other change needed is to an assertion that directly calls `bool(ds)` in [test_dataset::TestDataset.test_properties](https://github.com/pydata/xarray/blob/main/xarray/tests/test_dataset.py#L547), which checks for the exact behavior I'm changing:

```python
assert bool(ds)
```

This would need to be changed to:

```python
with pytest.raises(ValueError):
    bool(ds)
```

If this sounds good, I can submit a PR with these changes.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6124/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue