
issues: 1237587122


id: 1237587122
node_id: I_kwDOAMm_X85JxBSy
number: 6615
title: Flox grouping does not cast bool to int in summation
user: 20629530
state: closed
locked: 0
comments: 0
created_at: 2022-05-16T19:06:45Z
updated_at: 2022-05-17T02:24:32Z
closed_at: 2022-05-17T02:24:32Z
author_association: CONTRIBUTOR

What happened?

In my code, I rely on the implicit cast from bool to int that xarray/numpy perform for certain operations, such as sum. A resampling sum on a boolean array returns the number of True values, not the OR of all values.

However, when flox is enabled, the same operation returns the OR of all values. Digging a bit, I see that the flox aggregation uses np.add rather than np.sum, so this may actually be an issue for flox. Still, I felt the xarray devs should know about this potential regression.
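The difference between counting and OR-ing can be reproduced in plain NumPy. This is a minimal sketch, not flox's actual code path: it assumes that a grouped sum which accumulates into a boolean buffer (modelled here with `np.add.at` and hypothetical group labels `groups`) is what produces the OR. `np.sum` promotes booleans to an integer type and counts, whereas `np.add` applied to two booleans stays boolean.

```python
import numpy as np

flags = np.array([True, True, False, True, False, False])
groups = np.array([0, 0, 1, 1, 2, 2])  # hypothetical group labels (e.g. months)

# np.sum promotes booleans to the default integer type, so it counts True values.
total = np.sum(flags)

# np.add on two booleans keeps the bool dtype: True + True is True (logical OR).
pairwise = np.add(flags[0], flags[1])

# Grouped accumulation into a bool buffer therefore yields the OR per group...
or_per_group = np.zeros(3, dtype=bool)
np.add.at(or_per_group, groups, flags)

# ...while accumulating into an integer buffer yields the expected counts.
count_per_group = np.zeros(3, dtype=np.int64)
np.add.at(count_per_group, groups, flags)

print(total, pairwise, or_per_group, count_per_group)
```

The output dtype of the accumulator, not the addition itself, decides whether the result is a count or an OR.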

What did you expect to happen?

I expected the sum of a boolean array to be the count of True values.

Minimal Complete Verifiable Example

```Python
import xarray as xr

ds = xr.tutorial.open_dataset("air_temperature")

# Count the monthly number of 6-hour periods with tas over 300K

with xr.set_options(use_flox=False):
    # this works as expected
    outOLD = (ds.air > 300).resample(time='MS').sum()

with xr.set_options(use_flox=True):
    # this doesn't fail, but returns True or False:
    # the OR and not the expected sum.
    outFLOX = (ds.air > 300).resample(time='MS').sum()
```
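A possible workaround (a suggestion, not part of the original report) is to cast the mask to integers before resampling, so the sum counts True values whichever code path is used. The sketch below uses a small synthetic 6-hourly series instead of the tutorial dataset so that it runs offline; the data and the daily resampling frequency are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic 6-hourly boolean mask standing in for (ds.air > 300).
time = pd.date_range("2013-01-01", periods=8, freq=pd.Timedelta(hours=6))
mask = xr.DataArray(
    np.array([True, True, False, True, False, False, True, False]),
    coords={"time": time},
    dims="time",
)

results = {}
for use_flox in (False, True):
    # use_flox only takes effect when flox is installed; otherwise xarray
    # uses its own groupby implementation.
    with xr.set_options(use_flox=use_flox):
        # Casting to int first makes the sum a count of True values per day.
        results[use_flox] = mask.astype(int).resample(time="D").sum()

print(results[False].values, results[True].values)
```

Casting costs one extra array allocation, but it makes the intent (a count) explicit instead of relying on implicit bool-to-int promotion.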

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

I wrote a quick test of the basic reductions, and sum seems to be the only truly problematic one. prod returns a different dtype (bool instead of int64), but the values are unaffected.

```Python
for op in ['any', 'all', 'count', 'sum', 'prod', 'mean', 'var', 'std', 'max', 'min']:
    with xr.set_options(use_flox=False):
        outO = getattr((ds.air > 300).resample(time='YS'), op)()
    with xr.set_options(use_flox=True):
        outF = getattr((ds.air > 300).resample(time='YS'), op)()
    print(op, outO.dtype, outF.dtype, outO.equals(outF))
```

returns:

```
any bool bool True
all bool bool True
count int64 int64 True
sum int64 bool False
prod int64 bool True
mean float64 float64 True
var float64 float64 True
std float64 float64 True
max bool bool True
min bool bool True
```
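One way to see why only sum changes values: assuming the flox path keeps the bool dtype for these reductions (consistent with the dtypes printed above, but not a statement about flox internals), multiplying booleans is a logical AND, which matches the integer product of 0/1 values, while adding booleans is a logical OR, which diverges from the integer sum as soon as two True values meet. A minimal NumPy sketch over all four input combinations:

```python
import numpy as np

a = np.array([False, False, True, True])
b = np.array([False, True, False, True])

# bool * bool stays bool: logical AND, identical to the 0/1 integer product.
bool_prod = a * b
int_prod = a.astype(np.int64) * b.astype(np.int64)

# bool + bool stays bool: logical OR, so True + True gives True instead of 2.
bool_sum = a + b
int_sum = a.astype(np.int64) + b.astype(np.int64)

prod_agrees = np.array_equal(bool_prod, int_prod)
sum_agrees = np.array_equal(bool_sum, int_sum)
print(prod_agrees, sum_agrees)
```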

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.17.5-arch1-2
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_CA.utf8
LOCALE: ('fr_CA', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 2022.3.1.dev16+g3ead17ea
pandas: 1.4.2
numpy: 1.21.6
scipy: 1.7.1
netCDF4: 1.5.7
pydap: None
h5netcdf: 0.11.0
h5py: 3.4.0
Nio: None
zarr: 2.10.0
cftime: 1.5.0
nc_time_axis: 1.3.1
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2022.04.1
distributed: 2022.4.1
matplotlib: 3.4.3
cartopy: None
seaborn: None
numbagg: None
fsspec: 2021.07.0
cupy: None
pint: 0.18
sparse: None
flox: 0.5.1
numpy_groupies: 0.9.16
setuptools: 57.4.0
pip: 21.2.4
conda: None
pytest: 6.2.5
IPython: 8.2.0
sphinx: 4.1.2
reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6615/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
