
issues: 1035607476


  • id: 1035607476
  • node_id: I_kwDOAMm_X849uh20
  • number: 5897
  • title: ds.mean bugs with cftime objects
  • user: 20629530
  • state: open
  • locked: 0
  • comments: 1
  • created_at: 2021-10-25T21:55:12Z
  • updated_at: 2021-10-27T14:51:07Z
  • author_association: CONTRIBUTOR

What happened: Given a dataset that has a variable with cftime objects along dimension A, averaging (mean) leads to buggy behaviour:

  1. Averaging over 'A' drops the variable instead of averaging it.
  2. Averaging over any other dimension will fail if that variable is on the dask backend.

What you expected to happen:

  1. I expected the average to fail in the case of a dask-backed cftime variable, given that this code exists: https://github.com/pydata/xarray/blob/fdabf3bea5c750939a4a2ae60f80ed34a6aebd58/xarray/core/duck_array_ops.py#L562-L572
     And I expected the average to work (not drop the variable) in the case of the numpy backend.

  2. I expected the use of dask to be irrelevant to the result. I expected the mean to preserve the cftime variable as-is, since it doesn't include the averaged dimension.
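
To make the first expectation concrete, here is a minimal sketch of one way a mean over datetime-like values can be computed: take the earliest value as an offset, average the numeric differences from it, and add the result back. This is only an illustration under stated assumptions. It uses the standard library's `datetime` as a stand-in for cftime objects, the name `mean_datetime` is hypothetical, and it is not a copy of the code linked above.

```python
from datetime import datetime, timedelta


def mean_datetime(values):
    """Average datetime-like values via their offsets from the earliest one.

    Hypothetical helper for illustration; stdlib datetimes stand in for
    cftime objects here.
    """
    if not values:
        raise ValueError("cannot average an empty sequence")
    offset = min(values)
    # Express each value as a float number of microseconds since the offset.
    one_us = timedelta(microseconds=1)
    deltas_us = [(value - offset) / one_us for value in values]
    mean_us = sum(deltas_us) / len(deltas_us)
    # Convert the averaged offset back into a datetime.
    return offset + timedelta(microseconds=mean_us)


# Ten daily timestamps starting 2021-10-31, like var1 in the example below.
times = [datetime(2021, 10, 31) + timedelta(days=i) for i in range(10)]
print(mean_datetime(times))  # 2021-11-04 12:00:00
```

The offset step keeps the averaged numbers small, and the result is again a datetime rather than a bare number, which is what I would expect for the cftime variable as well.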

Minimal Complete Verifiable Example:

```python
import xarray as xr

ds = xr.Dataset({
    'var1': (('time',), xr.cftime_range('2021-10-31', periods=10, freq='D')),
    'var2': (('x',), list(range(10))),
})
# var1 contains cftime objects
# var2 contains integers
# They do not share dims

ds.mean('time')  # var1 has disappeared instead of being averaged
ds.mean('x')  # Everything ok

dsc = ds.chunk({})
dsc.mean('time')  # var1 has disappeared. I would have expected this line to fail.
dsc.mean('x')  # Raises NotImplementedError. I would expect this line to run flawlessly.
```

Anything else we need to know?: A culprit is #5393, but maybe the bug is older? I think the change introduced there causes the issue (2) above.

In duck_array_ops.py the mean operation is declared numeric_only, which is kinda incoherent with the implementation allowing means of datetime objects. This setting causes my (1) above.
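
To illustrate how a `numeric_only` filter can produce the silent drop in (1), here is a toy model of a dataset-level mean. It is a sketch under stated assumptions, not xarray's implementation: `reduce_mean` is a hypothetical function, variables are plain 1-D lists keyed by name, and stdlib `datetime` objects stand in for cftime objects.

```python
from datetime import datetime, timedelta
from numbers import Number
from statistics import fmean


def reduce_mean(variables, dim, numeric_only=True):
    """Toy dataset-level mean; `variables` maps name -> (dims, values).

    Hypothetical model for illustration only.
    """
    out = {}
    for name, (dims, values) in variables.items():
        if dim not in dims:
            # The variable does not have the reduced dimension: keep it as-is.
            out[name] = (dims, values)
            continue
        if numeric_only and not all(isinstance(v, Number) for v in values):
            # Non-numeric values are skipped silently, so the variable vanishes.
            continue
        remaining_dims = tuple(d for d in dims if d != dim)
        out[name] = (remaining_dims, fmean(values))
    return out


variables = {
    "var1": (("time",), [datetime(2021, 10, 31) + timedelta(days=i) for i in range(10)]),
    "var2": (("x",), list(range(10))),
}

over_time = reduce_mean(variables, "time")
over_x = reduce_mean(variables, "x")
print(sorted(over_time))  # ['var2']
print(sorted(over_x))     # ['var1', 'var2']
print(over_x["var2"])     # ((), 4.5)
```

In this toy model the filter never gives the datetime-like variable a chance to be averaged, even if a datetime-aware mean exists further down. That is the kind of incoherence I mean above.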

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: fdabf3bea5c750939a4a2ae60f80ed34a6aebd58
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.14.12-arch1-1
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_CA.utf8
LOCALE: ('fr_CA', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.19.1.dev89+gfdabf3be
pandas: 1.3.4
numpy: 1.21.3
scipy: 1.7.1
netCDF4: 1.5.7
pydap: installed
h5netcdf: 0.11.0
h5py: 3.4.0
Nio: None
zarr: 2.10.1
cftime: 1.5.1
nc_time_axis: 1.4.0
PseudoNetCDF: installed
rasterio: 1.2.10
cfgrib: 0.9.9.1
iris: 3.1.0
bottleneck: 1.3.2
dask: 2021.10.0
distributed: 2021.10.0
matplotlib: 3.4.3
cartopy: 0.20.1
seaborn: 0.11.2
numbagg: 0.2.1
fsspec: 2021.10.1
cupy: None
pint: 0.17
sparse: 0.13.0
setuptools: 58.2.0
pip: 21.3.1
conda: None
pytest: 6.2.5
IPython: 7.28.0
sphinx: None
  • repo: 13221727
  • type: issue
