issues: 830040696

id: 830040696
node_id: MDU6SXNzdWU4MzAwNDA2OTY=
number: 5024
title: xr.DataArray.sum() converts string objects into unicode
user: 19226431
state: open
locked: 0
comments: 0
created_at: 2021-03-12T11:47:06Z
updated_at: 2022-04-09T01:40:09Z
author_association: CONTRIBUTOR

What happened:

When summing over all axes of a DataArray of strings with dtype object, the result is a 0-dimensional (size-one) DataArray with a fixed-width unicode dtype instead of dtype object.

What you expected to happen:

I expected the summation to preserve the dtype, i.e. the resulting 0-dimensional DataArray should have dtype object.

Minimal Complete Verifiable Example:

    import xarray as xr

    # 3x3 DataArray filled with the string 'a', cast to dtype object
    ds = xr.DataArray('a', [range(3), range(3)]).astype(object)
    ds.sum()

Output:

    <xarray.DataArray ()>
    array('aaaaaaaaa', dtype='<U9')

On the other hand, when summing over only one dimension, the dtype is preserved:

    ds.sum('dim_0')

Output:

    <xarray.DataArray (dim_1: 3)>
    array(['aaa', 'aaa', 'aaa'], dtype=object)
    Coordinates:
      * dim_1    (dim_1) int64 0 1 2
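The same contrast shows up at the NumPy level. A minimal sketch, assuming a plain 3x3 NumPy object array is a fair stand-in for the DataArray above (no xarray involved): a single-axis `np.sum` returns an ndarray that keeps dtype object, while reducing over all axes returns a bare Python `str`.

```python
import numpy as np

# 3x3 object array of Python strings, mirroring the DataArray in the example
arr = np.full((3, 3), "a", dtype=object)

# Reducing over a single axis returns an ndarray that keeps dtype=object
partial = np.sum(arr, axis=0)
print(type(partial).__name__, partial.dtype)  # ndarray object

# Reducing over all axes returns the contained Python object itself
total = np.sum(arr)
print(type(total).__name__, total)  # str aaaaaaaaa
```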

Anything else we need to know?:

The problem becomes relevant as soon as dask is used in the workflow: dask expects the aggregated DataArray to have dtype object, so the mismatch will likely lead to errors in subsequent operations.

The behavior probably comes from creating a new DataArray after the reduction with np.sum(), which itself returns a pure Python string.
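This explanation can be checked with NumPy alone. The sketch assumes, without quoting xarray's source, that the reduced value is re-wrapped by something equivalent to `np.asarray()`. `sum_preserving_dtype` is a hypothetical helper for illustration only, not an xarray or NumPy API.

```python
import numpy as np

arr = np.full((3, 3), "a", dtype=object)
total = np.sum(arr)  # a plain Python str, not an array

# Re-wrapping the str without a dtype lets NumPy infer fixed-width unicode
inferred = np.asarray(total)
print(inferred.dtype)  # <U9


def sum_preserving_dtype(values, axis=None):
    """Hypothetical helper: np.sum, then re-wrap the result with the input dtype."""
    result = np.sum(values, axis=axis)
    return np.asarray(result, dtype=values.dtype)


# Passing dtype=object explicitly keeps the 0-d result as an object array
preserved = sum_preserving_dtype(arr)
print(preserved.shape, preserved.dtype)  # () object
```

Reusing the input dtype unconditionally is only reasonable for object arrays: `np.sum` deliberately upcasts small integer and boolean dtypes, so an actual fix would likely need to special-case dtype object.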

Environment:

Output of xr.show_versions():

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0]
    python-bits: 64
    OS: Linux
    OS-release: 5.4.0-66-generic
    machine: x86_64
    processor: x86_64
    byteorder: little
    LC_ALL: None
    LANG: en_US.UTF-8
    LOCALE: en_US.UTF-8
    libhdf5: 1.10.6
    libnetcdf: 4.7.4

    xarray: 0.16.2
    pandas: 1.2.1
    numpy: 1.19.5
    scipy: 1.6.0
    netCDF4: 1.5.5.1
    pydap: None
    h5netcdf: 0.7.4
    h5py: 3.1.0
    Nio: None
    zarr: 2.3.2
    cftime: 1.3.1
    nc_time_axis: None
    PseudoNetCDF: None
    rasterio: 1.2.0
    cfgrib: None
    iris: None
    bottleneck: 1.3.2
    dask: 2021.01.1
    distributed: 2021.01.1
    matplotlib: 3.3.3
    cartopy: 0.18.0
    seaborn: 0.11.1
    numbagg: None
    pint: None
    setuptools: 52.0.0.post20210125
    pip: 21.0
    conda: 4.9.2
    pytest: 6.2.2
    IPython: 7.19.0
    sphinx: 3.4.3

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5024/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue