issues: 943112510

id: 943112510
node_id: MDU6SXNzdWU5NDMxMTI1MTA=
number: 5598
title: Conversion to pandas for zero-dimensional Data(Set|Array)
user: 7611856
state: open
locked: 0
comments: 2
created_at: 2021-07-13T10:07:38Z
updated_at: 2021-07-14T08:05:38Z
author_association: NONE

What happened: Converting a zero-dimensional DataArray or Dataset to a pandas DataFrame fails.

What you expected to happen: I would expect it to return a trivial DataFrame with one row and the respective coordinate / data variable columns. However, I am not sure whether that conflicts with other round trips between xarray and pandas, e.g. for one-dimensional data arrays of size 1.

Minimal Complete Verifiable Example:

```python
from xarray import DataArray, Dataset

da = DataArray([1, 2, 3], dims=("x",), coords=dict(x=[1, 2, 3]))
# I don't know of a way to construct such a data array without the isel.
# Essentially, the line below also works for higher-dimensional data arrays
# and results in a zero-dimensional data array with all the coordinates of
# the found minimum.
da = da.isel(**da.argmin(dim=("x",)))
ds = Dataset({'a': da})

# Fails with ValueError: cannot convert a scalar to a DataFrame
# (xarray/core/dataarray.py, line 2664, in to_dataframe)
da.to_dataframe(name="foo")
# Expected: a DataFrame with two columns (x and foo) and one row

# Fails with ValueError: no valid index for a 0-dimensional object
# (xarray/core/coordinates.py, line 106, in to_index)
ds.to_dataframe()
# Expected: a DataFrame with two columns (x and a) and one row
```
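Until the conversion itself handles this case, one possible workaround is to promote the scalar coordinate back to a length-1 dimension before converting. The sketch below assumes the zero-dimensional array carries `x` as a scalar coordinate (as it does after `isel`), and it uses `isel(x=0)` instead of the `argmin` construction only to keep the example short. The final `reset_index()` is my addition to turn `x` from an index into a regular column.

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], dims=("x",), coords=dict(x=[1, 2, 3]))

# Zero-dimensional array; "x" survives as a scalar (non-dimension) coordinate.
scalar = da.isel(x=0)

# expand_dims promotes the scalar coordinate "x" to a length-1 dimension,
# after which the regular conversion works. reset_index() moves "x" from
# the index into an ordinary column.
df = scalar.expand_dims("x").to_dataframe(name="foo").reset_index()
print(df)
```

This only covers a single scalar coordinate; for a higher-dimensional minimum, every coordinate that should become part of the index would have to be passed to `expand_dims`.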

Anything else we need to know?: I tested a little and got what I want by simply removing the

```python
def to_dataframe(...):
    ...
    if self.ndim == 0:
        raise ValueError("cannot convert a scalar to a DataFrame")
```

block from dataarray.py and changing

```python
def to_index(self, ordered_dims: Sequence[Hashable] = None) -> pd.Index:
    ...
    if len(ordered_dims) == 0:
        return pd.Index([0])
        # raise ValueError("no valid index for a 0-dimensional object")
```

to not raise and instead return a trivial index in coordinates.py.
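The same result can be approximated outside of xarray, without patching the library. The following is a minimal sketch, not the proposed patch: `scalar_to_dataframe` is a hypothetical helper name, and it mirrors the idea above by using the trivial index `pd.Index([0])` for the single row.

```python
import pandas as pd
import xarray as xr


def scalar_to_dataframe(obj, name=None):
    """Hypothetical helper: one-row DataFrame from a 0-d DataArray or Dataset."""
    if isinstance(obj, xr.DataArray):
        name = obj.name if name is None else name
        if name is None:
            raise ValueError("a name is required for an unnamed DataArray")
        obj = obj.to_dataset(name=name)
    if len(obj.sizes) != 0:
        raise ValueError("expected a zero-dimensional object")
    # Scalar coordinates first, then data variables; [()] extracts the
    # scalar from each 0-d array while keeping its NumPy dtype.
    row = {key: var.values[()] for key, var in obj.coords.items()}
    row.update({key: var.values[()] for key, var in obj.data_vars.items()})
    # Trivial single-row index, as in the modified to_index above.
    return pd.DataFrame(row, index=pd.Index([0]))


da = xr.DataArray([1, 2, 3], dims=("x",), coords=dict(x=[1, 2, 3])).isel(x=0)
df_da = scalar_to_dataframe(da, name="foo")
df_ds = scalar_to_dataframe(xr.Dataset({"a": da}))
```

Unlike the regular conversion, this places the coordinates in ordinary columns rather than in the index, which is one way the round-trip question raised above could surface.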

If that would be considered reasonable behavior, I am happy to contribute the respective unit tests and changes!

Environment:

Output of `xr.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.5 (default, May 19 2021, 11:32:47) [GCC 10.2.0]
python-bits: 64
OS: Linux
OS-release: 5.8.0-59-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.17.0
pandas: 1.2.4
numpy: 1.20.2
scipy: 1.6.3
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.4.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 56.0.0
pip: 21.0.1
conda: None
pytest: 6.2.3
IPython: 7.24.0
sphinx: None

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5598/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue
