issues: 611879581

id: 611879581
node_id: MDU6SXNzdWU2MTE4Nzk1ODE=
number: 4027
title: Bug in the conversion of Pandas DataFrame into Xarray Dataset .
user: 10154151
state: closed
locked: 0
assignee, milestone: (empty)
comments: 6
created_at: 2020-05-04T13:34:32Z
updated_at: 2020-05-07T13:50:07Z
closed_at: 2020-05-07T13:50:07Z
author_association: NONE
active_lock_reason, draft, pull_request: (empty)
body:

For an unknown reason, the Dataset coordinates don't appear to be in the same order as the variable dimension when the Dataset is created from a multi-level DataFrame generated by concatenating two Series.

In this case, the Dataset coordinates have not been sorted in ascending order when the Dataset is created (using the DataFrame.to_xarray method). Interestingly, this problem doesn't occur if the original multi-level DataFrame is generated using the groupby() method.

A notebook presenting the issue can be downloaded [here](https://github.com/lhoupert/xarraytest_lh)
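The two construction routes mentioned above can be compared with a small sketch. It is not the code from the notebook: the four V values are taken from the printouts further down in this report, while the U values (simply the negated V values) and the names `dfs_concat` and `dfs_grouped` are assumptions made for illustration. It only shows that concatenation keeps the original row order of the index, whereas groupby() returns a sorted index.

```python
import pandas as pd

# Hypothetical subset of the station/year index, deliberately left unsorted.
index = pd.MultiIndex.from_tuples(
    [("IB23S", 2005), ("IB23S", 2006), ("10G", 2000), ("10G", 2010)],
    names=["Staname", "Year"],
)
# V values come from the report; U values are invented for this sketch.
v = pd.Series([-65969.05, -65969.06, -100910.00, -105910.10], index=index, name="V")
u = (-v).rename("U")

# Route 1: concatenating the two Series keeps the rows in their original order.
dfs_concat = pd.concat([u, v], axis=1)

# Route 2: groupby() sorts the group keys by default (sort=True).
dfs_grouped = dfs_concat.groupby(level=["Staname", "Year"]).mean()

print(dfs_concat.index.is_monotonic_increasing)   # False
print(dfs_grouped.index.is_monotonic_increasing)  # True
```

If the sort order of the index is what triggers the problem, this difference would explain why only the concatenated DataFrame is affected.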

MCVE Code Sample

    da1 = dfs1.to_xarray()
    print(da1)

Expected Output

    <xarray.Dataset>
    Dimensions:  (Staname: 60, Year: 15)
    Coordinates:
      * Staname  (Staname) object '10G' '13G' '14G' '15G' '8G' ... 'Q1' 'R' 'S' 'T'
      * Year     (Year) int64 1996 1997 1998 1999 2000 ... 2013 2014 2015 2016 2017
    Data variables:
        U        (Staname, Year) float64 nan nan nan nan ... 6.592e+04 6.592e+04 nan
        V        (Staname, Year) float64 nan nan nan ... -6.592e+04 -6.592e+04 nan

Problem Description

The current output is:

    <xarray.Dataset>
    Dimensions:  (Staname: 60, Year: 15)
    Coordinates:
      * Staname  (Staname) object 'IB23S' 'IB22S' 'IB21S' ... '10G' '9G' '8G'
      * Year     (Year) object 1996 1997 1998 1999 2000 ... 2013 2014 2015 2016 2017
    Data variables:
        U        (Staname, Year) float64 nan nan nan nan ... 6.592e+04 6.592e+04 nan
        V        (Staname, Year) float64 nan nan nan ... -6.592e+04 -6.592e+04 nan

For an unknown reason, the Dataset created from the conversion of the DataFrame dfs1 is wrong.

For example, the data indexed as station IB23S:

    print(da1.V.loc['IB23S',:])

    <xarray.DataArray 'V' (Year: 15)>
    array([ nan, nan, nan, nan, -100910. , nan, nan, nan, -105910.1 , nan, nan, nan, -105910.15, -105910.16, -105910.17])
    Coordinates:
        Staname  <U5 'IB23S'
      * Year     (Year) object 1996 1997 1998 1999 2000 ... 2013 2014 2015 2016 2017

... doesn't correspond to the original DataFrame data:

    print(dfs1.V.loc['IB23S',:])

    Staname  Year
    IB23S    2005   -65969.05
             2006   -65969.06
             2010   -60969.10
             2011   -60969.11
             2014   -60969.14
             2015   -60969.15
             2016   -60969.16
             2017   -55969.17
    Name: V, dtype: float64

Instead, it appears to be the data for station 10G in the original DataFrame:

    dfs1.V.loc['10G',:]

    Staname  Year
    10G      2000   -100910.00
             2010   -105910.10
             2015   -105910.15
             2016   -105910.16
             2017   -105910.17
    Name: V, dtype: float64
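The manual comparison above (look up a station in the Dataset, then in the DataFrame) can be automated. The following is a minimal sketch under stated assumptions: `dfs1` here is a four-row stand-in built from values printed in this report, not the full DataFrame, and `roundtrip_mismatches` is a hypothetical helper name. Such a small sample may or may not reproduce the behaviour seen with the real data and xarray 0.15.1.

```python
import pandas as pd

# Note: DataFrame.to_xarray() requires xarray to be installed.

# Stand-in for dfs1, using V values printed in this report (rows left unsorted).
index = pd.MultiIndex.from_tuples(
    [("IB23S", 2005), ("IB23S", 2006), ("10G", 2000), ("10G", 2010)],
    names=["Staname", "Year"],
)
dfs1 = pd.DataFrame({"V": [-65969.05, -65969.06, -100910.00, -105910.10]}, index=index)


def roundtrip_mismatches(df):
    """Return (key, column, expected, converted) for every value that changes."""
    ds = df.to_xarray()
    mismatches = []
    for key, row in df.iterrows():
        # Map each index level name to its label, e.g. {"Staname": "IB23S", "Year": 2005}.
        selector = dict(zip(df.index.names, key))
        for name in df.columns:
            converted = ds[name].sel(selector).item()
            if converted != row[name]:
                mismatches.append((key, name, row[name], converted))
    return mismatches


mismatches = roundtrip_mismatches(dfs1)
print(mismatches)  # an empty list means every label still carries its original value
```

Selecting by label rather than by position is deliberate: the symptom described here is values attached to the wrong station label, which a purely positional comparison could miss.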

Notes

The problem appears to be in the Dataset coordinate Staname, which has not been sorted in ascending order, while the data variables appear to have been sorted differently.

The original multi-level DataFrame was generated by concatenating two Series.

Interestingly, this problem doesn't occur if the original multi-level DataFrame is generated using the groupby() method.

A notebook presenting the issue can be downloaded [here](https://github.com/lhoupert/xarraytest_lh)
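Since the notes above point to a sort-order mismatch, one thing that might be worth trying is sorting the index before the conversion. This is an assumption based on the groupby() observation, not a confirmed fix. The sketch below uses a four-row stand-in for `dfs1` built from values printed in this report, and `dfs1_sorted` is a name introduced for illustration.

```python
import pandas as pd

# Stand-in for dfs1, using V values printed in this report (rows left unsorted).
index = pd.MultiIndex.from_tuples(
    [("IB23S", 2005), ("IB23S", 2006), ("10G", 2000), ("10G", 2010)],
    names=["Staname", "Year"],
)
dfs1 = pd.DataFrame({"V": [-65969.05, -65969.06, -100910.00, -105910.10]}, index=index)

# sort_index() reorders rows and values together, so labels keep their data.
dfs1_sorted = dfs1.sort_index()

print(dfs1_sorted.index.is_monotonic_increasing)  # True
print(dfs1_sorted.loc[("IB23S", 2005), "V"])      # -65969.05

# The conversion would then be run on the sorted frame:
# da1 = dfs1_sorted.to_xarray()
```

This puts the concatenated DataFrame in the same sorted state that groupby() produces, which is the case reported to work.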

Versions

Output of xr.show_versions():

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.8.2 | packaged by conda-forge | (default, Mar 23 2020, 17:55:48) [Clang 9.0.1 ]
    python-bits: 64
    OS: Darwin
    OS-release: 18.7.0
    machine: x86_64
    processor: i386
    byteorder: little
    LC_ALL: None
    LANG: en_GB.UTF-8
    LOCALE: en_GB.UTF-8
    libhdf5: 1.10.5
    libnetcdf: 4.7.4

    xarray: 0.15.1
    pandas: 1.0.3
    numpy: 1.18.1
    scipy: 1.4.1
    netCDF4: 1.5.3
    pydap: None
    h5netcdf: None
    h5py: None
    Nio: None
    zarr: None
    cftime: 1.1.1.2
    nc_time_axis: None
    PseudoNetCDF: None
    rasterio: None
    cfgrib: None
    iris: 2.4.0
    bottleneck: None
    dask: 2.14.0
    distributed: 2.14.0
    matplotlib: 3.2.1
    cartopy: 0.17.0
    seaborn: 0.10.0
    numbagg: None
    setuptools: 46.1.3.post20200325
    pip: 20.0.2
    conda: None
    pytest: 5.4.1
    IPython: 7.13.0
    sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4027/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: 13221727 · type: issue
