issues: 118156114
| id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 118156114 | MDU6SXNzdWUxMTgxNTYxMTQ= | 664 | Align pandas objects added to Datasets? | 5635139 | closed | 0 |  |  | 3 | 2015-11-21T00:37:47Z | 2015-12-08T07:23:41Z | 2015-12-08T07:23:41Z | MEMBER |  |  |  | We have a pandas DataFrame that is not aligned with an xray Dataset:

```python
In [34]:
da = xray.DataArray(
    np.random.rand(5,2),
    coords=(
        ('date', pd.date_range(start='2000', periods=5)),
        ('company', list('ab')),
    )
)
da
Out[34]:
<xray.DataArray (date: 5, company: 2)>
array([[ 0.82168647,  0.93097023],
       [ 0.34928855,  0.23245631],
       [ 0.32857461,  0.12554705],
       [ 0.44983381,  0.27182767],
       [ 0.31063147,  0.52894834]])
Coordinates:
  * date     (date) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 ...
  * company  (company) |S1 'a' 'b'

In [35]:
ds = xray.Dataset({'returns': da})
ds
Out[35]:
<xray.Dataset>
Dimensions:  (company: 2, date: 5)
Coordinates:
  * date     (date) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 ...
  * company  (company) |S1 'a' 'b'
Data variables:
    returns  (date, company) float64 0.8217 0.931 0.3493 0.2325 0.3286 ...

In [36]:
df = da.to_pandas()
df
Out[36]:
company  a  b
date

rank = df.rank()
rank
Out[41]:
company  a  b
date

rank = rank.reindex(columns=list('ba'))
rank
Out[42]:
company  b  a
date
```

When we add it to a Dataset, the index on the columns is ignored and the values are matched by position:

```python
In [49]:
ds['rank'] = (('date', 'company'), rank)
ds['rank'].to_pandas()
Out[49]:
company  a  b
date
```

Adding the DataFrame without supplying dims doesn't work either. One solution is to construct a DataArray out of the pandas object:

```python
In [45]:
ds['rank'] = xray.DataArray(rank)
ds
Out[45]:
<xray.Dataset>
Dimensions:  (company: 2, date: 5)
Coordinates:
  * date     (date) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 ...
  * company  (company) object 'a' 'b'
Data variables:
    returns  (date, company) float64 0.8217 0.931 0.3493 0.2325 0.3286 ...
    rank     (date, company) float64 5.0 5.0 3.0 2.0 2.0 1.0 4.0 3.0 1.0 4.0
```

Possible additions to make this easier:

- Align pandas objects that are passed in when dims are supplied
- Allow adding pandas objects with labelled axes to Datasets without supplying dims, and align them (similar to wrapping them in a DataArray constructor)

What are your thoughts? |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/664/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
| completed | 13221727 | issue |
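The first proposal in the issue (align pandas objects when dims are supplied) amounts to reindexing the DataFrame by label onto the Dataset's coordinate labels before its raw values are used positionally. Below is a minimal pandas-only sketch of that idea, assuming the target labels compare equal to the DataFrame's labels. `align_frame` is a hypothetical helper, not part of xray/xarray. The input values are the ones printed in `Out[34]`, so the resulting ranks match `Out[45]`.

```python
import numpy as np
import pandas as pd


def align_frame(frame, index, columns):
    """Hypothetical helper (not an xray/xarray API): reorder `frame` by
    label so that its rows and columns follow `index` and `columns`.
    Labels absent from `frame` become NaN, as with DataFrame.reindex."""
    return frame.reindex(index=index, columns=columns)


# The values shown in Out[34] of the issue, labelled by date and company.
values = np.array([[0.82168647, 0.93097023],
                   [0.34928855, 0.23245631],
                   [0.32857461, 0.12554705],
                   [0.44983381, 0.27182767],
                   [0.31063147, 0.52894834]])
df = pd.DataFrame(values,
                  index=pd.date_range(start='2000', periods=5, name='date'),
                  columns=pd.Index(list('ab'), name='company'))

# Rank each column, then reverse the column order as in Out[42].
rank = df.rank().reindex(columns=list('ba'))

# Positional use: the first column now holds company 'b''s ranks.
positional = rank.values
# Label-aligned use: columns are put back in the target order ('a', 'b').
aligned = align_frame(rank, df.index, df.columns).values

print(positional[:, 0].tolist())  # [5.0, 2.0, 1.0, 3.0, 4.0] (ranks of 'b')
print(aligned[:, 0].tolist())     # [5.0, 3.0, 2.0, 4.0, 1.0] (ranks of 'a')
```

With a Dataset like `ds` above, the aligned values could then be passed through the tuple form, e.g. `align_frame(rank, ds['date'].values, ds['company'].values).values`. This only works if the coordinate labels and the DataFrame labels compare equal. Wrapping the object in `xray.DataArray(rank)`, as in `In [45]`, gets the same label alignment without a helper.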