
issues: 118156114


id: 118156114
node_id: MDU6SXNzdWUxMTgxNTYxMTQ=
number: 664
title: Align pandas objects added to Datasets?
user: 5635139
state: closed
locked: 0
comments: 3
created_at: 2015-11-21T00:37:47Z
updated_at: 2015-12-08T07:23:41Z
closed_at: 2015-12-08T07:23:41Z
author_association: MEMBER
Empty fields: assignee, milestone, active_lock_reason, draft, pull_request, performed_via_github_app

We have a pandas DataFrame whose columns are not aligned with an xray Dataset:

```python
In [34]:
da = xray.DataArray(
    np.random.rand(5, 2),
    coords=(
        ('date', pd.date_range(start='2000', periods=5)),
        ('company', list('ab')),
    )
)
da
Out[34]:
<xray.DataArray (date: 5, company: 2)>
array([[ 0.82168647,  0.93097023],
       [ 0.34928855,  0.23245631],
       [ 0.32857461,  0.12554705],
       [ 0.44983381,  0.27182767],
       [ 0.31063147,  0.52894834]])
Coordinates:
  * date     (date) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 ...
  * company  (company) |S1 'a' 'b'

In [35]:
ds = xray.Dataset({'returns': da})
ds
Out[35]:
<xray.Dataset>
Dimensions:  (company: 2, date: 5)
Coordinates:
  * date     (date) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 ...
  * company  (company) |S1 'a' 'b'
Data variables:
    returns  (date, company) float64 0.8217 0.931 0.3493 0.2325 0.3286 ...

In [36]:
df = da.to_pandas()
df
Out[36]:
company            a         b
date
2000-01-01  0.821686  0.930970
2000-01-02  0.349289  0.232456
2000-01-03  0.328575  0.125547
2000-01-04  0.449834  0.271828
2000-01-05  0.310631  0.528948

In [41]:
rank = df.rank()
rank
Out[41]:
company     a  b
date
2000-01-01  5  5
2000-01-02  3  2
2000-01-03  2  1
2000-01-04  4  3
2000-01-05  1  4

In [42]:
rank = rank.reindex(columns=list('ba'))
rank
Out[42]:
company     b  a
date
2000-01-01  5  5
2000-01-02  2  3
2000-01-03  1  2
2000-01-04  3  4
2000-01-05  4  1
```

When we add it to the Dataset with explicit dims, the column labels are ignored and the values are assigned by position:

```python
In [49]:
ds['rank'] = (('date', 'company'), rank)
ds['rank'].to_pandas()
Out[49]:
company     a  b
date
2000-01-01  5  5
2000-01-02  2  3
2000-01-03  1  2
2000-01-04  3  4
2000-01-05  4  1
```

Adding the DataFrame without supplying dims doesn't work either. One solution is to construct a DataArray out of the pandas object:

```python
In [45]:
ds['rank'] = xray.DataArray(rank)
ds
Out[45]:
<xray.Dataset>
Dimensions:  (company: 2, date: 5)
Coordinates:
  * date     (date) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 ...
  * company  (company) object 'a' 'b'
Data variables:
    returns  (date, company) float64 0.8217 0.931 0.3493 0.2325 0.3286 ...
    rank     (date, company) float64 5.0 5.0 3.0 2.0 2.0 1.0 4.0 3.0 1.0 4.0
```
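
For anyone reproducing this on a current release, where the package is named `xarray` rather than `xray`, the workaround might look roughly like the sketch below. The sample values are copied from the session above instead of being drawn at random. It assumes that assigning a `DataArray` into a `Dataset` still aligns it to the Dataset's existing indexes and that `reindex` keeps the `company` name on the columns; neither is something this issue confirms for newer versions.

```python
import numpy as np
import pandas as pd
import xarray as xr  # assumption: the renamed successor of the xray package

# Values copied from Out[34] above so the ranks are reproducible.
values = np.array([
    [0.82168647, 0.93097023],
    [0.34928855, 0.23245631],
    [0.32857461, 0.12554705],
    [0.44983381, 0.27182767],
    [0.31063147, 0.52894834],
])
da = xr.DataArray(
    values,
    dims=("date", "company"),
    coords={
        "date": pd.date_range("2000-01-01", periods=5),
        "company": ["a", "b"],
    },
)
ds = xr.Dataset({"returns": da})

# Rank within each column, then deliberately swap the column order.
rank = da.to_pandas().rank().reindex(columns=["b", "a"])

# Wrapping the DataFrame carries its labels along, so the assignment
# can match columns by label rather than by position.
ds["rank"] = xr.DataArray(rank)

rank_a = ds["rank"].sel(company="a").values.tolist()
rank_b = ds["rank"].sel(company="b").values.tolist()
```

If the assumptions hold, `rank_a` and `rank_b` match the `a` and `b` columns of Out[41] even though the DataFrame was passed in with its columns reversed.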

Possible additions to make this easier:
- Align pandas objects that are passed in, when dims are supplied
- Allow adding pandas objects to Datasets with labelled axes without supplying dims, and align those (similar to wrapping them in a DataArray constructor)
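
To make the first suggestion concrete, here is a minimal pandas-only sketch of what "align when dims are supplied" could mean: reindex the pandas object to the target labels before taking its values. The helper name `align_to_labels` is hypothetical and is not part of xray or pandas; the rank values are the ones from Out[42].

```python
import pandas as pd


def align_to_labels(frame, index_labels, column_labels):
    """Hypothetical helper: reorder a DataFrame to the given labels
    and return the underlying array in that order."""
    return frame.reindex(index=index_labels, columns=column_labels).to_numpy()


dates = pd.date_range("2000-01-01", periods=5)

# Same ranks as Out[42]: the columns are in 'b', 'a' order.
rank = pd.DataFrame(
    {"b": [5, 2, 1, 3, 4], "a": [5, 3, 2, 4, 1]},
    index=dates,
)

# Taking the raw values keeps the DataFrame's own column order,
# which is what happens today when dims are supplied.
positional = rank.to_numpy()

# Reindexing to the Dataset's coordinate order ('a', 'b') first
# puts each column where its label says it belongs.
aligned = align_to_labels(rank, dates, ["a", "b"])
```

Here the first column of `positional` holds the `b` ranks, while the first column of `aligned` holds the `a` ranks.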

What are your thoughts?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/664/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
