issues: 216215022

id: 216215022
node_id: MDU6SXNzdWUyMTYyMTUwMjI=
number: 1317
title: API for reshaping DataArrays as 2D "data matrices" for use in machine learning
user: 1386642
state: closed
locked: 0
assignee:
milestone:
comments: 9
created_at: 2017-03-22T21:33:07Z
updated_at: 2019-07-05T00:32:51Z
closed_at: 2019-07-05T00:32:51Z
author_association: CONTRIBUTOR
active_lock_reason:
draft:
pull_request:
body:

Machine learning and linear algebra problems are often expressed in terms of operations on matrices rather than arrays of arbitrary dimension, and there is currently no convenient way to turn DataArrays (or combinations of DataArrays) into a single "data matrix".

As an example, I have needed to use scikit-learn lately with data from DataArray objects. Scikit-learn requires the data to be expressed in terms of simple 2-dimensional matrices, where the rows are called samples and the columns are known as features. It is annoying and error-prone to transpose and reshape a data array by hand to fit into this format. For instance, this GitHub repo for xarray-aware sklearn-like objects devotes many lines of code to massaging data arrays into data matrices. I think that this reshaping workflow might be common enough to warrant some kind of treatment in xarray.
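
To make the manual bookkeeping concrete, here is a minimal NumPy sketch (not taken from the issue or the linked repo) of turning a field with dimensions (lat, lon, time) into a samples-by-features matrix and back. The array sizes and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical toy field with dimensions (lat, lon, time).
n_lat, n_lon, n_time = 3, 4, 5
field = np.arange(n_lat * n_lon * n_time, dtype=float).reshape(n_lat, n_lon, n_time)

# Move the sample dimension (time) to the front, then flatten the
# feature dimensions (lat, lon) into a single axis.
data_matrix = field.transpose(2, 0, 1).reshape(n_time, n_lat * n_lon)

# Undo both steps to recover the original layout.
restored = data_matrix.reshape(n_time, n_lat, n_lon).transpose(1, 2, 0)
```

Every axis number and shape here has to be kept in sync by hand, which is the kind of error a dimension-name-aware helper would avoid.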

I have written some code in this gist that I have found pretty convenient for doing this. The gist has an XRReshaper class which can be used for reshaping data to and from a matrix format. The basic usage for an EOF analysis of a dataset A(lat, lon, time) looks like this:

```python
feature_dims = ['lat', 'lon']

rs = XRReshaper(A)
data_matrix, _ = rs.to(feature_dims)

# Some linear algebra or machine learning
_, _, eofs = svd(data_matrix)

eofs_datarray = rs.get(eofs[0], ['mode'] + feature_dims)
```
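
For readers without the gist at hand, the same EOF workflow can be sketched with plain NumPy. The helper `to_matrix`, the random data, and the choice of two modes are hypothetical stand-ins; this is not the gist's XRReshaper implementation, only an illustration of the reshape, decompose, and restore steps under those assumptions.

```python
import numpy as np

def to_matrix(values, dims, feature_dims):
    """Flatten `feature_dims` into columns; the remaining dims become rows."""
    sample_dims = [d for d in dims if d not in feature_dims]
    order = [dims.index(d) for d in sample_dims + list(feature_dims)]
    feature_shape = tuple(values.shape[dims.index(d)] for d in feature_dims)
    n_features = int(np.prod(feature_shape))
    matrix = values.transpose(*order).reshape(-1, n_features)
    return matrix, feature_shape

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4, 20))  # hypothetical (lat, lon, time) data
dims = ["lat", "lon", "time"]
feature_dims = ["lat", "lon"]

data_matrix, feature_shape = to_matrix(A, dims, feature_dims)
anomalies = data_matrix - data_matrix.mean(axis=0)  # remove the time mean

# Rows of vt are the spatial patterns (EOFs), ordered by singular value.
_, singular_values, vt = np.linalg.svd(anomalies, full_matrices=False)

n_modes = 2
eofs = vt[:n_modes].reshape((n_modes,) + feature_shape)  # (mode, lat, lon)
```

The last line is the step a reshaper object would automate: it needs to remember the feature shape (and, in xarray, the coordinates) to turn matrix rows back into labelled maps.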

I am not sure this is the best API, but it seems to work pretty well, and I have used it here to implement some xarray-aware sklearn-like objects for PCA, which can be used like this:

    feature_dims = ['lat', 'lon']
    pca = XPCA(feature_dims, n_components=10, weight=cos(A.lat))
    pca.fit(A)
    pca.transform(A)
    eofs = pca.components_
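
As a rough illustration of what an sklearn-style object with a weight argument might do once the data are in matrix form, here is a small NumPy sketch. The class name `WeightedPCA`, the way the weights are applied (multiplying each centered feature column), and the toy data are all assumptions; the actual XPCA object may behave differently.

```python
import numpy as np

class WeightedPCA:
    """Hypothetical sklearn-style PCA with per-feature weights."""

    def __init__(self, n_components, weight=None):
        self.n_components = n_components
        self.weight = weight

    def fit(self, X):
        # Default to unit weights when none are given.
        w = np.ones(X.shape[1]) if self.weight is None else np.asarray(self.weight)
        self.weight_ = w
        self.mean_ = X.mean(axis=0)
        # Principal axes are the right singular vectors of the weighted anomalies.
        _, _, vt = np.linalg.svd((X - self.mean_) * w, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def transform(self, X):
        # Project weighted anomalies onto the principal axes.
        return ((X - self.mean_) * self.weight_) @ self.components_.T

lat = np.deg2rad(np.array([-60.0, 0.0, 60.0]))
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))  # 50 samples, one feature per latitude
pca = WeightedPCA(n_components=2, weight=np.cos(lat)).fit(X)
scores = pca.transform(X)
```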

Another syntax which might be helpful is some kind of context manager approach like

```python
with XRReshaper(A) as (rs, data_matrix):
    # do some stuff with data_matrix
    ...

# use rs to restore output to a data array.
```
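
One way such a context manager could be shaped is sketched below with `contextlib` and NumPy. The function `as_data_matrix`, its `n_sample_axes` parameter, and the `restore` callback are hypothetical names chosen for this example, and the sketch assumes the sample axes already come first; it is not a proposal for the actual xarray API.

```python
from contextlib import contextmanager

import numpy as np

@contextmanager
def as_data_matrix(values, n_sample_axes=1):
    """Yield (restore, matrix); leading axes are samples, the rest features."""
    sample_shape = values.shape[:n_sample_axes]
    feature_shape = values.shape[n_sample_axes:]
    matrix = values.reshape(int(np.prod(sample_shape)), -1)

    def restore(result, leading_shape):
        # Reshape a (k, n_features) result back to leading_shape + feature_shape.
        return np.asarray(result).reshape(tuple(leading_shape) + feature_shape)

    yield restore, matrix

A = np.arange(24.0).reshape(2, 3, 4)  # hypothetical (time, lat, lon) data
with as_data_matrix(A) as (restore, data_matrix):
    mean_row = data_matrix.mean(axis=0, keepdims=True)

# Use the restore callback to recover the feature layout.
mean_map = restore(mean_row, (1,))
```

Since nothing needs cleaning up on exit, the `with` block here mainly scopes the matrix view; a plain function returning the same pair would work equally well.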

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1317/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
