issues: 216215022
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
216215022 | MDU6SXNzdWUyMTYyMTUwMjI= | 1317 | API for reshaping DataArrays as 2D "data matrices" for use in machine learning | 1386642 | closed | 0 | 9 | 2017-03-22T21:33:07Z | 2019-07-05T00:32:51Z | 2019-07-05T00:32:51Z | CONTRIBUTOR | Machine learning and linear algebra problems are often expressed in terms of operations on matrices rather than arrays of arbitrary dimension, and there is currently no convenient way to turn DataArrays (or combinations of DataArrays) into a single "data matrix". As an example, I have needed to use scikit-learn lately with data from DataArray objects. Scikit-learn requires the data to be expressed in terms of simple 2-dimensional matrices. The rows are called samples, and the columns are known as features. It is annoying and error-prone to transpose and reshape a data array by hand to fit into this format. For instance, this GitHub repo for xarray-aware sklearn-like objects devotes many lines of code to massaging data arrays into data matrices. I think that this reshaping workflow might be common enough to warrant some kind of treatment in xarray. I have written some code in this gist that I have found pretty convenient for doing this. This gist has an `XRReshaper` class, which can be used like this:
```python
rs = XRReshaper(A)
data_matrix, _ = rs.to(feature_dims)

# Some linear algebra or machine learning
_, _, eofs = svd(data_matrix)

eofs_datarray = rs.get(eofs[0], ['mode'] + feature_dims)
```
I am not sure this is the best API, but it seems to work pretty well and I have used it here to implement some xarray-aware sklearn-like objects for PCA, which can be used like
Another syntax which might be helpful is some kind of context manager approach like
```python
with XRReshaper(A) as rs, data_matrix:
    # do some stuff with data_matrix
    # use rs to restore output to a data array.
``` |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1317/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |
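The reshaping workflow described in the issue body can be sketched with calls that already exist in xarray and NumPy, without the `XRReshaper` class from the gist. This is only an illustration under stated assumptions: the array `A(time, lat, lon)`, its sizes, and its coordinate values are made up, and `DataArray.stack` plus a manual reshape stand in for whatever the gist does internally.

```python
import numpy as np
import xarray as xr

# Hypothetical example data: A(time, lat, lon) with made-up sizes and coords.
rng = np.random.default_rng(0)
A = xr.DataArray(
    rng.standard_normal((6, 3, 4)),
    dims=["time", "lat", "lon"],
    coords={"lat": [10.0, 20.0, 30.0], "lon": [0.0, 90.0, 180.0, 270.0]},
)

feature_dims = ["lat", "lon"]

# Collapse the feature dimensions into one, then order as (samples, features).
stacked = A.stack(features=feature_dims).transpose("time", "features")
data_matrix = stacked.values  # plain 2-D NumPy array, shape (6, 12)

# Some linear algebra: each row of vt is one EOF in feature space.
_, _, vt = np.linalg.svd(data_matrix, full_matrices=False)

# Restore the feature dimensions by reshaping in the same (lat, lon) order
# that was used for stacking, and reattach the original coordinates.
eofs = xr.DataArray(
    vt.reshape((vt.shape[0],) + tuple(A.sizes[d] for d in feature_dims)),
    dims=["mode"] + feature_dims,
    coords={d: A.coords[d] for d in feature_dims},
)
```

The round trip only works because the reshape uses the same dimension order as the `stack` call; keeping that order consistent by hand is the bookkeeping the issue asks xarray to take over.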