html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1375#issuecomment-326803603,https://api.github.com/repos/pydata/xarray/issues/1375,326803603,MDEyOklzc3VlQ29tbWVudDMyNjgwMzYwMw==,630936,2017-09-03T13:01:44Z,2017-09-03T13:01:44Z,CONTRIBUTOR,"> do you have an application that we could use to drive this?
Other examples where labeled sparse arrays would be useful are,
* [one-hot encoding](https://rasbt.github.io/mlxtend/user_guide/preprocessing/one-hot_encoding/), which is widely used in machine learning.
* [tokenizing textual data](http://scikit-learn.org/stable/modules/feature_extraction.html#common-vectorizer-usage), which produces large sparse matrices where the column labels correspond to the vocabulary and the row labels correspond to document ids. Here is a minimal example using scikit-learn:
```py
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
ds = fetch_20newsgroups()
vect = CountVectorizer()
X = vect.fit_transform(ds.data)
print(repr(X))  # document-term count matrix
# Returns:
# <11314x130107 sparse matrix of type '<class 'numpy.int64'>'
#     with 1787565 stored elements in Compressed Sparse Row format>
column_labels = vect.get_feature_names()
print(np.asarray(column_labels))
# Returns:
# array(['00', '000', '0000', ..., 'íålittin', 'ñaustin', 'ýé'], dtype='<U...')
```
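
For the one-hot encoding case, a minimal sketch along the same lines (assuming scikit-learn's `OneHotEncoder`, which returns a scipy.sparse matrix by default; the toy `colors` data is made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical feature with three distinct values.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

enc = OneHotEncoder()  # sparse output by default
X = enc.fit_transform(colors)

print(X.shape)         # (4, 3) -- one column per category
print(enc.categories_) # the column labels, sorted: ['blue', 'green', 'red']
```

Here too the natural labels (category names for columns, sample ids for rows) are exactly what a labeled sparse array would carry.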