Comment on pydata/xarray#1375 — https://github.com/pydata/xarray/issues/1375#issuecomment-326803603 (2017-09-03, CONTRIBUTOR)

> do you have an application that we could use to drive this?

Other examples where labeled sparse arrays would be useful:

* [one-hot encoding](https://rasbt.github.io/mlxtend/user_guide/preprocessing/one-hot_encoding/), which is widely used in machine learning.
* [tokenizing textual data](http://scikit-learn.org/stable/modules/feature_extraction.html#common-vectorizer-usage), which produces large sparse matrices where the column labels correspond to the vocabulary and the row labels correspond to document ids.

Here is a minimal example using scikit-learn:

```py
import numpy as np

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

ds = fetch_20newsgroups()

vect = CountVectorizer()
X = vect.fit_transform(ds.data)  # extracted token counts
X
# Returns:
# <11314x130107 sparse matrix of type '<class 'numpy.int64'>'
#  with 1787565 stored elements in Compressed Sparse Row format>

column_labels = vect.get_feature_names()
print(np.asarray(column_labels))
# Returns:
# array(['00', '000', '0000', ..., 'íålittin', 'ñaustin', 'ýé'], dtype='<U...')
```
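For the first bullet, here is a minimal sketch of what a one-hot encoding looks like as a sparse matrix with column labels kept alongside the data. This is illustrative only: `one_hot` and the sample categories are hypothetical, not part of any library API; it builds the matrix directly with `scipy.sparse` rather than using scikit-learn's encoder.

```python
import numpy as np
from scipy import sparse


def one_hot(values):
    """Encode a list of category labels as a sparse one-hot matrix.

    Returns (matrix, column_labels) so callers can carry the labels
    alongside the sparse data, as a labeled sparse array would.
    """
    labels = sorted(set(values))  # one column per distinct category
    col = {label: j for j, label in enumerate(labels)}
    rows = np.arange(len(values))
    cols = np.array([col[v] for v in values])
    data = np.ones(len(values), dtype=np.int64)
    X = sparse.coo_matrix(
        (data, (rows, cols)), shape=(len(values), len(labels))
    ).tocsr()
    return X, labels


# Hypothetical sample data: each row is one observation.
X, labels = one_hot(["cat", "dog", "cat", "bird"])
print(labels)       # ['bird', 'cat', 'dog']
print(X.toarray())  # dense view, for illustration only
```

Only the nonzero entries are stored (one per row here), which is why this representation scales to the 11314×130107 vocabulary matrix above while a dense array would not.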