html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1375#issuecomment-526432439,https://api.github.com/repos/pydata/xarray/issues/1375,526432439,MDEyOklzc3VlQ29tbWVudDUyNjQzMjQzOQ==,2448579,2019-08-30T02:36:12Z,2019-08-30T02:36:12Z,MEMBER, @fjanoos there isn't any formal documentation yet but you can look at test_sparse.py for examples. That file will also tell you what works and doesn't work currently. ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-526356476,https://api.github.com/repos/pydata/xarray/issues/1375,526356476,MDEyOklzc3VlQ29tbWVudDUyNjM1NjQ3Ng==,923438,2019-08-29T20:52:10Z,2019-08-29T20:52:10Z,NONE,"@shoyer
Is there documentation for using sparse arrays? Could you point me to some example code?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-520675205,https://api.github.com/repos/pydata/xarray/issues/1375,520675205,MDEyOklzc3VlQ29tbWVudDUyMDY3NTIwNQ==,1217238,2019-08-13T03:31:14Z,2019-08-13T03:31:14Z,MEMBER,"This is working now on the `master` branch!
Once we get a few more kinks worked out, it will be in the next release.
I've started another issue for discussing how xarray could integrate sparse arrays better into its API: https://github.com/pydata/xarray/issues/3213","{""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-513589352,https://api.github.com/repos/pydata/xarray/issues/1375,513589352,MDEyOklzc3VlQ29tbWVudDUxMzU4OTM1Mg==,923438,2019-07-21T21:32:23Z,2019-07-21T21:32:23Z,NONE,Wondering what the status on this is? Is there a branch with this functionality implemented - would love to give it a spin!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-511209094,https://api.github.com/repos/pydata/xarray/issues/1375,511209094,MDEyOklzc3VlQ29tbWVudDUxMTIwOTA5NA==,306380,2019-07-14T14:50:45Z,2019-07-14T14:50:45Z,MEMBER,@nvictus has been working on this at #3117 ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-511127437,https://api.github.com/repos/pydata/xarray/issues/1375,511127437,MDEyOklzc3VlQ29tbWVudDUxMTEyNzQzNw==,1197350,2019-07-13T14:45:17Z,2019-07-13T14:45:17Z,MEMBER,"I personally use the new sparse project for my day-to-day research. I am motivated on this, but I probably won't have time today to dive deep on this.
Maybe CuPy would be more exciting.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-511121578,https://api.github.com/repos/pydata/xarray/issues/1375,511121578,MDEyOklzc3VlQ29tbWVudDUxMTEyMTU3OA==,98330,2019-07-13T13:18:34Z,2019-07-13T13:18:34Z,NONE,"I haven't talked to anyone at SciPy'19 yet who was interested in sparse arrays, but I'll keep an eye out today.
And yes, this is a fun issue to work on and would be really nice to have!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-510943157,https://api.github.com/repos/pydata/xarray/issues/1375,510943157,MDEyOklzc3VlQ29tbWVudDUxMDk0MzE1Nw==,306380,2019-07-12T16:07:42Z,2019-07-12T16:07:42Z,MEMBER,@rgommers might be able to recommend someone,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-510940851,https://api.github.com/repos/pydata/xarray/issues/1375,510940851,MDEyOklzc3VlQ29tbWVudDUxMDk0MDg1MQ==,1197350,2019-07-12T16:00:23Z,2019-07-12T16:00:23Z,MEMBER,"If someone who is good at numpy shows up at our sprint tomorrow, this could be a good issue to try out.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-504777412,https://api.github.com/repos/pydata/xarray/issues/1375,504777412,MDEyOklzc3VlQ29tbWVudDUwNDc3NzQxMg==,1217238,2019-06-23T18:54:33Z,2019-06-23T18:54:33Z,MEMBER,"It will need some experimentation, but I think things should be pretty close after NumPy 1.17 is released. Potentially it could be as easy as adjusting the rules xarray uses for casting in `xarray.core.variable.as_compatible_data`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-504620907,https://api.github.com/repos/pydata/xarray/issues/1375,504620907,MDEyOklzc3VlQ29tbWVudDUwNDYyMDkwNw==,1197350,2019-06-22T02:55:17Z,2019-06-22T02:55:17Z,MEMBER,"Given the recent improvements in numpy duck array typing, how close are we to being able to just wrap a pydata/sparse array in an xarray Dataset?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-403155235,https://api.github.com/repos/pydata/xarray/issues/1375,403155235,MDEyOklzc3VlQ29tbWVudDQwMzE1NTIzNQ==,1217238,2018-07-06T21:49:27Z,2018-07-06T21:49:27Z,MEMBER,"> Would it be an option to use dask's sparse support?
http://dask.pydata.org/en/latest/array-sparse.html
This way xarray could let dask do the main work.
In principle this would work, though I would prefer to support it directly in xarray, too.
> I know that NetCDF4 has some conventions how to store sparse data, but do we have to implement our own conversion mechanisms for each sparse type?
Yes, we would need to implement a convention for handling sparse array data.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-402699810,https://api.github.com/repos/pydata/xarray/issues/1375,402699810,MDEyOklzc3VlQ29tbWVudDQwMjY5OTgxMA==,1200058,2018-07-05T12:02:30Z,2018-07-05T12:02:30Z,NONE,"How should these sparse arrays get stored in NetCDF4?
I know that NetCDF4 has some conventions how to store sparse data, but do we have to implement our own conversion mechanisms for each sparse type?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-402699290,https://api.github.com/repos/pydata/xarray/issues/1375,402699290,MDEyOklzc3VlQ29tbWVudDQwMjY5OTI5MA==,1200058,2018-07-05T12:00:15Z,2018-07-05T12:00:15Z,NONE,"Would it be an option to use dask's sparse support?
http://dask.pydata.org/en/latest/array-sparse.html
This way xarray could let dask do the main work.
Currently I load everything into a dask array by hand and pass this dask array to xarray.
This works pretty well.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-395223735,https://api.github.com/repos/pydata/xarray/issues/1375,395223735,MDEyOklzc3VlQ29tbWVudDM5NTIyMzczNQ==,1217238,2018-06-06T21:43:40Z,2018-06-06T21:43:40Z,MEMBER,"See also: https://github.com/pydata/xarray/issues/1938
The major challenge now is the dispatching mechanism, which hopefully http://www.numpy.org/neps/nep-0018-array-function-protocol.html will solve.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-395009307,https://api.github.com/repos/pydata/xarray/issues/1375,395009307,MDEyOklzc3VlQ29tbWVudDM5NTAwOTMwNw==,1200058,2018-06-06T09:39:43Z,2018-06-06T09:41:28Z,NONE,"I know of a project that could make perfect use of xarray if it supported sparse tensors:
https://github.com/theislab/anndata
Currently I have to work with both xarray and anndata to store counts in sparse arrays separately from the other dependent data, which is a little bit annoying :)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-355383374,https://api.github.com/repos/pydata/xarray/issues/1375,355383374,MDEyOklzc3VlQ29tbWVudDM1NTM4MzM3NA==,4998171,2018-01-04T19:59:28Z,2018-01-04T19:59:28Z,NONE,"I'm interested to see if there have been any developments on this. I currently have an application where I'm working with multiple dask arrays, some of which are sparse (text data). It'd be worth my time to move my project to xarray, so I'd be interested in contributing something here if there is a need.","{""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-326824818,https://api.github.com/repos/pydata/xarray/issues/1375,326824818,MDEyOklzc3VlQ29tbWVudDMyNjgyNDgxOA==,1197350,2017-09-03T19:07:54Z,2017-09-03T19:07:54Z,MEMBER,Sparse Xarray DataArrays would be useful for the linear regridding operations discussed in JiaweiZhuang/xESMF#3.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-326803603,https://api.github.com/repos/pydata/xarray/issues/1375,326803603,MDEyOklzc3VlQ29tbWVudDMyNjgwMzYwMw==,630936,2017-09-03T13:01:44Z,2017-09-03T13:01:44Z,CONTRIBUTOR,"> do you have an application that we could use to drive this?
Other examples where labeled sparse arrays would be useful are,
* [one-hot encoding](https://rasbt.github.io/mlxtend/user_guide/preprocessing/one-hot_encoding/), which is widely used in machine learning.
* [tokenizing textual data](http://scikit-learn.org/stable/modules/feature_extraction.html#common-vectorizer-usage) produces large sparse matrices where the column labels correspond to the vocabulary, while row labels correspond to document ids. Here is a minimal example using scikit-learn,
```py
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

ds = fetch_20newsgroups()
vect = CountVectorizer()
X = vect.fit_transform(ds.data)  # document-term counts as a scipy.sparse matrix
print(X)  # Extracted tokens
# Returns:
# <11314x130107 sparse matrix of type '<class 'numpy.int64'>'
# with 1787565 stored elements in Compressed Sparse Row format>

# Column labels are the vocabulary terms
# (newer scikit-learn releases rename this to get_feature_names_out()).
column_labels = vect.get_feature_names()
print(np.asarray(column_labels))
# Returns:
# array(['00', '000', '0000', ..., 'íålittin', 'ñaustin', 'ýé'], dtype='<U...')
```
> @rabernat do you have an application that we could use to drive this?
Nothing comes to mind immediately. My data are unfortunately quite dense! 😜 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-294381137,https://api.github.com/repos/pydata/xarray/issues/1375,294381137,MDEyOklzc3VlQ29tbWVudDI5NDM4MTEzNw==,306380,2017-04-16T23:50:26Z,2017-04-16T23:50:26Z,MEMBER,"Here is a brief attempt at a multi-dimensional sparse array: https://github.com/mrocklin/sparse
It depends on numpy and scipy.sparse and, with the exception of a bit of in-memory data movement and copies, should run at scipy speeds (though I haven't done any benchmarking).
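To make the idea concrete, here is a rough pure-Python sketch of a coordinate-list (COO) layout for an n-dimensional sparse array. This is only an illustration of the storage idea, not how the linked project is implemented; `TinyCOO` and its methods are made-up names.

```python
class TinyCOO:
    '''Hypothetical n-dimensional sparse array in coordinate (COO) form.'''

    def __init__(self, shape, coords, values):
        self.shape = tuple(shape)
        # Map each coordinate tuple to its nonzero value; everything else is 0.
        self.data = dict(zip(map(tuple, coords), values))

    def __getitem__(self, index):
        return self.data.get(tuple(index), 0)

    def sum(self, axis):
        '''Sum over one axis, returning a TinyCOO with that axis removed.'''
        out = {}
        for coord, value in self.data.items():
            reduced = coord[:axis] + coord[axis + 1:]
            out[reduced] = out.get(reduced, 0) + value
        new_shape = self.shape[:axis] + self.shape[axis + 1:]
        return TinyCOO(new_shape, list(out.keys()), list(out.values()))


x = TinyCOO((2, 3, 4), coords=[(0, 1, 2), (1, 1, 2), (1, 2, 3)], values=[5, 7, 9])
summed = x.sum(axis=0)
print(summed.shape, summed[1, 2], summed[2, 3], summed[0, 0])
```

Only the stored coordinates are touched during the reduction, which is the property that makes labeled sparse arrays attractive for large, mostly empty data.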
@rabernat do you have an application that we could use to drive this?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-294283051,https://api.github.com/repos/pydata/xarray/issues/1375,294283051,MDEyOklzc3VlQ29tbWVudDI5NDI4MzA1MQ==,4160723,2017-04-15T09:42:20Z,2017-04-15T09:42:20Z,MEMBER,"Although I don't know much about [SciDB](http://scidb-py.readthedocs.io/en/stable/), it seems to be another possible application for `xarray.register_data_type`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-294270200,https://api.github.com/repos/pydata/xarray/issues/1375,294270200,MDEyOklzc3VlQ29tbWVudDI5NDI3MDIwMA==,1197350,2017-04-15T03:56:27Z,2017-04-15T03:56:52Z,MEMBER,"👍 to the scipy.sparse array suggestion
[While we are discussing supporting other array types, we should keep [gpu arrays](https://documen.tician.de/pycuda/array.html) on the radar]
","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543
https://github.com/pydata/xarray/issues/1375#issuecomment-294250748,https://api.github.com/repos/pydata/xarray/issues/1375,294250748,MDEyOklzc3VlQ29tbWVudDI5NDI1MDc0OA==,1217238,2017-04-14T22:46:10Z,2017-04-14T22:47:01Z,MEMBER,"Yes, I would say this is in scope, as long as we can keep most of the data-type specific logic out of xarray's core (which seems doable).
Currently, we define most of our operations on duck arrays in https://github.com/pydata/xarray/blob/master/xarray/core/duck_array_ops.py
There are a few other hacks throughout the codebase, which you can find by searching for ""dask_array_type"": https://github.com/pydata/xarray/search?p=1&q=dask_array_type&type=&utf8=%E2%9C%93
It's pretty crude, but basically this would need to be extended to implement many of these methods for sparse arrays, too. Ideally we would factor xarray's adapter logic into more cleanly separated submodules, perhaps using multiple dispatch. Even better, we would make this public API, so you can write something like `xarray.register_data_type(MySparseArray)` to register a type as valid for xarray's `.data` attribute.
It looks like `__array_ufunc__` will actually finally land in NumPy 1.13, which might make this easier.
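For reference, a minimal sketch of how that protocol lets a non-NumPy type intercept ufunc calls, assuming NumPy 1.13 or later. `WrappedArray` is a made-up toy class for illustration, not anything in xarray or a sparse library.

```python
import numpy as np


class WrappedArray:
    '''Toy duck array (an assumption for illustration) that wraps a NumPy array.'''

    def __init__(self, values):
        self.values = np.asarray(values)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only handle plain calls such as np.add(a, b); defer reduce, accumulate, etc.
        if method != '__call__':
            return NotImplemented
        # Unwrap any WrappedArray inputs, apply the ufunc, and re-wrap the result.
        raw = [x.values if isinstance(x, WrappedArray) else x for x in inputs]
        return WrappedArray(ufunc(*raw, **kwargs))


a = WrappedArray([1.0, 2.0, 3.0])
result = np.add(a, 10)       # NumPy calls WrappedArray.__array_ufunc__
doubled = np.multiply(2, a)  # also works when the wrapper is the second operand
print(type(result).__name__, result.values)
```

With this in place, elementwise NumPy functions keep returning the wrapper type instead of coercing to a dense `ndarray`, which is the behavior a sparse container would need.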
See also https://github.com/pydata/xarray/pull/1118","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221858543