After reading through the issue tracker and PRs, it looks like sparse arrays can safely be wrapped with xarray, thanks to the work done in [PR#3117](https://github.com/pydata/xarray/pull/3117), but support in the built-in functions is still under development (e.g. [PR#3542](https://github.com/pydata/xarray/pull/3542)). As a user, here is what I am seeing when test-driving sparse:
Sparse gives me a smaller in-memory array:
```python
In [1]: import xarray as xr, sparse, sys, numpy as np, dask.array as da
In [2]: x = np.random.random((100, 100, 100))
In [3]: x[x < 0.9] = np.nan
In [4]: s = sparse.COO.from_numpy(x, fill_value=np.nan)
In [5]: sys.getsizeof(s)
Out[5]: 3189592
In [6]: sys.getsizeof(x)
Out[6]: 8000128
```
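As a sanity check on those numbers, `nbytes` and `density` tell the same story (a minimal sketch, run right after `In [6]` above, before `x` and `s` get rewrapped below):
```python
# Compare storage via nbytes instead of sys.getsizeof, and confirm
# how sparse the array actually is.
print(x.nbytes)   # 100**3 * 8 = 8,000,000 bytes for the dense float64 array
print(s.nbytes)   # coords + data for only the ~10% of values above 0.9
print(s.density)  # fraction of stored (non-fill) values, roughly 0.1 here
```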
I can then wrap both arrays with dask and xarray:
```python
In [7]: x = da.from_array(x)
In [8]: s = da.from_array(s)
In [9]: ds_dense = xr.DataArray(x).to_dataset(name='data_variable')
In [10]: ds_sparse = xr.DataArray(s).to_dataset(name='data_variable')
In [11]: ds_dense
Out[11]:
<xarray.Dataset>
Dimensions:        (dim_0: 100, dim_1: 100, dim_2: 100)
Dimensions without coordinates: dim_0, dim_1, dim_2
Data variables:
    data_variable  (dim_0, dim_1, dim_2) float64 dask.array
In [12]: ds_sparse
Out[12]:
<xarray.Dataset>
Dimensions:        (dim_0: 100, dim_1: 100, dim_2: 100)
Dimensions without coordinates: dim_0, dim_1, dim_2
Data variables:
    data_variable  (dim_0, dim_1, dim_2) float64 dask.array
```
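For what it's worth, the dask layer isn't required for the wrapping itself; per [PR#3117](https://github.com/pydata/xarray/pull/3117), xarray can hold the sparse array directly (a minimal, self-contained sketch):
```python
import numpy as np
import sparse
import xarray as xr

x = np.random.random((100, 100, 100))
x[x < 0.9] = np.nan
s = sparse.COO.from_numpy(x, fill_value=np.nan)

# Wrap the COO directly; the underlying .data stays sparse
ds = xr.DataArray(s).to_dataset(name='data_variable')
print(isinstance(ds['data_variable'].data, sparse.COO))  # True
```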
However, computing the mean on the sparse array takes longer than on the dense one (which I think is expected...?):
```python
In [13]: %%time
...: ds_sparse.mean().compute()
CPU times: user 487 ms, sys: 22.9 ms, total: 510 ms
Wall time: 518 ms
Out[13]:
<xarray.Dataset>
Dimensions:  ()
Data variables:
    data_variable  float64 0.9501
In [14]: %%time
...: ds_dense.mean().compute()
CPU times: user 10.9 ms, sys: 3.91 ms, total: 14.8 ms
Wall time: 13.8 ms
Out[14]:
<xarray.Dataset>
Dimensions:  ()
Data variables:
    data_variable  float64 0.9501
```
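To see how much of that gap comes from sparse itself rather than the xarray/dask wrapping, the same NaN-skipping reduction can be timed on the raw arrays (a rough sketch, run against the original numpy array and bare COO from `In [2]`–`In [4]`, i.e. before they were rewrapped with dask; `sparse.nanmean` is the sparse-native counterpart of the skipna mean above):
```python
import timeit

# Time the NaN-skipping mean outside xarray to isolate where the cost lives.
print(timeit.timeit(lambda: np.nanmean(x), number=10))      # dense reduction
print(timeit.timeit(lambda: sparse.nanmean(s), number=10))  # sparse reduction
```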
And writing to netCDF, to take advantage of the smaller data size, doesn't work out of the box (yet):
```python
In [15]: ds_sparse.to_netcdf('ds_sparse.nc')
...
RuntimeError: Cannot convert a sparse array to dense automatically. To manually densify, use the todense method.
```
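In the meantime, densifying by hand before writing seems to work (a hedged sketch, assuming the variable fits in memory once dense; `ds_roundtrip` and the output filename are just illustrative names):
```python
# Pull the variable into memory, densify the sparse.COO with todense(),
# and write the resulting numpy-backed dataset to netCDF.
ds_roundtrip = ds_sparse.copy()
ds_roundtrip['data_variable'] = (
    ds_roundtrip['data_variable'].dims,
    ds_roundtrip['data_variable'].data.compute().todense(),
)
ds_roundtrip.to_netcdf('ds_sparse_densified.nc')
```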
Additional discussion is happening in #3213.
@dcherian @shoyer Am I missing any built-in methods that are working and ready for public release? Happy to send in a PR if any of what is provided here should go into a basic example for the docs.
At this stage I am not using sparse arrays in my own research just yet, but when I do reach that point I can dig in more and hopefully send in some useful PRs for improved documentation and fixes/features.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,517338735