html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/3381#issuecomment-541179896,https://api.github.com/repos/pydata/xarray/issues/3381,541179896,MDEyOklzc3VlQ29tbWVudDU0MTE3OTg5Ng==,1217238,2019-10-11T18:44:02Z,2019-10-11T18:44:02Z,MEMBER,"OK, thanks for clarifying with that example.
I think we can track down the issue to the result of `b.sum(dim='foo')`:
```
>>> b.data
>>> b.sum(dim='foo').data
```
The fill value here is actually arbitrary, since the array is entirely dense. If this fill value were still `nan`, the later operation combining these arrays would work.
That said, sparse is making a reasonable choice here: `nansum()` applied to an array with all values given by `nan` is `0`. Unless sparse wants to add special logic for handling arrays with different sparsities, I don't know how they could change this.
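To make the reduction behaviour concrete, here is a minimal NumPy-only sketch, with plain ndarrays standing in for the unstored part of a sparse array. `sum_with_min_count` is a hypothetical helper written for illustration, not an xarray or sparse API:

```python
import warnings

import numpy as np

# Stand-in for a region where every value is the nan fill value.
all_nan = np.full(3, np.nan)

# nansum skips NaNs, so the sum over no remaining values is 0.0.
total = np.nansum(all_nan)

# nanmean has nothing to average, so it returns nan (and emits a RuntimeWarning).
with warnings.catch_warnings():
    warnings.simplefilter('ignore', category=RuntimeWarning)
    mean = np.nanmean(all_nan)


def sum_with_min_count(values, min_count=1):
    # Hypothetical helper: return nan unless at least min_count non-NaN values exist.
    valid = np.count_nonzero(~np.isnan(values))
    return np.nansum(values) if valid >= min_count else np.nan


print(total, mean, sum_with_min_count(all_nan))
```

This is why a sum moves the background from `nan` to `0`, while a mean or a sum with a minimum count would keep it at `nan`.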
Options for dealing with this:
- Use `.mean()` instead of `.sum()`.
- Explicitly convert `c` into a dense array before combining it, e.g., `d = xr.concat([b, c.copy(data=c.data.todense())], dim='foo')` (syntax could be better). But this currently errors with: `ValueError: All arrays must be instances of SparseArray.` from sparse. Maybe sparse's concatenate could be updated to handle the mixed ndarray/sparse case?
- Add some ergonomic way to explicitly override `fill_value` on sparse data in xarray, e.g., `b.sum(dim='foo').with_fill_value(np.nan)`.
- In principle, `b.sum(dim='foo', min_count=1)` could return a sparse array with `fill_value=nan`, but currently it doesn't.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,503711327
https://github.com/pydata/xarray/issues/3381#issuecomment-541160571,https://api.github.com/repos/pydata/xarray/issues/3381,541160571,MDEyOklzc3VlQ29tbWVudDU0MTE2MDU3MQ==,1634164,2019-10-11T17:49:09Z,2019-10-11T17:49:09Z,NONE,"Thanks both for the comments. I understand sparse's behaviour; to clarify, the bug (IMO) is that xarray doesn't handle this for the user. To condense my example:
```python
# Same as above to ---
import numpy as np
import pandas as pd
import xarray as xr
foo = [f'foo{i}' for i in range(6)]
bar = [f'bar{i}' for i in range(6)]
raw = np.random.rand(len(foo) // 2, len(bar))
b_series = pd.DataFrame(raw, index=foo[3:], columns=bar) \
    .stack() \
    .rename_axis(index=['foo', 'bar'])
# ---
b = xr.DataArray.from_series(b_series, sparse=True)
c = b.sum(dim='foo').expand_dims({'foo': ['total']})
d = xr.concat([b, c], dim='foo')
```
This succeeds when `sparse=False` and fails when `sparse=True`.
- Shouldn't it succeed automatically? I feel like it should.
- If it does, what should be the fill value on `d`? I'm not clear what the intended behaviour is.
I haven't touched xarray internals before, but if time allows I will try to add some tests.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,503711327
https://github.com/pydata/xarray/issues/3381#issuecomment-541138033,https://api.github.com/repos/pydata/xarray/issues/3381,541138033,MDEyOklzc3VlQ29tbWVudDU0MTEzODAzMw==,1217238,2019-10-11T16:42:11Z,2019-10-11T16:42:11Z,MEMBER,"Sparse only lets you combine arrays with different fill values if the result would also have a single fill value. That's why you can multiply or add but not concatenate, e.g.,
```
assert x.fill_value == 1 and y.fill_value == 2
assert (x + y).fill_value == 3
assert (x * y).fill_value == 2
np.stack([x, y]) # error, would need a mixture of different fill values to represent
```
Multiple fill values simply aren't representable by sparse's data model.
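To spell out the rule, here is a toy model in plain Python. It is only a sketch of the reasoning, not sparse's actual implementation, and `elementwise_fill` and `concat_fill` are hypothetical names:

```python
import operator


def elementwise_fill(op, fill_x, fill_y):
    # Where neither array stores a value, the result is op(fill_x, fill_y)
    # everywhere, so one fill value still describes the output.
    return op(fill_x, fill_y)


def concat_fill(fills):
    # Concatenation keeps each input's background as-is, so the output
    # can only have one fill value if all inputs agree.
    distinct = set(fills)
    if len(distinct) != 1:
        raise ValueError(f'inconsistent fill values: {sorted(distinct)}')
    return distinct.pop()


print(elementwise_fill(operator.add, 1, 2))
print(elementwise_fill(operator.mul, 1, 2))
try:
    concat_fill([1, 2])
except ValueError as err:
    print('concat failed:', err)
```

Addition and multiplication each collapse the two backgrounds into one value, while concatenation would need both `1` and `2` as background in different parts of the result.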
I think you could work around this by wrapping the sparse arrays in dask first.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,503711327
https://github.com/pydata/xarray/issues/3381#issuecomment-539551326,https://api.github.com/repos/pydata/xarray/issues/3381,539551326,MDEyOklzc3VlQ29tbWVudDUzOTU1MTMyNg==,2448579,2019-10-08T14:52:08Z,2019-10-08T14:52:08Z,MEMBER,"Thanks @khaeru.
1. This looks like a `sparse` error: https://sparse.pydata.org/en/latest/generated/sparse.concatenate.html: `ValueError – If all elements of arrays don’t have the same fill-value.`
2. This too is a `sparse` error:
`Cannot provide a fill-value in combination with something that already has a fill-value`
So you'll need to figure out how to change fill values on `sparse` arrays.
3. This needs some investigation if you're up for it.
```
# But simple operations again create objects with potentially incompatible
# fill-values
d = c.sum(dim='bar')
print(d.data.fill_value) # 0.0
```
I also see that we aren't testing for `fill_value` changes in `test_sparse.py` so it would be good to add some of those even if they fail currently so that someone else (like you!) can come in and fix it.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,503711327