html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/3381#issuecomment-541179896,https://api.github.com/repos/pydata/xarray/issues/3381,541179896,MDEyOklzc3VlQ29tbWVudDU0MTE3OTg5Ng==,1217238,2019-10-11T18:44:02Z,2019-10-11T18:44:02Z,MEMBER,"OK, thanks for clarifying with that example.

I think we can track down the issue to the result of `b.sum(dim='foo')`:
```
>>> b.data
>>> b.sum(dim='foo').data
```

The fill value here is actually arbitrary, since the array is entirely dense. If this fill value were still `nan`, the later operation combining these arrays would work.

That said, sparse is making a reasonable choice here: `nansum()` applied to an array with all values given by `nan` is `0`. Unless sparse wants to add special logic for handling arrays with different sparsities, I don't know how they could change this.

Options for dealing with this:
- Use `.mean()` instead of `.sum()`.
- Explicitly convert `c` into a dense array before combining it, e.g., `d = xr.concat([b, c.copy(data=c.data.todense())], dim='foo')` (syntax could be better). But this currently errors with: `ValueError: All arrays must be instances of SparseArray.` from sparse. Maybe sparse's concatenate could be updated to handle the mixed ndarray/sparse case?
- Add some ergonomic way to explicitly override `fill_value` on sparse data in xarray, e.g., `b.sum(dim='foo').with_fill_value(np.nan)`.
- In principle, `b.sum(dim='foo', min_count=1)` could return a sparse array with `fill_value=nan`, but currently it doesn't.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,503711327
https://github.com/pydata/xarray/issues/3381#issuecomment-541160571,https://api.github.com/repos/pydata/xarray/issues/3381,541160571,MDEyOklzc3VlQ29tbWVudDU0MTE2MDU3MQ==,1634164,2019-10-11T17:49:09Z,2019-10-11T17:49:09Z,NONE,"Thanks both for the comments. I understand sparse's behaviour; to clarify, the bug (IMO) is that xarray doesn't handle this for the user. To condense my example:

```python
# Same as above to ---
import numpy as np
import pandas as pd
import xarray as xr

foo = [f'foo{i}' for i in range(6)]
bar = [f'bar{i}' for i in range(6)]

raw = np.random.rand(len(foo) // 2, len(bar))

b_series = pd.DataFrame(raw, index=foo[3:], columns=bar) \
             .stack() \
             .rename_axis(index=['foo', 'bar'])
# ---

b = xr.DataArray.from_series(b_series, sparse=True)

c = b.sum(dim='foo').expand_dims({'foo': ['total']})

d = xr.concat([b, c], dim='foo')
```

This succeeds when `sparse=False` and fails when `sparse=True`.

- Shouldn't it succeed automatically? I feel like it should.
- If it does, what should be the fill value on `d`? I'm not clear what the intended behaviour is.

I haven't touched xarray internals before, but if time allows I will try to add some tests.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,503711327
https://github.com/pydata/xarray/issues/3381#issuecomment-541138033,https://api.github.com/repos/pydata/xarray/issues/3381,541138033,MDEyOklzc3VlQ29tbWVudDU0MTEzODAzMw==,1217238,2019-10-11T16:42:11Z,2019-10-11T16:42:11Z,MEMBER,"Sparse only lets you combine arrays with different fill values if the result would also have a fixed value.
That's why you can multiply or add but not concatenate, e.g.,
```
assert x.fill_value == 1 and y.fill_value == 2
assert (x + y).fill_value == 3
assert (x * y).fill_value == 2
np.stack([x, y])  # error, would need a mixture of different fill values to represent
```

Multiple fill values simply aren't representable by sparse's data model.

I think you could work around this by wrapping the sparse arrays in dask first.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,503711327
https://github.com/pydata/xarray/issues/3381#issuecomment-539551326,https://api.github.com/repos/pydata/xarray/issues/3381,539551326,MDEyOklzc3VlQ29tbWVudDUzOTU1MTMyNg==,2448579,2019-10-08T14:52:08Z,2019-10-08T14:52:08Z,MEMBER,"Thanks @khaeru.

1. This looks like a `sparse` error: https://sparse.pydata.org/en/latest/generated/sparse.concatenate.html: `ValueError – If all elements of arrays don’t have the same fill-value.`
2. This too is a `sparse` error: `Cannot provide a fill-value in combination with something that already has a fill-value`. So you'll need to figure out how to change fill values on `sparse` arrays.
3. This needs some investigation if you're up for it.
```
# But simple operations again create objects with potentially incompatible
# fill-values
d = c.sum(dim='bar')
print(d.data.fill_value)  # 0.0
```

I also see that we aren't testing for `fill_value` changes in `test_sparse.py` so it would be good to add some of those even if they fail currently so that someone else (like you!) can come in and fix it.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,503711327