issue_comments


3 rows where author_association = "MEMBER" and issue = 503711327 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
541179896 https://github.com/pydata/xarray/issues/3381#issuecomment-541179896 https://api.github.com/repos/pydata/xarray/issues/3381 MDEyOklzc3VlQ29tbWVudDU0MTE3OTg5Ng== shoyer 1217238 2019-10-11T18:44:02Z 2019-10-11T18:44:02Z MEMBER

OK, thanks for clarifying with that example.

I think we can track down the issue to the result of `b.sum(dim='foo')`:

```
b.data
<COO: shape=(3, 6), dtype=float64, nnz=18, fill_value=nan>

b.sum(dim='foo').data
<COO: shape=(6,), dtype=float64, nnz=6, fill_value=0.0>
```

The fill value here is actually arbitrary, since the array is entirely dense. If this fill value were still nan, the later operation combining these arrays would work.

That said, sparse is making a reasonable choice here: `nansum()` applied to an array whose values are all NaN returns 0. Unless sparse wants to add special logic for handling arrays with different sparsities, I don't know how they could change this.
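For illustration, here is a small standalone sketch of that reduction behaviour using `sparse` directly; the array values are made up for the example, and the exact behaviour may vary with the sparse version:

```python
import numpy as np
import sparse

# A fully dense COO array (nnz == size) whose fill value is NaN,
# standing in for b.data from the example above.
x = sparse.COO.from_numpy(
    np.arange(1, 19, dtype=float).reshape(3, 6), fill_value=np.nan
)
print(x.fill_value)  # nan

# xarray's default (skipna) sum corresponds to nansum; per the output
# quoted above, the result's fill value comes out as 0.0, since nansum
# over an all-NaN fill is 0.
y = sparse.nansum(x, axis=0)
print(y.fill_value)  # expected 0.0, per the discussion above
```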

Options for dealing with this:

  • Use `.mean()` instead of `.sum()`.
  • Explicitly convert `c` into a dense array before combining it, e.g., `d = xr.concat([b, c.copy(data=c.data.todense())], dim='foo')` (syntax could be better). But this currently errors with `ValueError: All arrays must be instances of SparseArray.` from sparse. Maybe sparse's `concatenate` could be updated to handle the mixed ndarray/sparse case?
  • Add some ergonomic way to explicitly override `fill_value` on sparse data in xarray, e.g., `b.sum(dim='foo').with_fill_value(np.nan)` (see the sketch after this list).
  • In principle, `b.sum(dim='foo', min_count=1)` could return a sparse array with `fill_value=nan`, but currently it doesn't.
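A minimal sketch of that "override the fill value" idea, using only sparse's public `COO` constructor; the `with_fill_value` helper is hypothetical, not an existing xarray or sparse API:

```python
import numpy as np
import sparse

# Hypothetical helper: rebuild a COO array with an explicit fill value.
# sparse (at least in the versions discussed here) exposes no documented
# way to change fill_value in place, so reconstruct from coords/data.
# For a fully dense array this changes no stored values.
def with_fill_value(coo: sparse.COO, fill_value) -> sparse.COO:
    return sparse.COO(coo.coords, coo.data, shape=coo.shape, fill_value=fill_value)

x = sparse.COO.from_numpy(np.arange(1.0, 7.0), fill_value=0.0)
y = with_fill_value(x, np.nan)
print(x.fill_value, y.fill_value)  # 0.0 nan
```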

  concat() fails when args have sparse.COO data and different fill values 503711327
541138033 https://github.com/pydata/xarray/issues/3381#issuecomment-541138033 https://api.github.com/repos/pydata/xarray/issues/3381 MDEyOklzc3VlQ29tbWVudDU0MTEzODAzMw== shoyer 1217238 2019-10-11T16:42:11Z 2019-10-11T16:42:11Z MEMBER

Sparse only lets you combine arrays with different fill values if the result would also have a single, well-defined fill value. That's why you can multiply or add but not concatenate, e.g.:

```
assert x.fill_value == 1 and y.fill_value == 2
assert (x + y).fill_value == 3
assert (x * y).fill_value == 2
np.stack([x, y])  # error, would need a mixture of different fill values to represent
```

Multiple fill values simply aren't representable by sparse's data model.

I think you could work around this by wrapping the sparse arrays in dask first.
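An untested sketch of that dask suggestion, with toy arrays standing in for the `b` and `c` from the issue (names and values assumed): the idea is that xarray then concatenates dask arrays lazily, so `sparse.concatenate` is not invoked up front on arrays with mismatched fill values.

```python
import dask.array as da
import numpy as np
import sparse
import xarray as xr

# Toy sparse-backed DataArrays with different fill values.
b = xr.DataArray(
    sparse.COO.from_numpy(np.ones((3, 6)), fill_value=np.nan),
    dims=("foo", "bar"),
)
c = xr.DataArray(
    sparse.COO.from_numpy(np.ones((1, 6)), fill_value=0.0),
    dims=("foo", "bar"),
)

# Wrap each sparse array as a single-chunk dask array.
b_lazy = b.copy(data=da.from_array(b.data, chunks=b.shape))
c_lazy = c.copy(data=da.from_array(c.data, chunks=c.shape))

# Concatenation now just arranges dask chunks; each chunk keeps its own
# fill value, so whether later operations succeed depends on what you
# compute next.
d = xr.concat([b_lazy, c_lazy], dim="foo")
print(d)
```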

  concat() fails when args have sparse.COO data and different fill values 503711327
539551326 https://github.com/pydata/xarray/issues/3381#issuecomment-539551326 https://api.github.com/repos/pydata/xarray/issues/3381 MDEyOklzc3VlQ29tbWVudDUzOTU1MTMyNg== dcherian 2448579 2019-10-08T14:52:08Z 2019-10-08T14:52:08Z MEMBER

Thanks @khaeru.

  1. This looks like a sparse error (see https://sparse.pydata.org/en/latest/generated/sparse.concatenate.html): `ValueError` – If all elements of `arrays` don't have the same fill-value. A minimal reproduction follows this list.

  2. This too is a sparse error: `Cannot provide a fill-value in combination with something that already has a fill-value`. So you'll need to figure out how to change fill values on sparse arrays.

  3. This needs some investigation, if you're up for it:

```
# But simple operations again create objects with potentially incompatible
# fill-values
d = c.sum(dim='bar')
print(d.data.fill_value)  # 0.0
```
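For reference, a minimal standalone reproduction of the error from point 1 (shapes and values are assumed for the example):

```python
import numpy as np
import sparse

# Two COO arrays that differ only in their fill value.
a = sparse.COO.from_numpy(np.ones((2, 3)), fill_value=np.nan)
b = sparse.COO.from_numpy(np.ones((2, 3)), fill_value=0.0)

try:
    sparse.concatenate([a, b], axis=0)
except ValueError as err:
    # The exact wording varies by sparse version; the docs describe it as
    # raised when all elements of `arrays` don't have the same fill-value.
    print(err)
```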

I also see that we aren't testing for fill_value changes in test_sparse.py, so it would be good to add some of those tests, even if they currently fail, so that someone else (like you!) can come in and fix it.
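For example, a hypothetical test along those lines; the test name and the xfail marker are assumptions, not existing code in test_sparse.py:

```python
import numpy as np
import pytest
import sparse
import xarray as xr

# Hypothetical fill_value test; marked xfail because, per the discussion
# above, reductions currently reset a NaN fill value to 0.0.
@pytest.mark.xfail(reason="reductions do not currently preserve a NaN fill_value")
def test_sum_preserves_nan_fill_value():
    arr = xr.DataArray(
        sparse.COO.from_numpy(np.ones((2, 3)), fill_value=np.nan),
        dims=("foo", "bar"),
    )
    reduced = arr.sum(dim="foo")
    assert isinstance(reduced.data, sparse.COO)
    assert np.isnan(reduced.data.fill_value)
```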

  concat() fails when args have sparse.COO data and different fill values 503711327

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);