issue_comments

5 rows where author_association = "MEMBER" and issue = 528060435 sorted by updated_at descending

user 3

  • mathause 3
  • dcherian 1
  • keewis 1

issue 1

  • fillna on dataset converts all variables to float · 5

author_association 1

  • MEMBER · 5
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
692797144 https://github.com/pydata/xarray/issues/3570#issuecomment-692797144 https://api.github.com/repos/pydata/xarray/issues/3570 MDEyOklzc3VlQ29tbWVudDY5Mjc5NzE0NA== dcherian 2448579 2020-09-15T15:35:03Z 2020-09-15T15:35:03Z MEMBER

> I guess we could check if the dtype allows missing values (using something like `dtype.kind in "cfO"`)

This seems sensible; fillna only makes sense for dtypes that allow NA.
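The dtype check can be sketched in plain NumPy as follows. The names `can_hold_missing`, `isnull` and `fillna_dtype_aware` are hypothetical and not xarray API. The missing-value detection is simplified: NaN for float and complex data, and None or self-unequal values for object data. Datetime kinds (`"M"`, `"m"`) can also hold NaT but are not part of the `"cfO"` string suggested here, so this sketch leaves them alone.

```python
import numpy as np

# Dtype kinds that can represent a missing value, following the
# `dtype.kind in "cfO"` suggestion: complex, floating point, object.
NULLABLE_KINDS = "cfO"


def can_hold_missing(dtype) -> bool:
    """True if arrays of this dtype can contain NaN/None at all."""
    return np.dtype(dtype).kind in NULLABLE_KINDS


def isnull(data: np.ndarray) -> np.ndarray:
    """Missing-value mask for nullable dtypes (simplified, hypothetical helper)."""
    if data.dtype.kind in "cf":
        return np.isnan(data)
    # object arrays: None, and values that compare unequal to themselves (NaN)
    flat = [x is None or x != x for x in data.ravel()]
    return np.array(flat, dtype=bool).reshape(data.shape)


def fillna_dtype_aware(data: np.ndarray, other) -> np.ndarray:
    """Fill missing values only where the dtype can hold them."""
    # Only dtype metadata is inspected in this branch, so the values are never
    # touched; with a lazy array this check would not trigger a compute.
    if not can_hold_missing(data.dtype):
        return data  # ints, bools, strings, ...: returned unchanged
    return np.where(~isnull(data), data, other)
```

With this approach, `fillna_dtype_aware(np.array([1, 2]), np.nan)` returns the integer array itself instead of an upcast float64 copy. Float and object arrays are still filled as usual.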

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  fillna on dataset converts all variables to float 528060435
692664177 https://github.com/pydata/xarray/issues/3570#issuecomment-692664177 https://api.github.com/repos/pydata/xarray/issues/3570 MDEyOklzc3VlQ29tbWVudDY5MjY2NDE3Nw== mathause 10194086 2020-09-15T11:50:11Z 2020-09-15T11:50:11Z MEMBER

There are 3 types of failures:

- non-lazy evaluation
- missing alignment (in merge)
- not raising `DimensionalityError`

So it's not trivial...

Python traceback:

```
FAILED xarray/tests/test_dask.py::test_lazy_array_equiv_merge[no_conflicts] - RuntimeError: Too many computes. Total: 1 > max: 0.
FAILED xarray/tests/test_dataarray.py::TestReduce2D::test_idxmin[True-x2-minindex2-maxindex2-nanindex2] - RuntimeError: Too many computes. Total: 2 > max: 1.
FAILED xarray/tests/test_dataarray.py::TestReduce2D::test_idxmax[True-x2-minindex2-maxindex2-nanindex2] - RuntimeError: Too many computes. Total: 2 > max: 1.
FAILED xarray/tests/test_dataset.py::TestDataset::test_dask_is_lazy - xarray.tests.UnexpectedDataAccess: Tried accessing data
FAILED xarray/tests/test_merge.py::TestMergeMethod::test_merge_broadcast_equals - ValueError: applied function returned data with unexpected number of dimensions. Received 0 dimension(s) but expected 1 dim...
FAILED xarray/tests/test_units.py::TestDataArray::test_fillna[int64-python_scalar-no_unit] - Failed: DID NOT RAISE <class 'pint.errors.DimensionalityError'>
... more of those
```

Potentially this could also be fixed in `Dataset.fillna`. However, the fill value would then need to be dtype-dependent, so that is not trivial either...

https://github.com/pydata/xarray/blob/66ab0ae4f3aa3c461357a5a895405e81357796b1/xarray/core/dataset.py#L4000
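Here is a minimal sketch of what a dtype-dependent fill value could mean at the Dataset level. A plain dict of NumPy arrays stands in for a Dataset. `default_fill_value` and `dataset_fillna` are hypothetical names, not xarray API. Variables without an entry in `value` still go through the fill step, as with a left join. Their default is NaN only for nullable dtypes. Otherwise it is a same-dtype zero that is never selected and only keeps `np.where` from upcasting. Missing-value detection is simplified to NaN in float and complex data.

```python
from typing import Any, Dict, Mapping

import numpy as np


def default_fill_value(dtype) -> Any:
    """Hypothetical dtype-dependent default used instead of a blanket NaN."""
    dtype = np.dtype(dtype)
    if dtype.kind in "cfO":
        return np.nan  # nullable: missing values stay missing
    # Non-nullable dtypes (int, bool, str, ...) cannot contain missing values,
    # so this value is never selected; sharing the data's dtype just stops
    # np.where from promoting the result.
    return np.zeros((), dtype=dtype)


def dataset_fillna(variables: Mapping[str, np.ndarray],
                   value: Mapping[str, Any]) -> Dict[str, np.ndarray]:
    """Fill each variable, mimicking a left join over variable names."""
    out = {}
    for name, data in variables.items():
        # variables without an entry in `value` get a dtype-dependent default
        other = value.get(name, default_fill_value(data.dtype))
        if data.dtype.kind in "fc":
            mask = np.isnan(data)
        else:
            # simplification: object arrays are not scanned for None/NaN
            mask = np.zeros(data.shape, dtype=bool)
        out[name] = np.where(~mask, data, other)
    return out
```

For example, `dataset_fillna({"A": np.array([1, 2]), "B": np.array([1.0, np.nan])}, {"B": -1.0})` fills `B` and leaves `A` as an integer array.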

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  fillna on dataset converts all variables to float 528060435
692652030 https://github.com/pydata/xarray/issues/3570#issuecomment-692652030 https://api.github.com/repos/pydata/xarray/issues/3570 MDEyOklzc3VlQ29tbWVudDY5MjY1MjAzMA== keewis 14808389 2020-09-15T11:22:08Z 2020-09-15T11:22:08Z MEMBER

Yes, we would need to compute if we wanted to use `any` / `all` to detect missing values. I guess we could check if the dtype allows missing values (using something like `dtype.kind in "cfO"`), and only fill the variables whose dtype does.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  fillna on dataset converts all variables to float 528060435
692574610 https://github.com/pydata/xarray/issues/3570#issuecomment-692574610 https://api.github.com/repos/pydata/xarray/issues/3570 MDEyOklzc3VlQ29tbWVudDY5MjU3NDYxMA== mathause 10194086 2020-09-15T08:59:49Z 2020-09-15T09:58:49Z MEMBER

The problem is that xarray calls `where(notnull(data), data, other)` for all variables. It also uses `dataset_join="left"`, so `other` is NaN for all DataArrays that are not passed... I think `fillna` should be made a no-op if `notnull(data).all()`, i.e. if there is nothing to fill.

https://github.com/pydata/xarray/blob/66ab0ae4f3aa3c461357a5a895405e81357796b1/xarray/core/duck_array_ops.py#L284-L288

A possible workaround is to use `da.fillna(value={"A": 0, "C": True})`.
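The upcast can be reproduced with plain NumPy. `np.where` promotes both branches to a common dtype, so passing `other=np.nan` turns integer and boolean data into float64 even when no element is missing. `fillna_where` is a hypothetical, simplified stand-in for the fill step and not the xarray function.

```python
import numpy as np


def fillna_where(data: np.ndarray, other) -> np.ndarray:
    """Keep `data` where it is not null, otherwise take `other`.

    Simplified stand-in for the fill step: NaN detection is only done for
    float/complex data; other dtypes are treated as never missing.
    """
    if data.dtype.kind in "fc":
        mask = np.isnan(data)
    else:
        mask = np.zeros(data.shape, dtype=bool)
    # np.where promotes both branches to a common dtype, even when every
    # selected element comes from `data`
    return np.where(~mask, data, other)


ints = np.array([1, 2, 3])
bools = np.array([True, False, True])

# other=NaN, as for variables not covered by the fill value:
print(fillna_where(ints, np.nan).dtype)   # float64
print(fillna_where(bools, np.nan).dtype)  # float64
# a fill value of a compatible type keeps the dtype
print(fillna_where(ints, 0).dtype == ints.dtype)  # True
```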

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  fillna on dataset converts all variables to float 528060435
692609976 https://github.com/pydata/xarray/issues/3570#issuecomment-692609976 https://api.github.com/repos/pydata/xarray/issues/3570 MDEyOklzc3VlQ29tbWVudDY5MjYwOTk3Ng== mathause 10194086 2020-09-15T09:57:47Z 2020-09-15T09:57:47Z MEMBER

The straightforward idea unfortunately leads to some test failures, i.e. the following does not work:

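```python
# isnull and where are the helpers defined in xarray/core/duck_array_ops.py
def fillna(data, other):
    # we need to pass data first so pint has a chance of returning the
    # correct unit
    # TODO: revert after https://github.com/hgrecco/pint/issues/1019 is fixed
    mask = isnull(data)

    # `if` needs a concrete bool, so for dask-backed data this forces a compute
    if mask.any():
        return where(~mask, data, other)
    else:
        return data
```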

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  fillna on dataset converts all variables to float 528060435

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);