
issue_comments


2 rows where issue = 942738904 and user = 1217238 sorted by updated_at descending

id: 879561954 · user: shoyer (1217238) · author_association: MEMBER
created_at: 2021-07-14T03:43:37Z · updated_at: 2021-07-14T03:44:00Z
html_url: https://github.com/pydata/xarray/issues/5597#issuecomment-879561954
issue_url: https://api.github.com/repos/pydata/xarray/issues/5597
node_id: MDEyOklzc3VlQ29tbWVudDg3OTU2MTk1NA==

Thanks for sharing the subset netCDF file, that is very helpful for debugging indeed!

The weird thing is that the dtype-picking logic seems to have a special case which, per the code comment, suggests we should be using float64 here: https://github.com/pydata/xarray/blob/eea76733770be03e78a0834803291659136bca31/xarray/coding/variables.py#L231-L238

But in fact, the dtype-picking logic doesn't do that, because the dtype has already been converted to float32 first. The culprit seems to be this line in CFMaskCoder, which promotes the dtype to float32 so it can hold a fill value of NaN: https://github.com/pydata/xarray/blob/eea76733770be03e78a0834803291659136bca31/xarray/coding/variables.py#L202

To fix this, I think the logic in `_choose_float_dtype` should be updated to look at `encoding['dtype']` (if available) instead of `dtype`, in order to understand how the data was originally stored.
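A minimal sketch of the suggested change (hypothetical — the function name comes from the linked source, but this signature and body are illustrative, not xarray's actual code): consult the on-disk dtype recorded in `encoding['dtype']`, when present, instead of the possibly-promoted in-memory dtype.

```python
import numpy as np

def choose_float_dtype(dtype, has_offset, encoding=None):
    # Sketch of the proposed fix: prefer the original on-disk dtype,
    # if the coder recorded it, over the (possibly already promoted
    # to float32) in-memory dtype.
    if encoding and "dtype" in encoding:
        dtype = np.dtype(encoding["dtype"])
    else:
        dtype = np.dtype(dtype)
    # Data already stored as a small float decodes fine as float32.
    if dtype.itemsize <= 4 and np.issubdtype(dtype, np.floating):
        return np.float32
    # Short integers scaled without an offset also fit in float32;
    # with an add_offset, float32 precision may not suffice,
    # so fall through to float64.
    if dtype.itemsize <= 2 and np.issubdtype(dtype, np.integer):
        if not has_offset:
            return np.float32
    return np.float64

# With the encoding consulted, int16 data that CFMaskCoder promoted
# to float32 still decodes as float64 when an add_offset is present:
print(choose_float_dtype(np.float32, has_offset=True,
                         encoding={"dtype": "int16"}))
```

Without the `encoding` argument, the same call reproduces the buggy behavior: the promoted float32 input short-circuits the first branch and float32 is chosen despite the offset.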

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Decoding netCDF is giving incorrect values for a large file (942738904)
id: 879361320 · user: shoyer (1217238) · author_association: MEMBER
created_at: 2021-07-13T19:58:39Z · updated_at: 2021-07-13T19:58:39Z
html_url: https://github.com/pydata/xarray/issues/5597#issuecomment-879361320
issue_url: https://api.github.com/repos/pydata/xarray/issues/5597
node_id: MDEyOklzc3VlQ29tbWVudDg3OTM2MTMyMA==

This may just be the expected floating point error from using float32:

```
In [5]: import numpy as np

In [6]: -32766 * np.float32(625.6492454183389) + np.float32(20500023.17537729)
Out[6]: 1.2984619140625
```

If you use full float64, then the data does decode to 0.0:

```
In [7]: -32766 * np.float64(625.6492454183389) + np.float64(20500023.17537729)
Out[7]: 0.0
```

So the question then is why this ends up being decoded using float32 instead of float64, and whether that logic should be adjusted or made customizable: https://github.com/pydata/xarray/blob/eea76733770be03e78a0834803291659136bca31/xarray/coding/variables.py#L225
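The arithmetic above can be reproduced directly. The scale factor and offset are the ones from the issue's file; decoding the stored int16 value -32766 in float32 yields a spurious nonzero value, while float64 recovers the intended 0.0:

```python
import numpy as np

scale_factor = 625.6492454183389   # from the issue's netCDF file
add_offset = 20500023.17537729
raw = -32766                       # stored int16 value

# Decoding in float32 accumulates rounding error...
decoded32 = raw * np.float32(scale_factor) + np.float32(add_offset)
# ...while the same expression in float64 cancels cleanly.
decoded64 = raw * np.float64(scale_factor) + np.float64(add_offset)

print(decoded32)  # ~1.3, the spurious value from the issue
print(decoded64)  # ~0.0
```

The offset here is roughly 2e7, so float32 (about 7 decimal digits of precision) cannot represent it to better than ~1, which is exactly the size of the error observed.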

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Decoding netCDF is giving incorrect values for a large file (942738904)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
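The schema above can be exercised locally with Python's built-in sqlite3 module. A sketch: the REFERENCES clauses are dropped since the users/issues tables aren't included here, and only the columns the page's query touches are filled in.

```python
import sqlite3

# In-memory copy of the issue_comments table, plus the two rows
# shown on this page (ids only, with the fields the query needs).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY,
   [node_id] TEXT, [user] INTEGER, [created_at] TEXT, [updated_at] TEXT,
   [author_association] TEXT, [body] TEXT, [reactions] TEXT,
   [performed_via_github_app] TEXT, [issue] INTEGER
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
""")
conn.executemany(
    "INSERT INTO issue_comments (id, user, issue, updated_at) "
    "VALUES (?, ?, ?, ?)",
    [
        (879561954, 1217238, 942738904, "2021-07-14T03:44:00Z"),
        (879361320, 1217238, 942738904, "2021-07-13T19:58:39Z"),
    ],
)

# The query this page displays: comments on issue 942738904
# by user 1217238, sorted by updated_at descending.
rows = conn.execute(
    "SELECT id FROM issue_comments "
    "WHERE issue = ? AND user = ? ORDER BY updated_at DESC",
    (942738904, 1217238),
).fetchall()
print(rows)  # newest comment first
```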
Powered by Datasette · Queries took 3274.305ms · About: xarray-datasette