home / github

Menu
  • GraphQL API
  • Search all tables

issue_comments

Table actions
  • GraphQL API for issue_comments

8 rows where issue = 963006707 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 6

  • shoyer 3
  • dcherian 1
  • kmuehlbauer 1
  • pep8speaks 1
  • github-actions[bot] 1
  • gcaria 1

author_association 3

  • MEMBER 5
  • CONTRIBUTOR 2
  • NONE 1

issue 1

  • ENH: Add default fill values for decode_cf · 8 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
895508489 https://github.com/pydata/xarray/pull/5680#issuecomment-895508489 https://api.github.com/repos/pydata/xarray/issues/5680 IC_kwDOAMm_X841YGAJ shoyer 1217238 2021-08-09T20:11:24Z 2021-08-09T20:11:24Z MEMBER

To follow up, from a practical perspective, there are two problems with assuming that there are always "truly missing values" (case 2):

  1. It makes it impossible to represent the full range of values in a data type, e.g., 255 for uint8 now means "missing".
  2. Due to unfortunately limited options for representing missing data in NumPy, Xarray represents truly missing values in its data model with "NaN". This is more or less OK for floating point data, but means that integer data gets converted into floats. For example, uint8 would now get automatically converted into float32.

Both of these issues are problematic for faithful "round tripping" of Xarray data into netCDF and back. For this reason, Xarray needs an unambiguous way to know if a netCDF variable could contain semantically missing values. So far, we've used the presence of missing_value and _FillValue attributes for that.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Add default fill values for decode_cf 963006707
895455163 https://github.com/pydata/xarray/pull/5680#issuecomment-895455163 https://api.github.com/repos/pydata/xarray/issues/5680 IC_kwDOAMm_X841X4-7 shoyer 1217238 2021-08-09T18:46:59Z 2021-08-09T18:46:59Z MEMBER

Right, so netCDF3 has a default value used for filling out variables before any data is written.

My concern is that there are two (overlapping) use-case for fill values:

  1. The default array value used for variables on disk, e.g., before they are written
  2. Truly missing values (with different semantics), which Xarray represents with NaN

Certainly these sometimes coincide, but that isn't necessarily the case.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Add default fill values for decode_cf 963006707
895371331 https://github.com/pydata/xarray/pull/5680#issuecomment-895371331 https://api.github.com/repos/pydata/xarray/issues/5680 IC_kwDOAMm_X841XkhD kmuehlbauer 5821660 2021-08-09T16:38:13Z 2021-08-09T16:38:13Z MEMBER

AFAIK, these values are chosen, because their binary presentation is good for compression.

For instance the 32bit float 9.969209968386869e+36 is hex 0x7CF00000.

Unfortunately I can't find a link describing that.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Add default fill values for decode_cf 963006707
895336362 https://github.com/pydata/xarray/pull/5680#issuecomment-895336362 https://api.github.com/repos/pydata/xarray/issues/5680 IC_kwDOAMm_X841Xb-q dcherian 2448579 2021-08-09T15:50:46Z 2021-08-09T15:55:34Z MEMBER

It's in the standard (partly?): https://www.unidata.ucar.edu/software/netcdf/documentation/4.7.4-pre/file_format_specifications.html#atts_spec // Default fill values for each type, may be // overridden by variable attribute named // '_FillValue'. See "Note on fill values", // below. FILL_CHAR = \x00 // null byte FILL_BYTE = \x81 // (signed char) -127 FILL_SHORT = \x80 \x01 // (short) -32767 FILL_INT = \x80 \x00 \x00 \x01 // (int) -2147483647 FILL_FLOAT = \x7C \xF0 \x00 \x00 // (float) 9.9692099683868690e+36 FILL_DOUBLE = \x47 \x9E \x00 \x00 \x00 \x00 \x00 \x00 //(double)9.9692099683868690e+36

and

Note on fill values: Because data variables may be created before their values are written, and because values need not be written sequentially in a netCDF file, default “fill values” are defined for each type, for initializing data values before they are explicitly written. This makes it possible to detect reading values that were never written. The variable attribute “_FillValue”, if present, overrides the default fill value for a variable. If _FillValue is defined then it should be scalar and of the same type as the variable.

Fill values are not required, however, because netCDF libraries have traditionally supported a “no fill” mode when writing, omitting the initialization of variable values with fill values. This makes the creation of large files faster, but also eliminates the possibility of detecting the inadvertent reading of values that haven't been written

EDIT: I remember reading some text about how the default _FillValues are "close to the largest or smallest number representable by a datatype", but I cannot find it now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Add default fill values for decode_cf 963006707
895022291 https://github.com/pydata/xarray/pull/5680#issuecomment-895022291 https://api.github.com/repos/pydata/xarray/issues/5680 IC_kwDOAMm_X841WPTT shoyer 1217238 2021-08-09T07:54:22Z 2021-08-09T07:54:22Z MEMBER

Could you clarify where these default fill values come from?

Are they just an arbitrary choice by netCDF4-Python? Or are they part of some broader standard?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Add default fill values for decode_cf 963006707
894497814 https://github.com/pydata/xarray/pull/5680#issuecomment-894497814 https://api.github.com/repos/pydata/xarray/issues/5680 IC_kwDOAMm_X841UPQW github-actions[bot] 41898282 2021-08-06T20:18:42Z 2021-08-08T16:27:48Z CONTRIBUTOR

Unit Test Results

6 files           6 suites   56m 11s :stopwatch: 16 201 tests 14 433 :heavy_check_mark: 1 735 :zzz:   33 :x: 90 402 runs  82 037 :heavy_check_mark: 8 175 :zzz: 190 :x:

For more details on these failures, see this check.

Results for commit 26f1b322.

:recycle: This comment has been updated with latest results.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Add default fill values for decode_cf 963006707
894819235 https://github.com/pydata/xarray/pull/5680#issuecomment-894819235 https://api.github.com/repos/pydata/xarray/issues/5680 IC_kwDOAMm_X841Vduj gcaria 44147817 2021-08-08T16:07:54Z 2021-08-08T16:07:54Z CONTRIBUTOR

I would have liked to add the dictionary in xarray.conventions but this causes circular imports.

It seems that a lot of test failures are caused by something like: ```python

xarray.core.dtypes.maybe_promote(dtype('int64'))

(dtype('float64'), nan ````

So the answer to this must really be yes, am I right?

From the issue's conversation, it wasn't clear to me whether an argument should control the use of the default fill value. Since some tests fail now I guess the answer is yes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Add default fill values for decode_cf 963006707
894818364 https://github.com/pydata/xarray/pull/5680#issuecomment-894818364 https://api.github.com/repos/pydata/xarray/issues/5680 IC_kwDOAMm_X841Vdg8 pep8speaks 24736507 2021-08-08T16:02:37Z 2021-08-08T16:03:49Z NONE

Hello @gcaria! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2021-08-08 16:03:49 UTC
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Add default fill values for decode_cf 963006707

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 14.879ms · About: xarray-datasette