html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/5680#issuecomment-895508489,https://api.github.com/repos/pydata/xarray/issues/5680,895508489,IC_kwDOAMm_X841YGAJ,1217238,2021-08-09T20:11:24Z,2021-08-09T20:11:24Z,MEMBER,"To follow up, from a _practical_ perspective, there are two problems with assuming that there are always ""truly missing values"" (case 2):
1. It makes it impossible to represent the full range of values in a data type, e.g., 255 for uint8 now means ""missing"".
2. Due to unfortunately limited options for representing missing data in NumPy, Xarray represents truly missing values in its data model with ""NaN"". This is more or less OK for floating point data, but means that integer data gets converted into floats. For example, uint8 would now get automatically converted into float32.
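
A minimal pure-Python sketch of both problems, with plain lists standing in for arrays. `FILL`, `decode`, and `encode` are hypothetical names used only for illustration; they are not Xarray APIs, and the sentinel 255 is just the example value from point 1.

```python
import math

FILL = 255  # hypothetical sentinel for a uint8 variable (assumption)

def decode(raw):
    # Mask the sentinel: every value becomes a float so NaN can mark gaps.
    return [math.nan if v == FILL else float(v) for v in raw]

def encode(values):
    # Write back as uint8 bytes: NaN turns into the sentinel again.
    return bytes(FILL if math.isnan(v) else int(v) for v in values)

on_disk = bytes([0, 17, 255])  # here 255 is meant as real data, not a gap
decoded = decode(on_disk)
round_tripped = encode(decoded)

print(decoded)
print(round_tripped == on_disk)
```

The bytes on disk survive the round trip, but in memory the genuine 255 can no longer be told apart from a gap, and the integer values have become floats.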
Both of these issues are problematic for faithful ""round tripping"" of Xarray data into netCDF and back. For this reason, Xarray needs an unambiguous way to know if a netCDF variable could contain semantically missing values. So far, we've used the presence of `missing_value` and `_FillValue` attributes for that.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,963006707
https://github.com/pydata/xarray/pull/5680#issuecomment-895455163,https://api.github.com/repos/pydata/xarray/issues/5680,895455163,IC_kwDOAMm_X841X4-7,1217238,2021-08-09T18:46:59Z,2021-08-09T18:46:59Z,MEMBER,"Right, so netCDF3 has a default value used for filling out variables before any data is written.
My concern is that there are two (overlapping) use cases for fill values:
1. The default array value used for variables on disk, e.g., before they are written
2. Truly missing values (with different semantics), which Xarray represents with NaN
Certainly these _sometimes_ coincide, but that isn't necessarily the case.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,963006707
https://github.com/pydata/xarray/pull/5680#issuecomment-895371331,https://api.github.com/repos/pydata/xarray/issues/5680,895371331,IC_kwDOAMm_X841XkhD,5821660,2021-08-09T16:38:13Z,2021-08-09T16:38:13Z,MEMBER,"AFAIK, these values are chosen because their binary representation is good for compression.
For instance, the 32-bit float 9.969209968386869e+36 is hex 0x7CF00000.
Unfortunately, I can't find a link describing that.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,963006707
https://github.com/pydata/xarray/pull/5680#issuecomment-895336362,https://api.github.com/repos/pydata/xarray/issues/5680,895336362,IC_kwDOAMm_X841Xb-q,2448579,2021-08-09T15:50:46Z,2021-08-09T15:55:34Z,MEMBER,"It's in the standard (partly?): https://www.unidata.ucar.edu/software/netcdf/documentation/4.7.4-pre/file_format_specifications.html#atts_spec
```
// Default fill values for each type, may be
// overridden by variable attribute named
// '_FillValue'. See ""Note on fill values"",
// below.
FILL_CHAR = \x00 // null byte
FILL_BYTE = \x81 // (signed char) -127
FILL_SHORT = \x80 \x01 // (short) -32767
FILL_INT = \x80 \x00 \x00 \x01 // (int) -2147483647
FILL_FLOAT = \x7C \xF0 \x00 \x00 // (float) 9.9692099683868690e+36
FILL_DOUBLE = \x47 \x9E \x00 \x00 \x00 \x00 \x00 \x00 //(double)9.9692099683868690e+36
```
and
> Note on fill values: Because data variables may be created before their values are written, and because values need not be written sequentially in a netCDF file, default “fill values” are defined for each type, for initializing data values before they are explicitly written. This makes it possible to detect reading values that were never written. The variable attribute “_FillValue”, if present, overrides the default fill value for a variable. If _FillValue is defined then it should be scalar and of the same type as the variable.
> Fill values are not required, however, because netCDF libraries have traditionally supported a “no fill” mode when writing, omitting the initialization of variable values with fill values. This makes the creation of large files faster, but also eliminates the possibility of detecting the inadvertent reading of values that haven't been written.
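
As a quick check of the table above, here is a small sketch that decodes the listed byte patterns with Python's standard `struct` module. The names `SPEC_FILL_BYTES` and `decode_fill` are made up for this example, and the bytes are assumed to be big-endian, as written in the spec excerpt.

```python
import struct

# Big-endian byte patterns copied from the spec excerpt above.
SPEC_FILL_BYTES = {
    'FILL_BYTE': ('>b', '81'),
    'FILL_SHORT': ('>h', '8001'),
    'FILL_INT': ('>i', '80000001'),
    'FILL_FLOAT': ('>f', '7cf00000'),
    'FILL_DOUBLE': ('>d', '479e000000000000'),
}

def decode_fill(name):
    # Interpret the hex string as raw bytes of the given struct format.
    fmt, hex_bytes = SPEC_FILL_BYTES[name]
    return struct.unpack(fmt, bytes.fromhex(hex_bytes))[0]

fills = {name: decode_fill(name) for name in SPEC_FILL_BYTES}
for name, value in fills.items():
    print(name, value)
```

The float and double defaults decode to the same number, 1.875 * 2**122, so in both encodings the low-order mantissa bytes are all zero.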
EDIT: I remember reading some text about how the default _FillValues are ""close to the largest or smallest number representable by a datatype"", but I cannot find it now.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,963006707
https://github.com/pydata/xarray/pull/5680#issuecomment-895022291,https://api.github.com/repos/pydata/xarray/issues/5680,895022291,IC_kwDOAMm_X841WPTT,1217238,2021-08-09T07:54:22Z,2021-08-09T07:54:22Z,MEMBER,"Could you clarify where these default fill values come from?
Are they just an arbitrary choice by netCDF4-Python? Or are they part of some broader standard?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,963006707