
issue_comments: 1498647087


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/7723#issuecomment-1498647087 https://api.github.com/repos/pydata/xarray/issues/7723 1498647087 IC_kwDOAMm_X85ZU4ov 5821660 2023-04-06T08:00:09Z 2023-04-06T08:00:09Z MEMBER

I'm still convinced this could be fixed for floating point data.

> Generally it's worse if we obey some default fill values but not others, because it becomes quite confusing to a user.

I think this depends on which side you look at it from :-) My point here is that we do not have to obey default fill values submissively; we just use them when decoding. This only needs to happen if no _FillValue is attached to the variable. By doing this we ensure that these missing values are mapped to np.nan (as users expect).

From there on we can just apply the xarray standard np.nan when writing out. We need to document that in that case an exact roundtrip isn't possible (it also isn't currently possible in this example).

Consider this example:

```python
import netCDF4 as nc
import numpy as np
import xarray as xr

dtype = "f4"
values = np.array([0.0, nc.default_fillvals[dtype], np.nan, 1.0, 8.0], dtype=dtype)

with nc.Dataset("test-fillvalues-01.nc", mode="w") as ds:
    x = ds.createDimension("x", 10)
    # Variable with an explicitly declared _FillValue (set to the netCDF default)
    test_fillval_fillon = ds.createVariable(
        "test_fillval_fillon", dtype, ("x",), fill_value=nc.default_fillvals[dtype]
    )
    test_fillval_fillon[:5] = values
    # Variable without a declared _FillValue (the default fill value is used implicitly)
    test_nofillval_fillon = ds.createVariable(
        "test_nofillval_fillon", dtype, ("x",), fill_value=None
    )
    test_nofillval_fillon[:5] = values

with nc.Dataset("test-fillvalues-01.nc") as ds:
    print("\n read with netCDF4-python")
    print("---------------------------")
    print(ds["test_fillval_fillon"])
    print(ds["test_fillval_fillon"][:])
    print(ds["test_nofillval_fillon"])
    print(ds["test_nofillval_fillon"][:])

with xr.open_dataset("test-fillvalues-01.nc").load() as ds:
    print("\n read with xarray")
    print("---------------------------")
    print(ds["test_fillval_fillon"])
    print(ds["test_fillval_fillon"][:])
    print(ds["test_nofillval_fillon"])
    print(ds["test_nofillval_fillon"][:])
```

```
 read with netCDF4-python
---------------------------
<class 'netCDF4._netCDF4.Variable'>
float32 test_fillval_fillon(x)
    _FillValue: 9.96921e+36
unlimited dimensions:
current shape = (10,)
filling on
[0.0 -- nan 1.0 8.0 -- -- -- -- --]
<class 'netCDF4._netCDF4.Variable'>
float32 test_nofillval_fillon(x)
unlimited dimensions:
current shape = (10,)
filling on, default _FillValue of 9.969209968386869e+36 used
[0.0 -- nan 1.0 8.0 -- -- -- -- --]

 read with xarray-python
---------------------------
<xarray.DataArray 'test_fillval_fillon' (x: 10)>
array([ 0., nan, nan,  1.,  8., nan, nan, nan, nan, nan], dtype=float32)
Dimensions without coordinates: x
<xarray.DataArray 'test_nofillval_fillon' (x: 10)>
array([0.00000e+00, 9.96921e+36,         nan, 1.00000e+00, 8.00000e+00,
       9.96921e+36, 9.96921e+36, 9.96921e+36, 9.96921e+36, 9.96921e+36],
      dtype=float32)
Dimensions without coordinates: x
```

The only difference between these two variables is that the first declares its _FillValue explicitly, while the second relies on the default _FillValue. So if xarray obeys the first (per the CF standard), it should also obey the second.

This might just work if, in these cases, the default fill value is used for decoding to np.nan and we declare that np.nan will be the new _FillValue. Does that make sense?
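To make the idea concrete, here is a rough sketch of the decoding step for floating point data. It is not xarray's actual implementation: `decode_float_fill` and `DEFAULT_FLOAT_FILL` are hypothetical names, and the constant is simply the default fill value reported by netCDF4-python in the output above.

```python
import numpy as np

# Assumption: netCDF default fill value for float data, taken from the
# netCDF4-python output above ("default _FillValue of 9.969209968386869e+36").
DEFAULT_FLOAT_FILL = 9.969209968386869e+36


def decode_float_fill(data, attrs):
    """Hypothetical helper: map fill values in float data to np.nan.

    If the variable declares a _FillValue, use it; otherwise fall back to
    the default fill value. Returns the decoded array and the encoding that
    would be used when writing the data out again.
    """
    data = np.asarray(data)
    if data.dtype.kind != "f":
        # Only floating point data is handled in this sketch.
        return data, {}
    # Cast the fill value to the array dtype so that the comparison is exact.
    fill = data.dtype.type(attrs.get("_FillValue", DEFAULT_FLOAT_FILL))
    decoded = data.copy()
    decoded[data == fill] = np.nan
    # np.nan becomes the new _FillValue, so an exact roundtrip is not possible.
    return decoded, {"_FillValue": np.nan}


raw = np.array([0.0, DEFAULT_FLOAT_FILL, np.nan, 1.0, 8.0], dtype="f4")
decoded, encoding = decode_float_fill(raw, attrs={})  # no _FillValue declared
print(decoded)
```

With this, both variables from the example would decode to the same values, regardless of whether the _FillValue is declared or the default is used implicitly.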
