home / github / issues

Menu
  • GraphQL API
  • Search all tables

issues: 1655483374

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1655483374 I_kwDOAMm_X85irKvu 7722 Conflicting _FillValue and missing_value on write 5821660 open 0     3 2023-04-05T11:56:46Z 2023-06-21T13:23:16Z   MEMBER      

What happened?

see also #7191

If missing_value and _FillValue is an attribute of a DataArray it can't be written out to file if these two contradict:

python ValueError: Variable 'test' has conflicting _FillValue (nan) and missing_value (1.0). Cannot encode data.

This happens, if missing_value is an attribute of a specific netCDF Dataset of an existing file. On read the missing_value will be masked with np.nan on the data and it will be preserved within encoding. On write, _FillValue will be added as attribute by xarray (if not available, at least for floating point types), too. So far so good.

The error first manifests if you read back this file and try to write it again. There is no warning on the second read, that the two _FillValue and missing_value are differing. Only on the second write.

What did you expect to happen?

The file should be written on the second roundtrip.

There are at least two solutions to this:

  1. Mask missing_value on read and purge missing_value completely in favor of _FillValue.
  2. Do not handle missing_value at all, but let the user take action.

Minimal Complete Verifiable Example

```Python import numpy as np import netCDF4 as nc import xarray as xr

with nc.Dataset("test-no-fillval-01.nc", mode="w") as ds: x = ds.createDimension("x", 4) test = ds.createVariable("test", "f4", ("x",), fill_value=None) test.missing_value = 1. test.valid_min = 2. test.valid_max = 10. test[:] = np.array([0.0, np.nan, 1.0, 8.0], dtype="f4") with nc.Dataset("test-no-fillval-01.nc") as ds: print(ds["test"]) print(ds["test"][:])

with xr.open_dataset("test-no-fillval-01.nc").load() as roundtrip: print(roundtrip) print(roundtrip["test"].attrs) print(roundtrip["test"].encoding) roundtrip.to_netcdf("test-no-fillval-02.nc")

with xr.open_dataset("test-no-fillval-02.nc").load() as roundtrip: print(roundtrip) print(roundtrip["test"].attrs) print(roundtrip["test"].encoding) roundtrip.to_netcdf("test-no-fillval-03.nc") ```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python <class 'netCDF4._netCDF4.Variable'> float32 test(x) missing_value: 1.0 valid_min: 2.0 valid_max: 10.0 unlimited dimensions: current shape = (4,) filling on, default _FillValue of 9.969209968386869e+36 used

<xarray.Dataset> Dimensions: (x: 4) Dimensions without coordinates: x Data variables: test (x) float32 0.0 nan nan 8.0 {'valid_min': 2.0, 'valid_max': 10.0} {'zlib': False, 'szip': False, 'zstd': False, 'bzip2': False, 'blosc': False, 'shuffle': False, 'complevel': 0, 'fletcher32': False, 'contiguous': True, 'chunksizes': None, 'source': 'test-no-fillval-01.nc', 'original_shape': (4,), 'dtype': dtype('float32'), 'missing_value': 1.0}

<xarray.Dataset> Dimensions: (x: 4) Dimensions without coordinates: x Data variables: test (x) float32 0.0 nan nan 8.0 {'valid_min': 2.0, 'valid_max': 10.0} {'zlib': False, 'szip': False, 'zstd': False, 'bzip2': False, 'blosc': False, 'shuffle': False, 'complevel': 0, 'fletcher32': False, 'contiguous': True, 'chunksizes': None, 'source': 'test-no-fillval-02.nc', 'original_shape': (4,), 'dtype': dtype('float32'), 'missing_value': 1.0, '_FillValue': nan}

File /home/kai/miniconda/envs/xarray_311/lib/python3.11/site-packages/xarray/coding/variables.py:167, in CFMaskCoder.encode(self, variable, name) 160 mv = encoding.get("missing_value") 162 if ( 163 fv is not None 164 and mv is not None 165 and not duck_array_ops.allclose_or_equiv(fv, mv) 166 ): --> 167 raise ValueError( 168 f"Variable {name!r} has conflicting _FillValue ({fv}) and missing_value ({mv}). Cannot encode data." 169 ) 171 if fv is not None: 172 # Ensure _FillValue is cast to same dtype as data's 173 encoding["_FillValue"] = dtype.type(fv)

ValueError: Variable 'test' has conflicting _FillValue (nan) and missing_value (1.0). Cannot encode data. ```

Anything else we need to know?

The adding of _FillValue on write happens here:

https://github.com/pydata/xarray/blob/d4db16699f30ad1dc3e6861601247abf4ac96567/xarray/conventions.py#L300

https://github.com/pydata/xarray/blob/d4db16699f30ad1dc3e6861601247abf4ac96567/xarray/conventions.py#L144-L152

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.11.0 | packaged by conda-forge | (main, Jan 14 2023, 12:27:40) [GCC 11.3.0] python-bits: 64 OS: Linux OS-release: 5.14.21-150400.24.55-default machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: de_DE.UTF-8 LOCALE: ('de_DE', 'UTF-8') libhdf5: 1.14.0 libnetcdf: 4.9.2 xarray: 2023.3.0 pandas: 1.5.3 numpy: 1.24.2 scipy: 1.10.1 netCDF4: 1.6.3 pydap: None h5netcdf: 1.1.0 h5py: 3.8.0 Nio: None zarr: 2.14.2 cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2023.3.1 distributed: 2023.3.1 matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: 2023.3.0 cupy: 11.6.0 pint: 0.20.1 sparse: None flox: None numpy_groupies: None setuptools: 67.6.0 pip: 23.0.1 conda: None pytest: 7.2.2 mypy: None IPython: 8.11.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7722/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    13221727 issue

Links from other tables

  • 2 rows from issues_id in issues_labels
  • 2 rows from issue in issue_comments
Powered by Datasette · Queries took 83.371ms · About: xarray-datasette