issues: 1655569401
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1655569401 | I_kwDOAMm_X85irfv5 | 7723 | default fill_value not masked when read from file | 5821660 | closed | 0 | 5 | 2023-04-05T12:54:53Z | 2023-09-13T12:44:54Z | 2023-09-13T12:44:54Z | MEMBER | What happened?
When reading a netCDF file which has been created with fill_value=None (the default), values that were never written are not masked. Writing the dataset back to disk makes this permanent.
What did you expect to happen?
Values should be masked. There seems to be a simple solution: on read, apply the netCDF default fill_value in the variable's attributes before decoding if no
Minimal Complete Verifiable Example
```Python
import numpy as np
import netCDF4 as nc
import xarray as xr

# Create a file with default fill behaviour and write only 4 of the 5 values.
with nc.Dataset("test-no-missing-01.nc", mode="w") as ds:
    x = ds.createDimension("x", 5)
    test = ds.createVariable("test", "f4", ("x",), fill_value=None)
    test[:4] = np.array([0.0, np.nan, 1.0, 8.0], dtype="f4")

# Read back with netCDF4: the unwritten value is masked.
with nc.Dataset("test-no-missing-01.nc") as ds:
    print(ds["test"])
    print(ds["test"][:])

# Read with xarray and write the dataset to a second file.
with xr.open_dataset("test-no-missing-01.nc").load() as roundtrip:
    print(roundtrip)
    print(roundtrip["test"].attrs)
    print(roundtrip["test"].encoding)
    roundtrip.to_netcdf("test-no-missing-02.nc")

# Inspect the round-tripped file with netCDF4.
with nc.Dataset("test-no-missing-02.nc") as ds:
    print(ds["test"])
    print(ds["test"][:])
```
MVCE confirmation
Relevant log output
```Python
<class 'netCDF4._netCDF4.Variable'>
float32 test(x)
unlimited dimensions:
current shape = (5,)
filling on, default _FillValue of 9.969209968386869e+36 used
[0.0 nan 1.0 8.0 --]
<xarray.Dataset>
Dimensions:  (x: 5)
Dimensions without coordinates: x
Data variables:
    test     (x) float32 0.0 nan 1.0 8.0 9.969e+36
{}
{'zlib': False, 'szip': False, 'zstd': False, 'bzip2': False, 'blosc': False, 'shuffle': False, 'complevel': 0, 'fletcher32': False, 'contiguous': True, 'chunksizes': None, 'source': 'test-no-missing-01.nc', 'original_shape': (5,), 'dtype': dtype('float32')}
<class 'netCDF4._netCDF4.Variable'>
float32 test(x)
    _FillValue: nan
unlimited dimensions:
current shape = (5,)
filling on
[0.0 -- 1.0 8.0 9.969209968386869e+36]
```
Anything else we need to know?
The issue is similar to #7722 but is more intricate, because here the status of certain data values changes from masked to a netCDF-specific default value. This happens when only parts of the source dataset have been written to. The default fill_value then gets delivered to the user, but it is not backed by an
Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0 | packaged by conda-forge | (main, Jan 14 2023, 12:27:40) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.14.21-150400.24.55-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: 1.14.0
libnetcdf: 4.9.2
xarray: 2023.3.0
pandas: 1.5.3
numpy: 1.24.2
scipy: 1.10.1
netCDF4: 1.6.3
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.14.2
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2023.3.1
distributed: 2023.3.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.3.0
cupy: 11.6.0
pint: 0.20.1
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.6.0
pip: 23.0.1
conda: None
pytest: 7.2.2
mypy: None
IPython: 8.11.0
sphinx: None
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7723/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |
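The fix proposed in the issue body (on read, put the netCDF default fill value into the variable's attributes when no `_FillValue` is set, then decode as usual) can be sketched roughly as follows. This is a dependency-free illustration on plain Python lists, not xarray's actual implementation. The helper names `add_default_fill_value` and `decode_fill_value` and the `DEFAULT_FILL_VALUES` table are hypothetical; the `f4` value is the one shown in the log output of the report.

```python
import math

# Default netCDF fill values for float types. The "f4" value is the one
# reported in the log output of the issue; this small table is an assumption
# made for the sketch and does not cover integer or character types.
DEFAULT_FILL_VALUES = {
    "f4": 9.969209968386869e36,
    "f8": 9.969209968386869e36,
}


def add_default_fill_value(attrs, dtype_code):
    """Return a copy of attrs with the default _FillValue added if none is set."""
    attrs = dict(attrs)
    if "_FillValue" not in attrs and dtype_code in DEFAULT_FILL_VALUES:
        attrs["_FillValue"] = DEFAULT_FILL_VALUES[dtype_code]
    return attrs


def decode_fill_value(values, attrs):
    """Replace entries equal to attrs['_FillValue'] with NaN, i.e. mask them."""
    fill = attrs.get("_FillValue")
    if fill is None:
        return list(values)
    return [math.nan if v == fill else v for v in values]


# Raw values as xarray returned them in the report: the last element was
# never written and therefore holds the default fill value.
raw = [0.0, math.nan, 1.0, 8.0, 9.969209968386869e36]
attrs = add_default_fill_value({}, "f4")
decoded = decode_fill_value(raw, attrs)
print(decoded)
```

With the default injected before decoding, the unwritten element is treated as missing, while an explicitly set `_FillValue` is left untouched. A real implementation would operate on arrays and would also have to decide how to handle files where filling was turned off.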