
issues: 979916914


id: 979916914
node_id: MDU6SXNzdWU5Nzk5MTY5MTQ=
number: 5739
title: Writing and reopening introduces bad values
user: 54010293
state: closed
locked: 0
comments: 2
created_at: 2021-08-26T07:16:00Z
updated_at: 2024-03-25T14:54:20Z
closed_at: 2024-03-25T14:54:20Z
author_association: NONE

What happened: When I open two particular netCDF files in xarray, concatenate them, write the resulting Dataset to disk, and reopen it, unexpected values appear. This happens very rarely, as far as I have noticed, and I have not spotted a pattern that would indicate when to expect it.

What you expected to happen: The values should not change through the writing and reading process.

Minimal Complete Verifiable Example:

Download 2t_era5_moda_sfc_20190201-20190228.nc and 2t_era5_moda_sfc_20190501-20190531.nc from https://github.com/dougrichardson/issues/tree/main/xarray_write (each file is ~1.3MB).

```python
import numpy as np
import xarray as xr

# Open both files and select the same small region from each
feb = xr.open_dataset('./2t_era5_moda_sfc_20190201-20190228.nc')
feb = feb.sel(latitude=slice(21, 19), longitude=slice(79, 80))

may = xr.open_dataset('./2t_era5_moda_sfc_20190501-20190531.nc')
may = may.sel(latitude=slice(21, 19), longitude=slice(79, 80))

ds = xr.concat([feb, may], dim='time')

# The bad values are introduced for may. This is what the file should look like:
ds.t2m.sel(time='2019-05-01').plot()

# Write to file, reopen and plot again
ds.to_netcdf('./test.nc', mode='w')
ds.close()

ds2 = xr.open_dataset('./test.nc')
ds2.t2m.sel(time='2019-05-01').plot()
ds2.close()

# We can also compare values using numpy.isclose:
np.isclose(may.t2m.values, ds2.t2m.sel(time='2019-05').values)
# array([[[False, False, False, False, False],
#         [False, False, False, False, False],
#         [False, False, False, False,  True],
#         [False, False, False, False, False],
#         [False, False, False, False, False],
#         [False, False, False, False, False],
#         [False, False, False, False, False],
#         [ True,  True, False, False, False],
#         [False,  True,  True,  True, False]]])
```

Anything else we need to know?: Bad data is generated only in one time slice of ds, i.e. ds.sel(time='2019-05-01'). However, I have replaced feb with a number of different netcdf files, and there is no problem. Thus the issue seems to be with these two files specifically. I can provide a third netcdf file to highlight the lack of a problem there, if that would be useful.

This appears to be related to the encoding - if I specify the datatype when writing to file, the problem is fixed. However, as pointed out in https://github.com/pydata/xarray/issues/4826, this can introduce other problems. The netcdf files are climate data with add_offset and scale_factor attributes.
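One possible mechanism, not confirmed in this report, is that values written with another file's add_offset and scale_factor no longer fit the packed integer type. The pure-Python sketch below illustrates that idea only. The packing parameters, the int16 storage type, and the wrap-around behaviour on overflow are all assumptions chosen for illustration; the real values would have to be read from each file's t2m encoding.

```python
# Illustrative sketch of CF-style packing: packed = round((value - add_offset) / scale_factor).
# All numbers here are hypothetical, not taken from the ERA5 files in this report.

INT16_MIN, INT16_MAX = -32768, 32767

# Hypothetical packing parameters, as if inherited from the first file.
SCALE_FACTOR = 0.001
ADD_OFFSET = 280.0


def pack(value, scale_factor, add_offset):
    """Convert a physical value to its (unbounded) packed integer."""
    return round((value - add_offset) / scale_factor)


def fits_int16(packed):
    """Return True if the packed integer is representable as int16."""
    return INT16_MIN <= packed <= INT16_MAX


def wrap_int16(packed):
    """Simulate a two's-complement cast to int16 (assumed overflow behaviour)."""
    return (packed - INT16_MIN) % 65536 + INT16_MIN


def unpack(packed, scale_factor, add_offset):
    """Convert a packed integer back to a physical value."""
    return packed * scale_factor + add_offset


def roundtrip(value, scale_factor=SCALE_FACTOR, add_offset=ADD_OFFSET):
    """Pack, cast to int16, and unpack again."""
    stored = wrap_int16(pack(value, scale_factor, add_offset))
    return unpack(stored, scale_factor, add_offset)


if __name__ == "__main__":
    for value in (300.0, 315.0):
        packed = pack(value, SCALE_FACTOR, ADD_OFFSET)
        print(f"{value:.3f} -> packed {packed}, fits int16: {fits_int16(packed)}, "
              f"round trip: {roundtrip(value):.3f}")
```

With these made-up parameters, 300.0 survives the round trip, while 315.0 packs to 35000, falls outside the int16 range, and comes back as roughly 249.464. If something similar is happening here, it would be consistent with the observation that specifying the datatype when writing avoids the problem, since a floating-point dtype has no such range limit.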

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.4 | packaged by conda-forge | (default, May 10 2021, 22:13:33) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-305.7.1.el8.nci.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.19.0
pandas: 1.2.4
numpy: 1.20.3
scipy: 1.6.3
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.8.3
cftime: 1.5.0
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.2.4
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.05.1
distributed: 2021.05.1
matplotlib: 3.4.2
cartopy: 0.19.0.post1
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.1.2
conda: 4.10.1
pytest: None
IPython: 7.24.0
sphinx: None

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5739/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
