id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
667203487,MDU6SXNzdWU2NjcyMDM0ODc=,4282,Values change when writing combined Dataset loaded with open_mfdataset,11723107,closed,0,,,1,2020-07-28T16:20:09Z,2022-04-09T03:00:55Z,2022-04-09T03:00:55Z,NONE,,,,"
**What happened**:

Loading two netCDF files with `open_mfdataset` and then writing them to a combined file changes some of the values in the written file.

**What you expected to happen**:

That the written file, when read back, contains the same values as the in-memory `Dataset`.

**Minimal Complete Verifiable Example**:

```python
>>> import numpy as np
>>> import xarray as xr
>>> data1 = xr.open_dataset(""file1.nc"")
>>> data2 = xr.open_dataset(""file2.nc"")
>>> merged = xr.open_mfdataset([""file1.nc"", ""file2.nc""])
>>> np.all(np.isclose(merged[""u""].values[0], data1[""u""].values[0]))
True
>>> np.all(np.isclose(merged[""u""].values[-1], data2[""u""].values[-1]))
True
>>> merged.to_netcdf(""foo.nc"")
>>> merged_file = xr.load_dataset(""foo.nc"")
>>> np.all(np.isclose(merged_file[""u""].values, merged[""u""].values))
False
```

The files contain wind data from the ERA5 reanalysis, downloaded from [CDS](https://cds.climate.copernicus.eu/#!/home).

**Anything else we need to know?**:

The issue might be related to the scale and offset values of the variable. Continuing the example:

```python
>>> np.all(np.isclose(merged_file[""u""].values[0], data1[""u""].values[0]))
True
>>> np.all(np.isclose(merged_file[""u""].values[-1], data2[""u""].values[-1]))
False
```

Data from the first file seems to be correct.
When writing the combined dataset, the scale and offset from the first file are written to the combined file:

```python
>>> data1_nomas = xr.open_dataset(""file1.nc"", mask_and_scale=False)
>>> data2_nomas = xr.open_dataset(""file2.nc"", mask_and_scale=False)
>>> merged_file_nomas = xr.open_dataset(""foo.nc"", mask_and_scale=False)
>>> data1_nomas[""u""].attrs
{'scale_factor': 0.002397265127278432, 'add_offset': 25.620963232670736, '_FillValue': -32767, 'missing_value': -32767, 'units': 'm s**-1', 'long_name': 'U component of wind', 'standard_name': 'eastward_wind'}
>>> data2_nomas[""u""].attrs
{'scale_factor': 0.0024358825557859445, 'add_offset': 21.288035293585388, '_FillValue': -32767, 'missing_value': -32767, 'units': 'm s**-1', 'long_name': 'U component of wind', 'standard_name': 'eastward_wind'}
>>> merged_file_nomas[""u""].attrs
{'scale_factor': 0.002397265127278432, 'add_offset': 25.620963232670736, '_FillValue': -32767, 'units': 'm s**-1', 'long_name': 'U component of wind', 'standard_name': 'eastward_wind', 'missing_value': -32767}
```

Maybe the data from the second file is not adjusted to fit the scale and offset taken from the first file.

**Environment**:
Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-107-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.16.0
pandas: 1.0.4
numpy: 1.18.5
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.2
iris: None
bottleneck: None
dask: 2.18.1
distributed: 2.21.0
matplotlib: 3.2.1
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: 0.14
setuptools: 49.2.0.post20200712
pip: 20.1.1
conda: 4.8.3
pytest: None
IPython: 7.16.1
sphinx: None
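
The scale/offset hypothesis above can be checked with plain arithmetic. The sketch below is not xarray code: it assumes the variable is packed as a signed 16-bit integer (suggested by the `_FillValue` of -32767, but not confirmed here) and that unpacking follows `packed * scale_factor + add_offset`. The helper names `representable_range`, `pack` and `fits_int16` are hypothetical. It only shows which physical values each pair of attributes can represent. It does not show what xarray actually does with values that fall outside that range.

```python
# Sketch of int16 packing arithmetic using the attribute values printed above.
# Assumption: data are stored as signed 16-bit integers.
# All function names are hypothetical and not part of xarray.

INT16_MIN, INT16_MAX = -32768, 32767


def representable_range(scale_factor, add_offset):
    # Physical values reachable via unpacked = packed * scale_factor + add_offset
    # when packed is limited to the int16 range.
    low = INT16_MIN * scale_factor + add_offset
    high = INT16_MAX * scale_factor + add_offset
    return low, high


def pack(value, scale_factor, add_offset):
    # Inverse of the unpacking formula, rounded to the nearest integer.
    return round((value - add_offset) / scale_factor)


def fits_int16(value, scale_factor, add_offset):
    # True if the packed integer stays inside the int16 range.
    return INT16_MIN <= pack(value, scale_factor, add_offset) <= INT16_MAX


file1 = {'scale_factor': 0.002397265127278432, 'add_offset': 25.620963232670736}
file2 = {'scale_factor': 0.0024358825557859445, 'add_offset': 21.288035293585388}

range1 = representable_range(**file1)
range2 = representable_range(**file2)
print('file1 packing covers', range1)
print('file2 packing covers', range2)

# A value near the lower end of what file2 can hold (illustrative, not real data).
probe = range2[0] + 1.0
print(probe, 'fits file2 packing:', fits_int16(probe, **file2))
print(probe, 'fits file1 packing:', fits_int16(probe, **file1))
```

With these attributes, the packing from the second file reaches lower wind speeds than the packing from the first file. Any such values in `file2.nc` could therefore not be stored exactly in `foo.nc` if the attributes from the first file are reused. Whether `file2.nc` actually contains values in that gap has not been checked here.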
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4282/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue