id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
558293655,MDU6SXNzdWU1NTgyOTM2NTU=,3739,ValueError when trying to encode time variable in a NetCDF file with CF conventions,33062222,closed,0,,,7,2020-01-31T18:22:36Z,2023-09-13T13:45:47Z,2023-09-13T13:45:46Z,NONE,,,,"```python
# Imports
import numpy as np
import xarray as xr
import pandas as pd
from glob import glob

# year being processed; path points at the data directory (defined elsewhere)
yr = 1988

# files to be concatenated
files = sorted(glob(path + str(yr) + '/V250*'))

# corrected dates
dates = pd.date_range(start=str(yr), end=str(yr+1), freq='6H', closed='left')

ds_test = xr.open_mfdataset(files[:10], combine='nested', concat_dim='time', decode_cf=False)

# correcting time
ds_test.time.values = dates[:10]

# fixing encoding
ds_test.time.attrs['units'] = ""Seconds since 1970-01-01 00:00:00""

# preview of the time variable
print(ds_test.time)

> array(['1988-01-01T00:00:00.000000000', '1988-01-01T06:00:00.000000000',
       '1988-01-01T12:00:00.000000000', '1988-01-01T18:00:00.000000000',
       '1988-01-02T00:00:00.000000000', '1988-01-02T06:00:00.000000000',
       '1988-01-02T12:00:00.000000000', '1988-01-02T18:00:00.000000000',
       '1988-01-03T00:00:00.000000000', '1988-01-03T06:00:00.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 1988-01-01 ... 1988-01-03T06:00:00
Attributes:
    calendar:       proleptic_gregorian
    standard_name:  time
    units:          Seconds since 1970-01-01 00:00:00

ds_test.to_netcdf(path+'test.nc')

> ValueError: failed to prevent overwriting existing key units in attrs on variable 'time'. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.
```

#### Expected Output

Correctly encode `time` so that the file is saved with the `time` values converted according to the reference units.
I have the flexibility of dropping CF conventions as long as the time values are correct, but it would also be nice to have a solution which keeps the CF conventions intact.

#### Problem Description

I'm trying to concatenate `netcdf` files which have `CF` conventions mentioned in their global attributes. These files have an incorrect time dimension which I try to fix with the code above. It seems that some existing encoding is preventing the files from being written back. But when I print the encoding, it doesn't show any such clashing `units`. I'm not sure if this is a bug or a wrong-usage issue. Thus, any help on how to correctly encode `time` so that it saves the time values, correctly converted according to the reference units, is much appreciated.

```python
# More diagnostics on the encoding
print(ds_test.encoding)
> {'unlimited_dims': {'time'}, 'source': '/file/to/path/V250_19880101_00'}

# checking any existing time encoding
print(ds_test.time.encoding)
> {}

# another try at setting the time encoding
ds_test.time.encoding['units'] = ""Seconds since 1970-01-01 00:00:00""

# writing the file gives the same ValueError as above
ds_test.to_netcdf(path+'test.nc')

# ncdump output of one of the files
> netcdf V250_19880101_06 {
dimensions:
	lon = 720 ;
	lat = 361 ;
	lev = 1 ;
	time = UNLIMITED ; // (1 currently)
variables:
	float lon(lon) ;
		lon:long_name = ""longitude"" ;
		lon:units = ""degrees_east"" ;
		lon:standard_name = ""longitude"" ;
		lon:axis = ""X"" ;
	float lat(lat) ;
		lat:long_name = ""latitude"" ;
		lat:units = ""degrees_north"" ;
		lat:standard_name = ""latitude"" ;
		lat:axis = ""Y"" ;
	float lev(lev) ;
		lev:long_name = ""hybrid level at layer midpoints"" ;
		lev:units = ""level"" ;
		lev:standard_name = ""hybrid_sigma_pressure"" ;
		lev:positive = ""down"" ;
		lev:formula = ""hyam hybm (mlev=hyam+hybm*aps)"" ;
		lev:formula_terms = ""ap: hyam b: hybm ps: aps"" ;
	float time(time) ;
		time:units = ""hours since 1988-01-01 06:00:00"" ;
		time:calendar = ""proleptic_gregorian"" ;
		time:standard_name = ""time"" ;
	float V(time, lev, lat, lon) ;
		V:long_name = ""unknown (please add with NCO)"" ;
		V:units = ""unknown (please add with NCO)"" ;
		V:_FillValue = -999.99f ;

// global attributes:
		:Conventions = ""CF"" ;
		:constants_file_name = ""P19880101_06"" ;
		:institution = ""IACETH"" ;
		:lonmin = -180.f ;
		:lonmax = 179.5f ;
		:latmin = -90.f ;
		:latmax = 90.f ;
		:levmin = 250.f ;
		:levmax = 250.f ;
		:history = ""Fri Sep 6 15:59:17 2019: ncatted -a units,time,o,c,hours since 1988-01-01 06:00:00 -a standard_name,time,o,c,time V250_19880101_06"" ;
		:NCO = ""4.7.2"" ;
data:

 time = 6 ;
}
```

#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.0.0-23-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1

xarray: 0.13.0
pandas: 0.25.3
numpy: 1.18.1
scipy: 1.3.2
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.9.2
distributed: 2.9.3
matplotlib: 3.1.0
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: None
setuptools: 44.0.0.post20200106
pip: 19.3.1
conda: None
pytest: None
IPython: 7.11.1
sphinx: None
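A minimal sketch of the workaround the error message points at, using a toy in-memory dataset in place of `ds_test` (the dataset, coordinate values, and dates here are illustrative assumptions, not the reporter's actual files): for `datetime64` coordinates, xarray manages `units` and `calendar` itself, so they belong in `encoding` rather than `attrs`.

```python
import pandas as pd
import xarray as xr

# Toy stand-in for ds_test: a datetime64 time coordinate with CF metadata
# set by hand in attrs, which is what triggers the ValueError on write.
times = pd.date_range('1988-01-01', periods=4, freq='6h')
ds = xr.Dataset(coords={'time': times})
ds.time.attrs['units'] = 'Seconds since 1970-01-01 00:00:00'
ds.time.attrs['calendar'] = 'proleptic_gregorian'

# Move the CF time metadata from attrs to encoding, as the error suggests;
# xarray then converts the datetime values itself when the file is written.
for key in ('units', 'calendar'):
    ds.time.attrs.pop(key, None)
ds.time.encoding['units'] = 'seconds since 1970-01-01 00:00:00'
# ds.to_netcdf(path + 'test.nc')  # write left commented out here
```

Whether this resolves the reporter's case is an assumption; it follows the remedy stated in the ValueError text.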
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3739/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 408426920,MDU6SXNzdWU0MDg0MjY5MjA=,2758,Dataset.to_netcdf() results in unexpected encoding parameters for 'netCDF4' backend,33062222,closed,0,,,2,2019-02-09T12:25:54Z,2019-02-11T09:59:06Z,2019-02-11T09:59:06Z,NONE,,,,"``` import pandas as pd import xarray as xr from datetime import datetime ds_test2=xr.open_dataset('test_file.nc') ``` #### ncudmp to show how file looks like netcdf test_file { dimensions: lon = 720 ; lev = 1 ; time = 27147 ; variables: float lon(lon) ; lon:_FillValue = NaNf ; float lev(lev) ; lev:_FillValue = NaNf ; lev:long_name = ""hybrid level at layer midpoints"" ; lev:units = ""level"" ; lev:standard_name = ""hybrid_sigma_pressure"" ; lev:positive = ""down"" ; lev:formula = ""hyam hybm (mlev=hyam+hybm*aps)"" ; lev:formula_terms = ""ap: hyam b: hybm ps: aps"" ; int64 time(time) ; time:units = ""hours since 2000-01-01 00:00:00"" ; time:calendar = ""proleptic_gregorian"" ; float V(time, lev, lon) ; V:_FillValue = NaNf ; V:units = ""m/s"" ; } ``` std_time = datetime(1970,1,1) timedata = pd.to_datetime(ds_test2.time.values).to_pydatetime() timedata_updated = [(t - std_time).total_seconds() for t in timedata] ds_test2.time.values= timedata_updated ds_test2.time.attrs['units'] = 'Seconds since 01-01-1970 00:00:00 UTC' ### saving file ds_test2.to_netcdf('/scratch3/mali/data/test/test_V250hov_encoding4_v2.nc', encoding={'V':{'_FillValue': -999.0},'time':{'units': ""seconds since 1970-01-01 00:00:00""}})``` --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in 6 # saving file to netcdf for one combined hov dataset 7 ds_test2.to_netcdf('/scratch3/mali/data/test/test_V250hov_encoding4_v2.nc', ----> 8 encoding={'V':{'_FillValue': 
-999.0},'time':{'units': ""seconds since 1970-01-01 00:00:00""}}) /usr/local/anaconda3/envs/work_env/lib/python3.6/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute) 1220 engine=engine, encoding=encoding, 1221 unlimited_dims=unlimited_dims, -> 1222 compute=compute) 1223 1224 def to_zarr(self, store=None, mode='w-', synchronizer=None, group=None, /usr/local/anaconda3/envs/work_env/lib/python3.6/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile) 716 # to be parallelized with dask 717 dump_to_store(dataset, store, writer, encoding=encoding, --> 718 unlimited_dims=unlimited_dims) 719 if autoclose: 720 store.close() /usr/local/anaconda3/envs/work_env/lib/python3.6/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims) 759 760 store.store(variables, attrs, check_encoding, writer, --> 761 unlimited_dims=unlimited_dims) 762 763 /usr/local/anaconda3/envs/work_env/lib/python3.6/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims) 264 self.set_dimensions(variables, unlimited_dims=unlimited_dims) 265 self.set_variables(variables, check_encoding_set, writer, --> 266 unlimited_dims=unlimited_dims) 267 268 def set_attributes(self, attributes): /usr/local/anaconda3/envs/work_env/lib/python3.6/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims) 302 check = vn in check_encoding_set 303 target, source = self.prepare_variable( --> 304 name, v, check, unlimited_dims=unlimited_dims) 305 306 writer.add(source, target) /usr/local/anaconda3/envs/work_env/lib/python3.6/site-packages/xarray/backends/netCDF4_.py in prepare_variable(self, name, variable, check_encoding, unlimited_dims) 448 encoding = _extract_nc4_variable_encoding( 449 
variable, raise_on_invalid=check_encoding, --> 450 unlimited_dims=unlimited_dims) 451 if name in self.ds.variables: 452 nc4_var = self.ds.variables[name] /usr/local/anaconda3/envs/work_env/lib/python3.6/site-packages/xarray/backends/netCDF4_.py in _extract_nc4_variable_encoding(variable, raise_on_invalid, lsd_okay, h5py_okay, backend, unlimited_dims) 223 if invalid: 224 raise ValueError('unexpected encoding parameters for %r backend: ' --> 225 ' %r' % (backend, invalid)) 226 else: 227 for k in list(encoding): ValueError: unexpected encoding parameters for 'netCDF4' backend: ['units'] ``` ### Problem description I'm trying to change the time attributes becaues in the workflow there are some scripts which are not in python and would like the time to start from a specific year. I've written the code to calculate seconds from a specific standard time. Later on, I realised that I don't need to do that as xarray takes care of that when saving the data when specified with that encoding parameter. Strange thing is the writng the file by above approach is giving me an error however, when I just read in the same file and save it with the same encoding as above and without changing the time values manually, it works fine. Here's what I mean: ``` ds_test3 = xr.open_dataset('test_file.nc') ## same file as before ## saving directly without doing any calculations like before ds_test3.to_netcdf('/scratch3/mali/data/test/test_V250hov_encoding4_v2.nc', encoding={'V':{'_FillValue': -999.0},'time':{'units': ""seconds since 1970-01-01 00:00:00""}} ## above code works fine ``` #### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.11.0
pandas: 0.23.4
numpy: 1.15.2
scipy: 1.1.0
netCDF4: 1.4.2
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.0b1
PseudonetCDF: None
rasterio: None
iris: 2.2.0
bottleneck: 1.2.1
cyordereddict: None
dask: 0.19.2
distributed: 1.23.2
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.9.0
setuptools: 40.4.3
pip: 10.0.1
conda: None
pytest: 3.8.1
IPython: 7.0.1
sphinx: 1.8.1
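The behaviour can be reproduced in miniature with a toy dataset (names, sizes, and dates below are invented for illustration, not the reporter's file): once the `time` values are plain numbers rather than `datetime64`, xarray's CF time coder no longer consumes the `units` entry in `encoding`, so it is passed through to the netCDF4 backend, which rejects it as an unexpected parameter. This explanation is an inference from the traceback, not a confirmed diagnosis.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy stand-in for test_file.nc: a datetime64 time coordinate.
times = pd.date_range('2000-01-01', periods=4, freq='6h')
ds = xr.Dataset({'V': ('time', np.arange(4.0))}, coords={'time': times})

# With datetime64 values, a 'units' entry in encoding is consumed by the CF
# time coder, so a write like this succeeds:
# ds.to_netcdf('ok.nc', encoding={'time': {'units': 'seconds since 1970-01-01 00:00:00'}})

# Converting the values to plain integers by hand, as in the report, makes
# 'time' an ordinary numeric variable; the leftover 'units' encoding then
# reaches the netCDF4 backend unconsumed and raises the ValueError.
epoch = pd.Timestamp('1970-01-01')
seconds = ((times - epoch) // pd.Timedelta('1s')).astype('int64')
ds = ds.assign_coords(time=('time', seconds))
# ds.to_netcdf('fails.nc', encoding={'time': {'units': 'seconds since 1970-01-01 00:00:00'}})
```
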
I know that I can change the time to a standard calendar without performing the manual calculations I did, but I would like to know how my calculations modify the dataset such that the netCDF4 backend no longer recognises the ['units'] of time.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2758/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue