pydata/xarray issue #6453: In a specific case, `decode_cf` adds encoding dtype that breaks `to_netcdf`

State: closed · Opened: 2022-04-07 · Closed: 2022-04-18 · Comments: 3

What happened?

Although the time variable in the two example datasets, ds1 and ds3, looks the same, the encoding actually differs, and that difference prevents the Dataset from being saved to netCDF. The encoding is added by decode_cf, but only sometimes. For ds3 below (open_mfdataset with 2 files), decode_cf adds an encoding dtype of '<M8[ns]', which is the problem; for ds1 (open_mfdataset with a list of 1 file) that entry is not added. When the encoding dtype is float, everything works (case ds2), even though in both cases the times display as datetime64-interpreted date strings and the setups appear essentially identical.
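The failing state can also be reproduced without the OPeNDAP URLs (which are date-dependent and go stale). This is a minimal, network-free sketch, assuming xarray with a netCDF backend is installed; the filename is hypothetical. It builds an already-decoded time coordinate, plants the same '<M8[ns]' encoding entry that decode_cf adds in the two-file case (on 0.21.1, writing with that entry present raises the ValueError shown in the log), and then removes it again so the write succeeds:

```python
import numpy as np
import pandas as pd
import xarray as xr

# A time coordinate that is already decoded to datetime64[ns], as in ds3.
times = pd.date_range("2022-04-07", periods=2)
ds = xr.Dataset(coords={"ocean_time": ("ocean_time", times)})

# Mimic the stray entry that decode_cf adds in the two-file case.
ds["ocean_time"].encoding["dtype"] = np.dtype("<M8[ns]")

# Removing the datetime dtype from the encoding clears the bad state;
# xarray then falls back to its default numeric CF encoding for times.
ds["ocean_time"].encoding.pop("dtype", None)
ds.to_netcdf("test_clean.nc")  # hypothetical output filename
```

Clearing `encoding["dtype"]` this way is a per-variable fix; it does not explain why open_mfdataset with two files plants the entry in the first place.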

What did you expect to happen?

All of these cases should work and save to netCDF.

Minimal Complete Verifiable Example

```Python
import pandas as pd
import xarray as xr

tod = pd.Timestamp.today()
locs = [tod.strftime('https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/%Y/%m/%d/nos.cbofs.regulargrid.n001.%Y%m%d.t00z.nc'),
        tod.strftime('https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/%Y/%m/%d/nos.cbofs.regulargrid.n002.%Y%m%d.t00z.nc')]

# THIS WORKS: using open_mfdataset with 1 file
ds1 = xr.open_mfdataset([locs[0]])
print('DATASET1: ', ds1['ocean_time'].attrs, ds1['ocean_time'].encoding)
print(ds1['ocean_time'].dtype)
print(xr.decode_cf(ds1).ocean_time.encoding)
xr.decode_cf(ds1).to_netcdf('test1.nc')

# THIS WORKS: using open_mfdataset with 2 files but with times not decoded
ds2 = xr.open_mfdataset(locs, decode_times=False)
print('\nDATASET2: ', ds2['ocean_time'].attrs, ds2['ocean_time'].encoding)
print(ds2['ocean_time'].dtype)
print(xr.decode_cf(ds2).ocean_time.encoding)
xr.decode_cf(ds2).to_netcdf('test2.nc')

# THIS DOES NOT WORK: using open_mfdataset with 2 files with times decoded
ds3 = xr.open_mfdataset(locs)
print('\nDATASET3: ', ds3['ocean_time'].attrs, ds3['ocean_time'].encoding)
print(ds3['ocean_time'].dtype)
print(xr.decode_cf(ds3).ocean_time.encoding)
xr.decode_cf(ds3).to_netcdf('test3.nc')
```

Relevant log output

```Python
DATASET1:  {'long_name': 'time since initialization', 'field': 'time, scalar, series'} {'source': 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/2022/04/07/nos.cbofs.regulargrid.n001.20220407.t00z.nc', 'original_shape': (1,), 'dtype': dtype('float64'), 'units': 'seconds since 2016-01-01 00:00:00', 'calendar': 'gregorian'}
datetime64[ns]
{'source': 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/2022/04/07/nos.cbofs.regulargrid.n001.20220407.t00z.nc', 'original_shape': (1,), 'dtype': dtype('float64'), 'units': 'seconds since 2016-01-01 00:00:00', 'calendar': 'gregorian'}

DATASET2:  {'long_name': 'time since initialization', 'units': 'seconds since 2016-01-01 00:00:00', 'calendar': 'gregorian', 'field': 'time, scalar, series'} {}
float64
{'units': 'seconds since 2016-01-01 00:00:00', 'calendar': 'gregorian', 'dtype': dtype('float64')}

DATASET3:  {'long_name': 'time since initialization', 'field': 'time, scalar, series'} {}
datetime64[ns]
{'dtype': dtype('<M8[ns]')}


ValueError                                Traceback (most recent call last)
Input In [9], in <module>
     24 print(ds3['ocean_time'].dtype)
     25 print(xr.decode_cf(ds3).ocean_time.encoding)
---> 26 xr.decode_cf(ds3).to_netcdf('test3.nc')

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/core/dataset.py:1900, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1897     encoding = {}
   1898 from ..backends.api import to_netcdf
-> 1900 return to_netcdf(
   1901     self,
   1902     path,
   1903     mode,
   1904     format=format,
   1905     group=group,
   1906     engine=engine,
   1907     encoding=encoding,
   1908     unlimited_dims=unlimited_dims,
   1909     compute=compute,
   1910     invalid_netcdf=invalid_netcdf,
   1911 )

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/backends/api.py:1072, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1067 # TODO: figure out how to refactor this logic (here and in save_mfdataset)
   1068 # to avoid this mess of conditionals
   1069 try:
   1070     # TODO: allow this work (setting up the file for writing array data)
   1071     # to be parallelized with dask
-> 1072     dump_to_store(
   1073         dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1074     )
   1075     if autoclose:
   1076         store.close()

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/backends/api.py:1119, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1116 if encoder:
   1117     variables, attrs = encoder(variables, attrs)
-> 1119 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/backends/common.py:265, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    263 self.set_attributes(attributes)
    264 self.set_dimensions(variables, unlimited_dims=unlimited_dims)
--> 265 self.set_variables(
    266     variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
    267 )

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/backends/common.py:303, in AbstractWritableDataStore.set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
    301 name = _encode_variable_name(vn)
    302 check = vn in check_encoding_set
--> 303 target, source = self.prepare_variable(
    304     name, v, check, unlimited_dims=unlimited_dims
    305 )
    307 writer.add(source, target)

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/backends/netCDF4_.py:464, in NetCDF4DataStore.prepare_variable(self, name, variable, check_encoding, unlimited_dims)
    461 def prepare_variable(
    462     self, name, variable, check_encoding=False, unlimited_dims=None
    463 ):
--> 464     datatype = _get_datatype(
    465         variable, self.format, raise_on_invalid_encoding=check_encoding
    466     )
    467     attrs = variable.attrs.copy()
    469     fill_value = attrs.pop("_FillValue", None)

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/backends/netCDF4_.py:139, in _get_datatype(var, nc_format, raise_on_invalid_encoding)
    137 def _get_datatype(var, nc_format="NETCDF4", raise_on_invalid_encoding=False):
    138     if nc_format == "NETCDF4":
--> 139         return _nc4_dtype(var)
    140     if "dtype" in var.encoding:
    141         encoded_dtype = var.encoding["dtype"]

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/backends/netCDF4_.py:160, in _nc4_dtype(var)
    158     dtype = var.dtype
    159 else:
--> 160     raise ValueError(f"unsupported dtype for netCDF4 variable: {var.dtype}")
    161 return dtype

ValueError: unsupported dtype for netCDF4 variable: datetime64[ns]
```

Anything else we need to know?

I tried turning off all the keyword arguments to decode_cf and the result is the same as shown above. My current workaround is to first check whether the datetimes in the Dataset are float and still need decoding (because they were read in with decode_times=False), and only then run decode_cf. In that case the problem doesn't arise. (This is the ds2 case.)
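The workaround can be sketched as a small guard around decode_cf. The helper name and the hard-coded variable name are hypothetical, and it assumes the time variable is numeric exactly when the dataset was opened with decode_times=False:

```python
import numpy as np
import xarray as xr

def decode_if_needed(ds, time_name="ocean_time"):
    """Hypothetical helper: run decode_cf only when times are still raw floats.

    If the dataset was opened with decode_times=False, the time variable is
    numeric and decode_cf is safe (the ds2 case). If it is already
    datetime64, skip decoding, so the problematic '<M8[ns]' encoding entry
    is never added.
    """
    if np.issubdtype(ds[time_name].dtype, np.floating):
        return xr.decode_cf(ds)
    return ds
```

With this guard, the same code path handles both a dataset opened with decode_times=False and one opened with default decoding, which is what the workaround above relies on.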

Environment

INSTALLED VERSIONS

commit: None
python: 3.8.0 | packaged by conda-forge | (default, Nov 22 2019, 19:11:19) [Clang 9.0.0 (tags/RELEASE_900/final)]
python-bits: 64
OS: Darwin
OS-release: 19.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.21.1
pandas: 1.4.1
numpy: 1.22.2
scipy: 1.8.0
netCDF4: 1.5.8
pydap: None
h5netcdf: 0.13.1
h5py: 3.6.0
Nio: None
zarr: 2.11.0
cftime: 1.5.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.02.0
distributed: 2022.02.0
matplotlib: 3.5.1
cartopy: 0.20.2
seaborn: None
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
setuptools: 59.8.0
pip: 22.0.3
conda: None
pytest: 7.0.1
IPython: 8.0.1
sphinx: 4.4.0

