pydata/xarray issue #6453: In a specific case, `decode_cf` adds encoding dtype that breaks `to_netcdf`

State: closed · Opened: 2022-04-07 · Closed: 2022-04-18 · Comments: 3

What happened?

Although the time variable in the two example datasets, ds1 and ds3, looks the same, the encoding actually differs, and that difference prevents the Dataset from being saved to netCDF. The encoding is added by decode_cf, but only sometimes. For ds3 below (open_mfdataset with 2 files), decode_cf adds an encoding dtype of '<M8[ns]', which is the problem; for ds1 (open_mfdataset with a list of 1 file) that entry is not added. When the encoding dtype is float, everything works (case ds2), even though in both cases the times display as datetime64-interpreted date strings and the setups appear essentially identical.
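The failing state can also be reproduced without the OPeNDAP URLs (which are date-dependent and go stale). This is a minimal, network-free sketch, assuming xarray with a netCDF backend is installed; the filename is hypothetical. It builds an already-decoded time coordinate, plants the same '<M8[ns]' encoding entry that decode_cf adds in the two-file case (on 0.21.1, writing with that entry present raises the ValueError shown in the log), and then removes it again so the write succeeds:

```python
import numpy as np
import pandas as pd
import xarray as xr

# A time coordinate that is already decoded to datetime64[ns], as in ds3.
times = pd.date_range("2022-04-07", periods=2)
ds = xr.Dataset(coords={"ocean_time": ("ocean_time", times)})

# Mimic the stray entry that decode_cf adds in the two-file case.
ds["ocean_time"].encoding["dtype"] = np.dtype("<M8[ns]")

# Removing the datetime dtype from the encoding clears the bad state;
# xarray then falls back to its default numeric CF encoding for times.
ds["ocean_time"].encoding.pop("dtype", None)
ds.to_netcdf("test_clean.nc")  # hypothetical output filename
```

Clearing `encoding["dtype"]` this way is a per-variable fix; it does not explain why open_mfdataset with two files plants the entry in the first place.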

What did you expect to happen?

All of these cases should work and save to netCDF.

Minimal Complete Verifiable Example

```Python
import pandas as pd
import xarray as xr

tod = pd.Timestamp.today()
locs = [tod.strftime('https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/%Y/%m/%d/nos.cbofs.regulargrid.n001.%Y%m%d.t00z.nc'),
        tod.strftime('https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/%Y/%m/%d/nos.cbofs.regulargrid.n002.%Y%m%d.t00z.nc')]

# THIS WORKS: using open_mfdataset with 1 file
ds1 = xr.open_mfdataset([locs[0]])
print('DATASET1: ', ds1['ocean_time'].attrs, ds1['ocean_time'].encoding)
print(ds1['ocean_time'].dtype)
print(xr.decode_cf(ds1).ocean_time.encoding)
xr.decode_cf(ds1).to_netcdf('test1.nc')

# THIS WORKS: using open_mfdataset with 2 files but with times not decoded
ds2 = xr.open_mfdataset(locs, decode_times=False)
print('\nDATASET2: ', ds2['ocean_time'].attrs, ds2['ocean_time'].encoding)
print(ds2['ocean_time'].dtype)
print(xr.decode_cf(ds2).ocean_time.encoding)
xr.decode_cf(ds2).to_netcdf('test2.nc')

# THIS DOES NOT WORK: using open_mfdataset with 2 files with times decoded
ds3 = xr.open_mfdataset(locs)
print('\nDATASET3: ', ds3['ocean_time'].attrs, ds3['ocean_time'].encoding)
print(ds3['ocean_time'].dtype)
print(xr.decode_cf(ds3).ocean_time.encoding)
xr.decode_cf(ds3).to_netcdf('test3.nc')
```

Relevant log output

```Python
DATASET1:  {'long_name': 'time since initialization', 'field': 'time, scalar, series'} {'source': 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/2022/04/07/nos.cbofs.regulargrid.n001.20220407.t00z.nc', 'original_shape': (1,), 'dtype': dtype('float64'), 'units': 'seconds since 2016-01-01 00:00:00', 'calendar': 'gregorian'}
datetime64[ns]
{'source': 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/2022/04/07/nos.cbofs.regulargrid.n001.20220407.t00z.nc', 'original_shape': (1,), 'dtype': dtype('float64'), 'units': 'seconds since 2016-01-01 00:00:00', 'calendar': 'gregorian'}

DATASET2:  {'long_name': 'time since initialization', 'units': 'seconds since 2016-01-01 00:00:00', 'calendar': 'gregorian', 'field': 'time, scalar, series'} {}
float64
{'units': 'seconds since 2016-01-01 00:00:00', 'calendar': 'gregorian', 'dtype': dtype('float64')}

DATASET3:  {'long_name': 'time since initialization', 'field': 'time, scalar, series'} {}
datetime64[ns]
{'dtype': dtype('<M8[ns]')}


ValueError                                Traceback (most recent call last)
Input In [9], in <module>
     24 print(ds3['ocean_time'].dtype)
     25 print(xr.decode_cf(ds3).ocean_time.encoding)
---> 26 xr.decode_cf(ds3).to_netcdf('test3.nc')

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/core/dataset.py:1900, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1897     encoding = {}
   1898 from ..backends.api import to_netcdf
-> 1900 return to_netcdf(
   1901     self,
   1902     path,
   1903     mode,
   1904     format=format,
   1905     group=group,
   1906     engine=engine,
   1907     encoding=encoding,
   1908     unlimited_dims=unlimited_dims,
   1909     compute=compute,
   1910     invalid_netcdf=invalid_netcdf,
   1911 )

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/backends/api.py:1072, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1067 # TODO: figure out how to refactor this logic (here and in save_mfdataset)
   1068 # to avoid this mess of conditionals
   1069 try:
   1070     # TODO: allow this work (setting up the file for writing array data)
   1071     # to be parallelized with dask
-> 1072     dump_to_store(
   1073         dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1074     )
   1075     if autoclose:
   1076         store.close()

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/backends/api.py:1119, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1116 if encoder:
   1117     variables, attrs = encoder(variables, attrs)
-> 1119 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/backends/common.py:265, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    263 self.set_attributes(attributes)
    264 self.set_dimensions(variables, unlimited_dims=unlimited_dims)
--> 265 self.set_variables(
    266     variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
    267 )

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/backends/common.py:303, in AbstractWritableDataStore.set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
    301 name = _encode_variable_name(vn)
    302 check = vn in check_encoding_set
--> 303 target, source = self.prepare_variable(
    304     name, v, check, unlimited_dims=unlimited_dims
    305 )
    307 writer.add(source, target)

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/backends/netCDF4_.py:464, in NetCDF4DataStore.prepare_variable(self, name, variable, check_encoding, unlimited_dims)
    461 def prepare_variable(
    462     self, name, variable, check_encoding=False, unlimited_dims=None
    463 ):
--> 464     datatype = _get_datatype(
    465         variable, self.format, raise_on_invalid_encoding=check_encoding
    466     )
    467     attrs = variable.attrs.copy()
    469     fill_value = attrs.pop("_FillValue", None)

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/backends/netCDF4_.py:139, in _get_datatype(var, nc_format, raise_on_invalid_encoding)
    137 def _get_datatype(var, nc_format="NETCDF4", raise_on_invalid_encoding=False):
    138     if nc_format == "NETCDF4":
--> 139         return _nc4_dtype(var)
    140     if "dtype" in var.encoding:
    141         encoded_dtype = var.encoding["dtype"]

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/backends/netCDF4_.py:160, in _nc4_dtype(var)
    158     dtype = var.dtype
    159 else:
--> 160     raise ValueError(f"unsupported dtype for netCDF4 variable: {var.dtype}")
    161 return dtype

ValueError: unsupported dtype for netCDF4 variable: datetime64[ns]
```

Anything else we need to know?

I tried turning off all the keyword arguments to decode_cf and the result is the same as shown above. My current workaround is to first check whether the datetimes in the Dataset are float and still need decoding (because they were read in with decode_times=False), and only then run decode_cf. In that case the problem doesn't arise. (This is the ds2 case.)
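The workaround can be sketched as a small guard around decode_cf. The helper name and the hard-coded variable name are hypothetical, and it assumes the time variable is numeric exactly when the dataset was opened with decode_times=False:

```python
import numpy as np
import xarray as xr

def decode_if_needed(ds, time_name="ocean_time"):
    """Hypothetical helper: run decode_cf only when times are still raw floats.

    If the dataset was opened with decode_times=False, the time variable is
    numeric and decode_cf is safe (the ds2 case). If it is already
    datetime64, skip decoding, so the problematic '<M8[ns]' encoding entry
    is never added.
    """
    if np.issubdtype(ds[time_name].dtype, np.floating):
        return xr.decode_cf(ds)
    return ds
```

With this guard, the same code path handles both a dataset opened with decode_times=False and one opened with default decoding, which is what the workaround above relies on.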

Environment

INSTALLED VERSIONS

commit: None
python: 3.8.0 | packaged by conda-forge | (default, Nov 22 2019, 19:11:19) [Clang 9.0.0 (tags/RELEASE_900/final)]
python-bits: 64
OS: Darwin
OS-release: 19.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.21.1
pandas: 1.4.1
numpy: 1.22.2
scipy: 1.8.0
netCDF4: 1.5.8
pydap: None
h5netcdf: 0.13.1
h5py: 3.6.0
Nio: None
zarr: 2.11.0
cftime: 1.5.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.02.0
distributed: 2022.02.0
matplotlib: 3.5.1
cartopy: 0.20.2
seaborn: None
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
setuptools: 59.8.0
pip: 22.0.3
conda: None
pytest: 7.0.1
IPython: 8.0.1
sphinx: 4.4.0

