
issues: 1396889729


id: 1396889729
node_id: I_kwDOAMm_X85TQtiB
number: 7127
title: Document that `Variable.encoding` is ignored if encoding is given in `to_netcdf`
user: 43613877
state: closed
locked: 0
comments: 1
created_at: 2022-10-04T21:57:48Z
updated_at: 2023-07-21T21:57:41Z
closed_at: 2023-07-21T21:57:41Z
author_association: CONTRIBUTOR

What happened?

After upgrading xarray from version 2022.06.0 to 2022.09.0, the output of the following example is written as float64 instead of float32.

What did you expect to happen?

I expected the output to have the same dtype.

Minimal Complete Verifiable Example

```Python
import xarray as xr

ds = xr.tutorial.load_dataset("eraint_uvz")
# Request compression only; no dtype or packing is specified for z.
encoding = {'z': {'zlib': True}}
ds.z.to_netcdf("compressed.nc", encoding=encoding)
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python

xarray version == 2022.06.0

netcdf compressed {
dimensions:
    longitude = 480 ;
    latitude = 241 ;
    level = 3 ;
    month = 2 ;
variables:
    float longitude(longitude) ;
        longitude:_FillValue = NaNf ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
    float latitude(latitude) ;
        latitude:_FillValue = NaNf ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
    int level(level) ;
        level:units = "millibars" ;
        level:long_name = "pressure_level" ;
    int month(month) ;
    float z(month, level, latitude, longitude) ;
        z:_FillValue = NaNf ;
        z:number_of_significant_digits = 5 ;
        z:units = "m2 s-2" ;
        z:long_name = "Geopotential" ;
        z:standard_name = "geopotential" ;

xarray version == 2022.09.0

netcdf compressed {
dimensions:
    longitude = 480 ;
    latitude = 241 ;
    level = 3 ;
    month = 2 ;
variables:
    float longitude(longitude) ;
        longitude:_FillValue = NaNf ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
    float latitude(latitude) ;
        latitude:_FillValue = NaNf ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
    int level(level) ;
        level:units = "millibars" ;
        level:long_name = "pressure_level" ;
    int month(month) ;
    double z(month, level, latitude, longitude) ;
        z:_FillValue = NaN ;
        z:number_of_significant_digits = 5 ;
        z:units = "m2 s-2" ;
        z:long_name = "Geopotential" ;
        z:standard_name = "geopotential" ;
```

Anything else we need to know?

In addition to the change of dtype from float to double, I wonder whether both outputs should actually be int16, since that is the dtype of the original dataset:

```python

import xarray as xr

ds = xr.tutorial.load_dataset("eraint_uvz")
ds.z.encoding
# {'source': '.../.cache/xarray_tutorial_data/e4bb6ebf67663eeab3ff30beae6a5acf-eraint_uvz.nc',
#  'original_shape': (2, 3, 241, 480),
#  'dtype': dtype('int16'),
#  '_FillValue': nan,
#  'scale_factor': -1.7250274674967954,
#  'add_offset': 66825.5}
ds.z.to_netcdf("original.nc")
```

```
netcdf original {
dimensions:
    longitude = 480 ;
    latitude = 241 ;
    level = 3 ;
    month = 2 ;
variables:
    float longitude(longitude) ;
        longitude:_FillValue = NaNf ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
    float latitude(latitude) ;
        latitude:_FillValue = NaNf ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
    int level(level) ;
        level:units = "millibars" ;
        level:long_name = "pressure_level" ;
    int month(month) ;
    short z(month, level, latitude, longitude) ;
        z:_FillValue = 0s ;
        z:number_of_significant_digits = 5 ;
        z:units = "m**2 s**-2" ;
        z:long_name = "Geopotential" ;
        z:standard_name = "geopotential" ;
        z:add_offset = 66825.5 ;
        z:scale_factor = -1.7250274674968 ;
```
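For reference, a `short` variable carrying `scale_factor` and `add_offset` attributes follows the CF packing convention, in which the unpacked value is `packed * scale_factor + add_offset`. The following is a minimal pure-Python sketch of that arithmetic using the two values from `ds.z.encoding`. The function names and the sample value are illustrative assumptions, and xarray's own implementation may differ in details such as rounding and fill-value handling.

```python
SCALE_FACTOR = -1.7250274674967954  # from ds.z.encoding
ADD_OFFSET = 66825.5                # from ds.z.encoding
INT16_MIN, INT16_MAX = -32768, 32767


def pack(value: float) -> int:
    """Pack a float into an int16-range integer (CF packing convention)."""
    packed = round((value - ADD_OFFSET) / SCALE_FACTOR)
    if not INT16_MIN <= packed <= INT16_MAX:
        raise OverflowError(f"{value} does not fit into int16 with this packing")
    return packed


def unpack(packed: int) -> float:
    """Recover the approximate float value from its packed integer."""
    return packed * SCALE_FACTOR + ADD_OFFSET


value = 50000.0  # illustrative geopotential value, not taken from the dataset
packed = pack(value)
restored = unpack(packed)
error = abs(restored - value)  # at most half a quantization step
print(packed, restored, error)
```

The round trip is lossy: the restored value can differ from the input by up to half of `abs(scale_factor)`.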

Sorry for mixing an issue with a question, but why are `add_offset` and `scale_factor` applied and the values saved as float32/float64 when `encoding` is set? I guess the `encoding` argument of `to_netcdf` overwrites the variable's initial encoding, because

```python
ds.z.to_netcdf("test_w_offset.nc", encoding={"z": {"add_offset": 66825.5, "scale_factor": -1.7250274674968, "dtype": "int16"}})
```

produces the expected output, which matches the original one. So I imagine that a good way of setting the output encoding is currently something like

```python
ds.to_netcdf("compressed.nc", encoding={v: {**ds[v].encoding, "zlib": True} for v in ds.data_vars})
```

when an encoding similar to the input encoding, with additional parameters (e.g. `zlib`), is requested.
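The merge suggested above can be sketched as a small helper that works on plain dictionaries, so it can be tried without a dataset. The name `merge_encodings`, its `drop` parameter, and the choice to drop `source` and `original_shape` are assumptions of this sketch and are not part of xarray's API.

```python
from __future__ import annotations

from typing import Any, Iterable, Mapping

# Keys that describe the file a variable was read from rather than how to
# write it. Dropping them is an assumption of this sketch.
READ_ONLY_KEYS = ("source", "original_shape")


def merge_encodings(
    existing: Mapping[str, Mapping[str, Any]],
    extra: Mapping[str, Any],
    drop: Iterable[str] = READ_ONLY_KEYS,
) -> dict[str, dict[str, Any]]:
    """Return per-variable encodings: existing keys plus ``extra`` (extra wins)."""
    dropped = set(drop)
    return {
        name: {**{k: v for k, v in enc.items() if k not in dropped}, **extra}
        for name, enc in existing.items()
    }


# Stand-in for {name: ds[name].encoding for name in ds.data_vars},
# using a subset of the values reported above for ``z``.
existing = {
    "z": {
        "source": "input.nc",  # placeholder path, not the real cache path
        "original_shape": (2, 3, 241, 480),
        "dtype": "int16",
        "scale_factor": -1.7250274674967954,
        "add_offset": 66825.5,
    }
}

encoding = merge_encodings(existing, {"zlib": True})
print(encoding)
```

With a real dataset, the result would be passed as `ds.to_netcdf("compressed.nc", encoding=encoding)`. The helper builds new dictionaries, so the encodings stored on the variables are left unchanged.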

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-305.25.1.el8_4.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1

xarray: 2022.6.0  # or 2022.9.0
pandas: 1.5.0
numpy: 1.23.3
scipy: None
netCDF4: 1.6.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.13.2
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.4.1
pip: 22.2.2
conda: None
pytest: None
IPython: 8.3.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7127/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
