html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/2751#issuecomment-480687178,https://api.github.com/repos/pydata/xarray/issues/2751,480687178,MDEyOklzc3VlQ29tbWVudDQ4MDY4NzE3OA==,1217238,2019-04-08T05:19:47Z,2019-04-08T06:00:03Z,MEMBER,"> Correct me if i'm wrong but this is a bit hard to do without knowing max,min values to avoid overlapping
> Evaluating those could break performance if only used for encoding and decoding.

Right, we definitely don't want to inspect the data to verify if it fits.

~~I think we will need explicit checks for `isinstance(x, (int, float))` (for `scale_factor` and `add_offset`) to identify these cases in the `encode()` method. Let me try to come up with a concrete suggestion...~~

EDIT: See my suggestion below. I don't think we want these explicit checks, after all.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,407746874
https://github.com/pydata/xarray/pull/2751#issuecomment-473496064,https://api.github.com/repos/pydata/xarray/issues/2751,473496064,MDEyOklzc3VlQ29tbWVudDQ3MzQ5NjA2NA==,1217238,2019-03-16T03:50:00Z,2019-03-16T03:50:00Z,MEMBER,"> @shoyer do you mean that we consider that by default when we deal with Python's `int` and `float` we cast them to `np.int64` and `np.float64`?

Currently we cast to `int32`/`float64` for netCDF3 files (which don't support `int64`) and `int64`/`float64` for netCDF4 files. That default behavior doesn't really make sense if the user provided builtin Python numbers for these attributes.
For example, if I write something like the following, the data for `x` should probably decode to float32:

```python
ds = xarray.Dataset({'x': np.float32(0)})
encoding = {'x': {'scale_factor': 1, 'add_offset': 1, 'dtype': 'int8'}}
ds.to_netcdf('test.nc', encoding=encoding)
```

With the current implementation in this PR, I think the data will come back as float64. To get something that will decode as float64 without the original data being float64, the user would need to intentionally choose dtypes for `scale_factor` and `add_offset`, e.g.,

```python
ds = xarray.Dataset({'x': np.float32(0)})
encoding = {'x': {'scale_factor': np.float64(1), 'add_offset': np.float64(1), 'dtype': 'int8'}}
ds.to_netcdf('test.nc', encoding=encoding)
```

Does this make sense to you? From my perspective, two goals for this change should be:

1. Adhere to CF conventions for deciding how to decode data from disk.
2. Provide a consistent interface for xarray users, so that by default data is restored in the same form that it was written.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,407746874
https://github.com/pydata/xarray/pull/2751#issuecomment-473165754,https://api.github.com/repos/pydata/xarray/issues/2751,473165754,MDEyOklzc3VlQ29tbWVudDQ3MzE2NTc1NA==,1217238,2019-03-15T05:42:18Z,2019-03-15T05:42:18Z,MEMBER,"@daoudjahdou thanks for sticking with this. I'm concerned about how we lose the ""roundtrip"" invariant in some edge cases, where the user did not specifically indicate the desired dtype but rather used a Python type.

Right now xarray seems to use inconsistent rules for casting attribute types, based on the netCDF backend:

```
In [8]: for engine in ['scipy', 'netcdf4', 'h5netcdf']:
   ...:     ds = xarray.Dataset({'x': np.float32(0)})
   ...:     encoding = {'x': {'scale_factor': 1, 'add_offset': 1}}
   ...:     ds.to_netcdf(f'test-{engine}.nc', engine=engine, encoding=encoding)
   ...:

In [9]: ! ncdump -h test-scipy.nc
netcdf test-scipy {
variables:
	float x ;
		x:add_offset = 1 ;
		x:scale_factor = 1 ;
		x:_FillValue = NaNf ;
}

In [10]: ! ncdump -h test-netcdf4.nc
netcdf test-netcdf4 {
variables:
	float x ;
		x:_FillValue = NaNf ;
		x:add_offset = 1LL ;
		x:scale_factor = 1LL ;
}

In [11]: ! ncdump -h test-h5netcdf.nc
netcdf test-h5netcdf {
variables:
	float x ;
		x:_FillValue = NaNf ;
		x:add_offset = 1LL ;
		x:scale_factor = 1LL ;
}
```

At least for `add_offset` and `scale_factor`, I think we should be more careful when determining the attribute dtype. Python's `int` and `float` don't have any intrinsic dtype. If the attributes are specified as native Python types (not NumPy types) and can be safely cast into the dtype of the data, we should probably use that instead.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,407746874
https://github.com/pydata/xarray/pull/2751#issuecomment-462876031,https://api.github.com/repos/pydata/xarray/issues/2751,462876031,MDEyOklzc3VlQ29tbWVudDQ2Mjg3NjAzMQ==,1217238,2019-02-12T18:24:51Z,2019-02-12T18:24:51Z,MEMBER,@daoudjahdou what do you think of @magau's proposed solution over in https://github.com/pydata/xarray/issues/2304#issuecomment-462592638? Checking the dtypes of `add_offset` and `scale_factor` might be a better alternative than requiring setting a custom option.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,407746874
https://github.com/pydata/xarray/pull/2751#issuecomment-461856173,https://api.github.com/repos/pydata/xarray/issues/2751,461856173,MDEyOklzc3VlQ29tbWVudDQ2MTg1NjE3Mw==,1217238,2019-02-08T16:15:11Z,2019-02-08T16:15:11Z,MEMBER,@daoudjahdou sorry I missed your comment over in #2304.
Threading the new parameter down through other functions is definitely the preferred approach here -- mutating global variables makes it much harder to follow control flow. Though I would probably stop short of modifying `maybe_promote` and would put this logic inside `CFMaskCoder` and/or `CFScaleOffsetCoder` in xarray.coding.variables.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,407746874