html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/2751#issuecomment-558605566,https://api.github.com/repos/pydata/xarray/issues/2751,558605566,MDEyOklzc3VlQ29tbWVudDU1ODYwNTU2Ng==,10050469,2019-11-26T12:23:44Z,2019-11-26T12:23:44Z,MEMBER,"I've noted that the very recent ERA5 data is also cast to float32 at decoding, although the scale and offset params are float64. I think it would be a very valuable addition to xarray to do this properly: @daoudjahdou are you still willing to work on this PR? Otherwise I'll try to pick this up when time permits.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,407746874
https://github.com/pydata/xarray/pull/2751#issuecomment-495106994,https://api.github.com/repos/pydata/xarray/issues/2751,495106994,MDEyOklzc3VlQ29tbWVudDQ5NTEwNjk5NA==,971382,2019-05-23T07:48:39Z,2019-05-23T07:48:39Z,NONE,"@shoyer I've tested the solution provided; it works like a charm with my tests. However, many tests in test_backends.py are broken in cases where we lose precision. I'll give you more detail.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,407746874
https://github.com/pydata/xarray/pull/2751#issuecomment-480687178,https://api.github.com/repos/pydata/xarray/issues/2751,480687178,MDEyOklzc3VlQ29tbWVudDQ4MDY4NzE3OA==,1217238,2019-04-08T05:19:47Z,2019-04-08T06:00:03Z,MEMBER,"> Correct me if I'm wrong, but this is a bit hard to do without knowing max/min values to avoid overflow. Evaluating those could break performance if only used for encoding and decoding.

Right, we definitely don't want to inspect the data to verify if it fits.
~~I think we will need explicit checks for `isinstance(x, (int, float))` (for `scale_factor` and `add_offset`) to identify these cases in the `encode()` method. Let me try to come up with a concrete suggestion...~~ EDIT: See my suggestion below. I don't think we want these explicit checks, after all.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,407746874
https://github.com/pydata/xarray/pull/2751#issuecomment-479831812,https://api.github.com/repos/pydata/xarray/issues/2751,479831812,MDEyOklzc3VlQ29tbWVudDQ3OTgzMTgxMg==,971382,2019-04-04T09:55:14Z,2019-04-04T09:55:14Z,NONE,"@shoyer sorry for the delayed response. `dtypes.result_type(1, np.float32(1))` returns `dtype('float64')`; that's what causes this behaviour for Python's `int` and `float`. Keeping the consistency would then require testing whether `scale_factor*var_dtype + add_offset` fits in `var_dtype` in the case of Python's `int` and `float`. Correct me if I'm wrong, but this is a bit hard to do without knowing max/min values to avoid overflow. Evaluating those could break performance if only used for encoding and decoding. Do you have any idea how this could be achieved? Or is it simpler to keep it as it is?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,407746874
https://github.com/pydata/xarray/pull/2751#issuecomment-473496064,https://api.github.com/repos/pydata/xarray/issues/2751,473496064,MDEyOklzc3VlQ29tbWVudDQ3MzQ5NjA2NA==,1217238,2019-03-16T03:50:00Z,2019-03-16T03:50:00Z,MEMBER,"> @shoyer do you mean that by default, when we deal with Python's `int` and `float`, we cast them to `np.int64` and `np.float64`?

Currently we cast to `int32`/`float64` for netCDF3 files (which don't support `int64`) and `int64`/`float64` for netCDF4 files.
That default behavior doesn't really make sense if the user provided builtin Python numbers for these attributes. For example, if I write something like the following, the data for `x` should probably decode to float32:

```python
ds = xarray.Dataset({'x': np.float32(0)})
encoding = {'x': {'scale_factor': 1, 'add_offset': 1, 'dtype': 'int8'}}
ds.to_netcdf('test.nc', encoding=encoding)
```

With the current implementation in this PR, I think the data will come back as float64. To get something that will decode as float64 without the original data being float64, the user would need to intentionally choose dtypes for `scale_factor` and `add_offset`, e.g.,

```python
ds = xarray.Dataset({'x': np.float32(0)})
encoding = {'x': {'scale_factor': np.float64(1), 'add_offset': np.float64(1), 'dtype': 'int8'}}
ds.to_netcdf('test.nc', encoding=encoding)
```

Does this make sense to you? From my perspective, two goals for this change should be:

1. Adhere to CF conventions for deciding how to decode data from disk.
2. Provide a consistent interface for xarray users, so that by default data is restored in the same form that it was written.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,407746874 https://github.com/pydata/xarray/pull/2751#issuecomment-473296942,https://api.github.com/repos/pydata/xarray/issues/2751,473296942,MDEyOklzc3VlQ29tbWVudDQ3MzI5Njk0Mg==,971382,2019-03-15T14:00:34Z,2019-03-15T14:00:34Z,NONE,@shoyer do you mean that we consider that by default when we deal with Python's `int` and `float` we cast them to `np.int64` and `np.float64` ?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,407746874 https://github.com/pydata/xarray/pull/2751#issuecomment-473165754,https://api.github.com/repos/pydata/xarray/issues/2751,473165754,MDEyOklzc3VlQ29tbWVudDQ3MzE2NTc1NA==,1217238,2019-03-15T05:42:18Z,2019-03-15T05:42:18Z,MEMBER,"@daoudjahdou thanks for sticking with this. I'm concerned about how lose the ""roundtrip"" invariant in some edge cases, where the user did not specifically indicate the desired dtype but rather used a Python type. Right now xarray seems to use inconsistent rules for casting attribute types, based on the netCDF backend: ``` In [8]: for engine in ['scipy', 'netcdf4', 'h5netcdf']: ...: ds = xarray.Dataset({'x': np.float32(0)}) ...: encoding = {'x': {'scale_factor': 1, 'add_offset': 1}} ...: ds.to_netcdf(f'test-{engine}.nc', engine=engine, encoding=encoding) ...: In [9]: ! ncdump -h test-scipy.nc netcdf test-scipy { variables: float x ; x:add_offset = 1 ; x:scale_factor = 1 ; x:_FillValue = NaNf ; } In [10]: ! ncdump -h test-netcdf4.nc netcdf test-netcdf4 { variables: float x ; x:_FillValue = NaNf ; x:add_offset = 1LL ; x:scale_factor = 1LL ; } In [11]: ! 
ncdump -h test-h5netcdf.nc
netcdf test-h5netcdf {
variables:
        float x ;
                x:_FillValue = NaNf ;
                x:add_offset = 1LL ;
                x:scale_factor = 1LL ;
}
```

At least for `add_offset` and `scale_factor`, I think we should be more careful when determining the attribute dtype. Python's `int` and `float` don't have any intrinsic dtype. If the attributes are specified as native Python types (not NumPy types) and can be safely cast into the dtype of the data, we should probably use that instead.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,407746874
https://github.com/pydata/xarray/pull/2751#issuecomment-470881715,https://api.github.com/repos/pydata/xarray/issues/2751,470881715,MDEyOklzc3VlQ29tbWVudDQ3MDg4MTcxNQ==,971382,2019-03-08T10:29:32Z,2019-03-08T10:29:32Z,NONE,"@shoyer tests are failing, but it doesn't seem to be coming from this PR; I saw the same error on other PRs as well. My tests were working fine until I made a git pull.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,407746874
https://github.com/pydata/xarray/pull/2751#issuecomment-468177645,https://api.github.com/repos/pydata/xarray/issues/2751,468177645,MDEyOklzc3VlQ29tbWVudDQ2ODE3NzY0NQ==,971382,2019-02-28T08:09:58Z,2019-02-28T08:09:58Z,NONE,@shoyer did you have a look at this?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,407746874
https://github.com/pydata/xarray/pull/2751#issuecomment-466290710,https://api.github.com/repos/pydata/xarray/issues/2751,466290710,MDEyOklzc3VlQ29tbWVudDQ2NjI5MDcxMA==,971382,2019-02-22T06:38:32Z,2019-02-22T06:38:59Z,NONE,"@shoyer I changed the implementation and took your comments into consideration. Now returning the largest type takes place only when decoding.
I added a test with all types of scale_factor, add_offset and variable.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,407746874
https://github.com/pydata/xarray/pull/2751#issuecomment-464749511,https://api.github.com/repos/pydata/xarray/issues/2751,464749511,MDEyOklzc3VlQ29tbWVudDQ2NDc0OTUxMQ==,971382,2019-02-18T14:25:00Z,2019-02-18T14:25:25Z,NONE,"@shoyer now scale_factor and add_offset are taken into account when encoding and decoding data. If none of them is present, or if they're not a subtype of np.generic, the old behaviour takes place.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,407746874
https://github.com/pydata/xarray/pull/2751#issuecomment-463097712,https://api.github.com/repos/pydata/xarray/issues/2751,463097712,MDEyOklzc3VlQ29tbWVudDQ2MzA5NzcxMg==,971382,2019-02-13T08:01:39Z,2019-02-13T08:01:39Z,NONE,"@shoyer yes, sure, I'll update the pull request with the mentioned modifications.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,407746874
https://github.com/pydata/xarray/pull/2751#issuecomment-462876031,https://api.github.com/repos/pydata/xarray/issues/2751,462876031,MDEyOklzc3VlQ29tbWVudDQ2Mjg3NjAzMQ==,1217238,2019-02-12T18:24:51Z,2019-02-12T18:24:51Z,MEMBER,@daoudjahdou what do you think of @magau's proposed solution over in https://github.com/pydata/xarray/issues/2304#issuecomment-462592638?
Checking the dtypes of `add_offset` and `scale_factor` might be a better alternative than requiring setting a custom option.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,407746874
https://github.com/pydata/xarray/pull/2751#issuecomment-462681656,https://api.github.com/repos/pydata/xarray/issues/2751,462681656,MDEyOklzc3VlQ29tbWVudDQ2MjY4MTY1Ng==,971382,2019-02-12T09:21:59Z,2019-02-12T09:22:12Z,NONE,"@shoyer the logic is now propagated down to _choose_float_dtype inside CFScaleOffsetCoder; please let me know what you think.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,407746874
https://github.com/pydata/xarray/pull/2751#issuecomment-461856173,https://api.github.com/repos/pydata/xarray/issues/2751,461856173,MDEyOklzc3VlQ29tbWVudDQ2MTg1NjE3Mw==,1217238,2019-02-08T16:15:11Z,2019-02-08T16:15:11Z,MEMBER,@daoudjahdou sorry I missed your comment over in #2304. Threading the new parameter down through other functions is definitely the preferred approach here -- mutating global variables makes it much harder to follow control flow. Though I would probably stop short of modifying `maybe_promote` and would put this logic inside `CFMaskCoder` and/or `CFScaleOffsetCoder` in xarray.coding.variables.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,407746874
https://github.com/pydata/xarray/pull/2751#issuecomment-461761945,https://api.github.com/repos/pydata/xarray/issues/2751,461761945,MDEyOklzc3VlQ29tbWVudDQ2MTc2MTk0NQ==,971382,2019-02-08T10:41:34Z,2019-02-08T10:41:34Z,NONE,@shoyer did you have a look at this?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,407746874