issue_comments
12 rows where author_association = "CONTRIBUTOR" and issue = 343659822 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1201464999 | https://github.com/pydata/xarray/issues/2304#issuecomment-1201464999 | https://api.github.com/repos/pydata/xarray/issues/2304 | IC_kwDOAMm_X85HnOan | mankoff 145117 | 2022-08-01T16:56:01Z | 2022-08-01T16:56:01Z | CONTRIBUTOR | Packing Qs
Unpacking Qs
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822 | |
1201461626 | https://github.com/pydata/xarray/issues/2304#issuecomment-1201461626 | https://api.github.com/repos/pydata/xarray/issues/2304 | IC_kwDOAMm_X85HnNl6 | mankoff 145117 | 2022-08-01T16:52:47Z | 2022-08-01T16:52:47Z | CONTRIBUTOR |
I think this means double is advised? If so, this should be stated. The wording should advise what to do (if there is one choice or only a few) rather than what not to do. If the current wording is not replaced, it should at least include that advice. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822 | |
1200627783 | https://github.com/pydata/xarray/issues/2304#issuecomment-1200627783 | https://api.github.com/repos/pydata/xarray/issues/2304 | IC_kwDOAMm_X85HkCBH | mankoff 145117 | 2022-08-01T02:49:28Z | 2022-08-01T05:55:15Z | CONTRIBUTOR | Current algorithm
Due to a calling bug,
Here I call the function twice, once with `has_offset=False` and once with `has_offset=True`:
```python
import numpy as np


def _choose_float_dtype(dtype, has_offset):
    # Floats of 4 bytes or fewer decode to float32
    if dtype.itemsize <= 4 and np.issubdtype(dtype, np.floating):
        return np.float32
    # Integers of 2 bytes or fewer decode to float32 only when there is no offset
    if dtype.itemsize <= 2 and np.issubdtype(dtype, np.integer):
        if not has_offset:
            return np.float32
    # Everything else decodes to float64
    return np.float64


# generic types
for dtype in [np.byte, np.ubyte, np.short, np.ushort, np.intc, np.uintc,
              np.int_, np.uint, np.longlong, np.ulonglong,
              np.half, np.float16, np.single, np.double, np.longdouble,
              np.csingle, np.cdouble, np.clongdouble,
              np.int8, np.int16, np.int32, np.int64,
              np.uint8, np.uint16, np.uint32, np.uint64,
              np.float16, np.float32, np.float64]:
    print("|", dtype, "|",
          _choose_float_dtype(np.dtype(dtype), False), "|",
          _choose_float_dtype(np.dtype(dtype), True), "|")
```

| Input | Output as called | Output as written |
|---|---|---|
| <class 'numpy.int8'> | <class 'numpy.float32'> | <class 'numpy.float64'> |
| <class 'numpy.uint8'> | <class 'numpy.float32'> | <class 'numpy.float64'> |
| <class 'numpy.int16'> | <class 'numpy.float32'> | <class 'numpy.float64'> |
| <class 'numpy.uint16'> | <class 'numpy.float32'> | <class 'numpy.float64'> |
| <class 'numpy.int32'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.uint32'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.int64'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.uint64'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.longlong'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.ulonglong'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.float16'> | <class 'numpy.float32'> | <class 'numpy.float32'> |
| <class 'numpy.float16'> | <class 'numpy.float32'> | <class 'numpy.float32'> |
| <class 'numpy.float32'> | <class 'numpy.float32'> | <class 'numpy.float32'> |
| <class 'numpy.float64'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.float128'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.complex64'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.complex128'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.complex256'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.int8'> | <class 'numpy.float32'> | <class 'numpy.float64'> |
| <class 'numpy.int16'> | <class 'numpy.float32'> | <class 'numpy.float64'> |
| <class 'numpy.int32'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.int64'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.uint8'> | <class 'numpy.float32'> | <class 'numpy.float64'> |
| <class 'numpy.uint16'> | <class 'numpy.float32'> | <class 'numpy.float64'> |
| <class 'numpy.uint32'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.uint64'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.float16'> | <class 'numpy.float32'> | <class 'numpy.float32'> |
| <class 'numpy.float32'> | <class 'numpy.float32'> | <class 'numpy.float32'> |
| <class 'numpy.float64'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822 | |
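The comment above attributes the difference between the two columns to how the caller invokes `_choose_float_dtype`, not to the function itself. The sketch below shows one way such a calling bug can arise. It assumes, hypothetically, that the caller moves `scale_factor` and `add_offset` out of the attribute dict before it tests for `add_offset`. `decode_dtype_buggy` and `decode_dtype_fixed` are illustrative names, not xarray functions.

```python
import numpy as np


def _choose_float_dtype(dtype, has_offset):
    # Same selection logic as quoted in the comment above.
    if dtype.itemsize <= 4 and np.issubdtype(dtype, np.floating):
        return np.float32
    if dtype.itemsize <= 2 and np.issubdtype(dtype, np.integer):
        if not has_offset:
            return np.float32
    return np.float64


def _move_packing_attrs(attrs):
    # Hypothetical helper: pop packing attributes into a separate encoding dict.
    attrs = dict(attrs)
    encoding = {}
    for key in ("scale_factor", "add_offset"):
        if key in attrs:
            encoding[key] = attrs.pop(key)
    return attrs, encoding


def decode_dtype_buggy(raw_dtype, attrs):
    attrs, encoding = _move_packing_attrs(attrs)
    # "add_offset" was already popped, so this test is always False.
    return _choose_float_dtype(np.dtype(raw_dtype), "add_offset" in attrs)


def decode_dtype_fixed(raw_dtype, attrs):
    attrs, encoding = _move_packing_attrs(attrs)
    # Test for the offset where it now lives.
    return _choose_float_dtype(np.dtype(raw_dtype), "add_offset" in encoding)


attrs = {"scale_factor": 0.01, "add_offset": 10.0}
print(decode_dtype_buggy(np.int16, attrs))  # float32: the offset is silently ignored
print(decode_dtype_fixed(np.int16, attrs))  # float64
```

With this bug, the "Output as called" column is what every input gets, whatever its attributes. The "Output as written" column only appears for variables that really have an offset.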
1200266255 | https://github.com/pydata/xarray/issues/2304#issuecomment-1200266255 | https://api.github.com/repos/pydata/xarray/issues/2304 | IC_kwDOAMm_X85HipwP | mankoff 145117 | 2022-07-30T17:58:51Z | 2022-07-30T17:58:51Z | CONTRIBUTOR | This issue, based on its title and initial post, is fixed by PR #6851. The code to select the dtype was already correct, but the outer function that called it had a bug in the call. Per the CF spec,
I find this ambiguous. The broader discussion here is about CF compliance: I find the spec ambiguous and xarray non-compliant. So many tests rely on the existing behavior that I am unsure how best to proceed to improve compliance. I worry it may require a major refactor and could break things that rely on the existing behavior. I'd like to discuss architecture. Should this be in a new issue, if this closes with PR #6851? Should there be a new keyword for |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822 | |
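For reference, the CF conventions (section 8.1, "Packed Data") define unpacking as `unpacked = packed * scale_factor + add_offset`. They also say that when the attributes' type differs from the packed variable's type, the unpacked data should take the attributes' type (float or double). The sketch below applies that rule in NumPy. It assumes the attributes arrive as NumPy scalars that keep their on-disk dtype. `cf_unpack` is a hypothetical helper, not part of xarray.

```python
import numpy as np


def cf_unpack(packed, scale_factor=None, add_offset=None):
    """Hypothetical helper: unpacked = packed * scale_factor + add_offset.

    Assumption: scale_factor/add_offset are NumPy scalars carrying the dtype
    they had in the file, and the result takes that dtype.
    """
    attrs = [a for a in (scale_factor, add_offset) if a is not None]
    if not attrs:
        return packed  # nothing to unpack
    # The unpacked type follows the attribute type(s), not a size heuristic.
    target = np.result_type(*[np.asarray(a).dtype for a in attrs])
    unpacked = packed.astype(target)
    # Scale first, then add the offset, both in the target precision.
    if scale_factor is not None:
        unpacked = unpacked * target.type(scale_factor)
    if add_offset is not None:
        unpacked = unpacked + target.type(add_offset)
    return unpacked


packed = np.array([2194, 2704], dtype=np.int16)
print(cf_unpack(packed, scale_factor=np.float64(0.01)).dtype)  # float64
print(cf_unpack(packed, scale_factor=np.float32(0.01)).dtype)  # float32
```

In this design, the file itself decides the precision: a producer who writes a double `scale_factor` gets float64 back, and a float `scale_factor` gets float32. xarray's heuristic instead looks at the size of the packed integer type.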
1188529343 | https://github.com/pydata/xarray/issues/2304#issuecomment-1188529343 | https://api.github.com/repos/pydata/xarray/issues/2304 | IC_kwDOAMm_X85G14S_ | mankoff 145117 | 2022-07-19T02:35:30Z | 2022-07-19T03:20:51Z | CONTRIBUTOR | I've run into this issue too, and the xarray decision to use The data value is 1395. The scale is 0.0001.
Because we are using |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822 | |
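Here is a sketch using the numbers from this comment: packed value 1395, scale factor 0.0001. It shows how the decoded value depends on the float width. It assumes decoding is a plain cast followed by a multiply in that width, which simplifies what xarray does internally.

```python
import numpy as np

raw = np.array([1395], dtype=np.int16)
scale_factor = 0.0001

# Decode in single precision: cast to float32 and multiply by a float32 scale.
decoded32 = raw.astype(np.float32) * np.float32(scale_factor)

# Decode in double precision.
decoded64 = raw.astype(np.float64) * scale_factor

print(decoded32.dtype, repr(float(decoded32[0])))
print(decoded64.dtype, repr(float(decoded64[0])))

# Converting the float32 result to a Python float (double) afterwards
# does not recover the digits lost in the float32 multiply.
error = abs(float(decoded32[0]) - float(decoded64[0]))
print(error)
```

The gap is at the level of float32 rounding for a value of this magnitude. It is small but nonzero, which is the kind of discrepancy discussed throughout this thread.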
410792506 | https://github.com/pydata/xarray/issues/2304#issuecomment-410792506 | https://api.github.com/repos/pydata/xarray/issues/2304 | MDEyOklzc3VlQ29tbWVudDQxMDc5MjUwNg== | Thomas-Z 1492047 | 2018-08-06T17:47:23Z | 2019-01-09T15:18:36Z | CONTRIBUTOR | To explain the full context and why this became a problem for us: we're experimenting with the parquet format (via pyarrow). We first did something like: netcdf file -> netcdf4 -> pandas -> pyarrow -> pandas (when read later on). We're now looking at xarray and the great ease of access it offers to netCDF-like data, and we tried something similar: netcdf file -> xarray -> pandas -> pyarrow -> pandas (when read later on). Our problem appears when we read and compare the data stored with these two approaches. The difference between the two was sometimes larger than expected or acceptable (1e-6 for float32, if I'm not mistaken). We don't constrain any types; we let the system and modules decide how to encode each value, and in the end we get significantly different values. There might be something wrong in our process, but it originates here with this float32/float64 choice, so we thought it might be a problem. Thanks for taking the time to look into this. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822 | |
411385081 | https://github.com/pydata/xarray/issues/2304#issuecomment-411385081 | https://api.github.com/repos/pydata/xarray/issues/2304 | MDEyOklzc3VlQ29tbWVudDQxMTM4NTA4MQ== | Thomas-Z 1492047 | 2018-08-08T12:18:02Z | 2018-08-22T07:14:58Z | CONTRIBUTOR | So, a more complete example showing this problem.
NetCDF file used in the example: test.nc.zip
````python
from netCDF4 import Dataset
import xarray as xr
import numpy as np
import pandas as pd

d = Dataset("test.nc")
v = d.variables['var']

print(v)
# <class 'netCDF4._netCDF4.Variable'>
# int16 var(idx)
#     _FillValue: 32767
#     scale_factor: 0.01
# unlimited dimensions:
# current shape = (2,)
# filling on

df_nc = pd.DataFrame(data={'var': v[:]})

print(df_nc)
#      var
# 0  21.94
# 1  27.04

ds = xr.open_dataset("test.nc")
df_xr = ds['var'].to_dataframe()

# Comparing both dataframes with float32 precision (1e-6)
mask = np.isclose(df_nc['var'], df_xr['var'], rtol=0, atol=1e-6)

print(mask)
# [False  True]

print(df_xr)
#            var
# idx
# 0    21.939999
# 1    27.039999

# Changing the type and rounding the xarray dataframe
df_xr2 = df_xr.astype(np.float64).round(int(np.ceil(-np.log10(ds['var'].encoding['scale_factor']))))

mask = np.isclose(df_nc['var'], df_xr2['var'], rtol=0, atol=1e-6)

print(mask)
# [ True  True]

print(df_xr2)
#        var
# idx
# 0    21.94
# 1    27.04
````
As you can see, the problem appears early in the process (not related to the way data are stored in parquet later on) and yes, rounding values does solve it. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822 | |
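The rounding workaround in the comment above can be tried without `test.nc`. The sketch below assumes packed values of 2194 and 2704 with `scale_factor` 0.01, inferred from the 21.94 and 27.04 printed above. It uses plain float32 decoding as a stand-in for what `xr.open_dataset` produced in that example.

```python
import numpy as np

raw = np.array([2194, 2704], dtype=np.int16)  # assumed packed values
scale_factor = 0.01

# float32 decoding, standing in for the xarray result in the example above
decoded32 = raw.astype(np.float32) * np.float32(scale_factor)

# Number of decimals implied by a power-of-ten scale factor: 0.01 -> 2
decimals = int(np.ceil(-np.log10(scale_factor)))

# Widen to float64, then round to the precision the packing can express
restored = np.round(decoded32.astype(np.float64), decimals)

reference = raw.astype(np.float64) * scale_factor
print(np.isclose(reference, decoded32, rtol=0, atol=1e-6))
print(np.isclose(reference, restored, rtol=0, atol=1e-6))
```

Rounding works here because packed data can only represent multiples of the scale factor, so no real information lives beyond that decimal place. The `decimals` calculation assumes the scale factor is a power of ten.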
410782982 | https://github.com/pydata/xarray/issues/2304#issuecomment-410782982 | https://api.github.com/repos/pydata/xarray/issues/2304 | MDEyOklzc3VlQ29tbWVudDQxMDc4Mjk4Mg== | dopplershift 221526 | 2018-08-06T17:17:38Z | 2018-08-06T17:17:38Z | CONTRIBUTOR | Ah, ok, not scaling per-se (i.e. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822 | |
410779271 | https://github.com/pydata/xarray/issues/2304#issuecomment-410779271 | https://api.github.com/repos/pydata/xarray/issues/2304 | MDEyOklzc3VlQ29tbWVudDQxMDc3OTI3MQ== | dopplershift 221526 | 2018-08-06T17:06:22Z | 2018-08-06T17:06:22Z | CONTRIBUTOR | I'm not following why the data are scaled twice. Your point about the rounding being different is well-taken, though. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822 | |
410774955 | https://github.com/pydata/xarray/issues/2304#issuecomment-410774955 | https://api.github.com/repos/pydata/xarray/issues/2304 | MDEyOklzc3VlQ29tbWVudDQxMDc3NDk1NQ== | dopplershift 221526 | 2018-08-06T16:52:42Z | 2018-08-06T16:52:53Z | CONTRIBUTOR | @shoyer But since it's a downstream calculation issue, and does not impact the actual precision of what's being read from the file, what's wrong with saying "Use |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822 | |
410769706 | https://github.com/pydata/xarray/issues/2304#issuecomment-410769706 | https://api.github.com/repos/pydata/xarray/issues/2304 | MDEyOklzc3VlQ29tbWVudDQxMDc2OTcwNg== | dopplershift 221526 | 2018-08-06T16:34:44Z | 2018-08-06T16:36:16Z | CONTRIBUTOR | A float32 value has 24 bits of precision in the significand, which is more than enough to store the 16 bits in the original data; the exponent (8 bits) will more or less take care of the
What you're seeing is an artifact of printing out the values. I have no idea why something is printing a float32 (only about 7 decimal digits) out to 17 digits. Even float64 only has about 16 digits (which is overkill for this application). The differences from subtracting the 32- and 64-bit values above are in the 8th decimal place, which is beyond the actual precision of the data. What you've just demonstrated is the difference in precision between 32-bit and 64-bit values, but it has nothing whatsoever to do with the data. If you're really worried about round-off for things like the standard deviation, you should probably calculate it using the raw integer values and scale afterwards. (I don't actually think this is necessary, though.) |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822 | |
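Two claims in this comment can be checked directly:

- float32's 24-bit significand holds every 16-bit integer exactly.
- A statistic like the standard deviation can be computed from the raw integers and scaled afterwards. For a pure scale factor, std(a·x) = |a|·std(x).

The sketch below checks both. The packed values and scale factor are made up for illustration.

```python
import numpy as np

# Every int16 value survives a round trip through float32 unchanged.
all_int16 = np.arange(-32768, 32768, dtype=np.int64)
roundtrip = all_int16.astype(np.float32).astype(np.int64)
int16_exact_in_float32 = bool(np.array_equal(all_int16, roundtrip))

# Hypothetical packed data and scale factor (not from the thread).
raw = np.array([2194, 2704, 1395, 3000, -150], dtype=np.int16)
scale_factor = 0.01

# Standard deviation from the raw integers (in float64), scaled afterwards.
std_from_raw = raw.astype(np.float64).std() * abs(scale_factor)

# The same statistic from float32-decoded values, for comparison.
decoded32 = raw.astype(np.float32) * np.float32(scale_factor)
std_from_float32 = float(decoded32.std())

print(int16_exact_in_float32, std_from_raw, std_from_float32)
```

An `add_offset` would not change the result either, because a constant shift leaves the standard deviation unchanged.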
410675562 | https://github.com/pydata/xarray/issues/2304#issuecomment-410675562 | https://api.github.com/repos/pydata/xarray/issues/2304 | MDEyOklzc3VlQ29tbWVudDQxMDY3NTU2Mg== | Thomas-Z 1492047 | 2018-08-06T11:19:30Z | 2018-08-06T11:19:30Z | CONTRIBUTOR | You're right when you say
You'll have a float64 in the end, but you won't get your precision back, and that might be a problem in some cases. I understand the memory benefits of using float32, but it is a problem for us every time we have variables that use scale factors. I'm surprised this issue (if it is considered one) does not bother more people. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822 |
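The point that widening to float64 afterwards does not restore precision takes only a few lines to show. The sketch below assumes a packed value of 2194 with `scale_factor` 0.01, matching the 21.94 value from the `test.nc` example in this thread.

```python
import numpy as np

packed = np.int16(2194)  # assumed packed value
scale_factor = 0.01

# Decode in float32, then widen the result to float64 afterwards.
widened = np.float64(np.float32(packed) * np.float32(scale_factor))

# Decode directly in float64.
direct = np.float64(packed) * scale_factor

# The widened value keeps the float32 rounding error; only the dtype changed.
print(widened, direct, widened == direct)
```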
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id]) ); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);