issue_comments


5 rows where issue = 343659822 and user = 145117 sorted by updated_at descending

mankoff (145117) · created 2022-08-01T16:56:01Z · CONTRIBUTOR · https://github.com/pydata/xarray/issues/2304#issuecomment-1201464999

Packing Qs

  • If "the variable containing the packed data must be of type byte, short or int", how do we choose what size int?
  • What to do if scale_factor and add_offset are not float or double? What if they are different types?
    • I assume issue a warning and continue?
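
For reference, a minimal packing sketch under the usual assumptions (linear packing into an n-bit signed integer, reserving one value for a fill; the helper name pack_params is mine, and this recipe is a common convention, not mandated by CF):

```python
import numpy as np

def pack_params(data, n=16):
    """Scale/offset for packing floats into an n-bit signed integer.

    Spread the data range over 2**n - 2 steps, reserving one value
    for _FillValue (a common recipe, not required by the CF spec).
    """
    dmin, dmax = float(np.min(data)), float(np.max(data))
    scale_factor = (dmax - dmin) / (2**n - 2)
    add_offset = (dmax + dmin) / 2
    return scale_factor, add_offset

data = np.linspace(-1.0, 1.0, 101)
scale, offset = pack_params(data)
packed = np.round((data - offset) / scale).astype(np.int16)
unpacked = packed * scale + offset
print(abs(unpacked - data).max())  # quantization error, at worst ~scale/2
```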

Unpacking Qs

  • Should the unpacked data just be np.find_common_type([data, add_offset, scale_factor], []), or should we then bump the type up by one level (float16->32, 32->64, 64->128, etc.) to cover possible overflow? (See the sketch below.)
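
As a concrete illustration (assuming int16 packed data with float32 attributes; the bump table is hypothetical, and np.find_common_type is deprecated in favor of np.result_type in recent NumPy):

```python
import numpy as np

packed = np.dtype(np.int16)          # the on-disk variable
scale_factor = np.dtype(np.float32)
add_offset = np.dtype(np.float32)

common = np.find_common_type([packed, scale_factor, add_offset], [])
print(common)  # float32

# Hypothetical "bump one level" step to leave headroom for overflow:
bump = {np.dtype(np.float16): np.dtype(np.float32),
        np.dtype(np.float32): np.dtype(np.float64),
        np.dtype(np.float64): np.dtype(np.longdouble)}
print(bump[common])  # float64
```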
mankoff (145117) · created 2022-08-01T16:52:47Z · CONTRIBUTOR · https://github.com/pydata/xarray/issues/2304#issuecomment-1201461626
  • From: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/build/ch08.html

This standard is more restrictive than the NUG with respect to the use of the scale_factor and add_offset attributes; ambiguities and precision problems related to data type conversions are resolved by these restrictions.

If the scale_factor and add_offset attributes are of the same data type as the associated variable, the unpacked data is assumed to be of the same data type as the packed data.

  • What if the result of the operation leads to overflow?

However, if the scale_factor and add_offset attributes are of a different data type from the variable (containing the packed data) then the unpacked data should match the type of these attributes, which must both be of type float or both be of type double.

  • What if they are not of the same type?

    • Presumably, use the largest of the three types.
  • Again, this may lead to loss of precision. What if the packed data is type int64 and scale_factor is type float16? It seems like the result should be float64, not float16.

An additional restriction in this case is that the variable containing the packed data must be of type byte, short or int.

  • What to do if packed data is type float or double?

It is not advised to unpack an int into a float as there is a potential precision loss.

I think this means double is advised? If so, this should be stated explicitly. The text should be rephrased to advise what to do (if there is one or only a few choices) rather than what not to do, or at least include that advice alongside the current wording. A quick demonstration of the precision loss in question follows.
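
For what it's worth, the precision loss being warned about is easy to demonstrate (values chosen for illustration):

```python
import numpy as np

# float32 has a 24-bit significand, so 2**24 + 1 is the smallest positive
# integer it cannot represent exactly.
i = np.int32(2**24 + 1)    # 16777217
print(np.float32(i))       # 16777216.0 -- precision lost
print(np.float64(i))       # 16777217.0 -- exact
```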

mankoff (145117) · created 2022-08-01T02:49:28Z, updated 2022-08-01T05:55:15Z · CONTRIBUTOR · https://github.com/pydata/xarray/issues/2304#issuecomment-1200627783

Current algorithm

```python
def _choose_float_dtype(dtype, has_offset):
    """Return a float dtype that can losslessly represent `dtype` values."""
    # Keep float32 as-is. Upcast half-precision to single-precision,
    # because float16 is "intended for storage but not computation"
    if dtype.itemsize <= 4 and np.issubdtype(dtype, np.floating):
        return np.float32
    # float32 can exactly represent all integers up to 24 bits
    if dtype.itemsize <= 2 and np.issubdtype(dtype, np.integer):
        # A scale factor is entirely safe (vanishing into the mantissa),
        # but a large integer offset could lead to loss of precision.
        # Sensitivity analysis can be tricky, so we just use a float64
        # if there's any offset at all - better unoptimised than wrong!
        if not has_offset:
            return np.float32
    # For all other types and circumstances, we just use float64.
    # (safe because eg. complex numbers are not supported in NetCDF)
    return np.float64
```
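
A quick check of the claim in those comments that an offset, unlike a scale factor, can cost precision in float32 (values are hypothetical):

```python
import numpy as np

packed = np.int16(3)
add_offset = 20_000_000.0   # beyond float32's exact-integer range (2**24)

print(np.float32(packed) + np.float32(add_offset))  # 20000004.0 -- off by one
print(np.float64(packed) + np.float64(add_offset))  # 20000003.0 -- exact
```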

Due to a calling bug, has_offset is always None, so this can be simplified to:

```python
def _choose_float_dtype(dtype):
    if dtype.itemsize <= 4 and np.issubdtype(dtype, np.floating):
        return np.float32
    if dtype.itemsize <= 2 and np.issubdtype(dtype, np.integer):
        return np.float32
    return np.float64
```

Here I call the function twice, once with has_offset False, then True.

```python
import numpy as np

def _choose_float_dtype(dtype, has_offset):
    if dtype.itemsize <= 4 and np.issubdtype(dtype, np.floating):
        return np.float32
    if dtype.itemsize <= 2 and np.issubdtype(dtype, np.integer):
        if not has_offset:
            return np.float32
    return np.float64

# generic types
for dtype in [np.byte, np.ubyte, np.short, np.ushort, np.intc, np.uintc,
              np.int_, np.uint, np.longlong, np.ulonglong,
              np.half, np.float16, np.single, np.double, np.longdouble,
              np.csingle, np.cdouble, np.clongdouble,
              np.int8, np.int16, np.int32, np.int64,
              np.uint8, np.uint16, np.uint32, np.uint64,
              np.float16, np.float32, np.float64]:
    print("|", dtype, "|",
          _choose_float_dtype(np.dtype(dtype), False), "|",
          _choose_float_dtype(np.dtype(dtype), True), "|")
```

| Input | Output as called | Output as written |
|-------|------------------|-------------------|
| <class 'numpy.int8'> | <class 'numpy.float32'> | <class 'numpy.float64'> |
| <class 'numpy.uint8'> | <class 'numpy.float32'> | <class 'numpy.float64'> |
| <class 'numpy.int16'> | <class 'numpy.float32'> | <class 'numpy.float64'> |
| <class 'numpy.uint16'> | <class 'numpy.float32'> | <class 'numpy.float64'> |
| <class 'numpy.int32'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.uint32'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.int64'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.uint64'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.longlong'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.ulonglong'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.float16'> | <class 'numpy.float32'> | <class 'numpy.float32'> |
| <class 'numpy.float16'> | <class 'numpy.float32'> | <class 'numpy.float32'> |
| <class 'numpy.float32'> | <class 'numpy.float32'> | <class 'numpy.float32'> |
| <class 'numpy.float64'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.float128'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.complex64'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.complex128'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.complex256'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.int8'> | <class 'numpy.float32'> | <class 'numpy.float64'> |
| <class 'numpy.int16'> | <class 'numpy.float32'> | <class 'numpy.float64'> |
| <class 'numpy.int32'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.int64'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.uint8'> | <class 'numpy.float32'> | <class 'numpy.float64'> |
| <class 'numpy.uint16'> | <class 'numpy.float32'> | <class 'numpy.float64'> |
| <class 'numpy.uint32'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.uint64'> | <class 'numpy.float64'> | <class 'numpy.float64'> |
| <class 'numpy.float16'> | <class 'numpy.float32'> | <class 'numpy.float32'> |
| <class 'numpy.float32'> | <class 'numpy.float32'> | <class 'numpy.float32'> |
| <class 'numpy.float64'> | <class 'numpy.float64'> | <class 'numpy.float64'> |

mankoff (145117) · created 2022-07-30T17:58:51Z · CONTRIBUTOR · https://github.com/pydata/xarray/issues/2304#issuecomment-1200266255

This issue, based on its title and initial post, is fixed by PR #6851. The code to select dtype was already correct, but the outer function that called it had a bug in the call.

Per the CF spec,

the unpacked data should match the type of these attributes, which must both be of type float or both be of type double. An additional restriction in this case is that the variable containing the packed data must be of type byte, short or int. It is not advised to unpack an int into a float as there is a potential precision loss.

I find this ambiguous. Is float above referring to float16 or float32? Is double referring to float64? If so, then they do recommend float64, as requested by the OP, because the test data is short and the scale_factor is float64 (a.k.a. double?).

The broader discussion here is about CF compliance. I find the spec ambiguous and xarray non-compliant. So many tests rely on the existing behavior that I am unsure how best to proceed to improve compliance. I worry it may require a major refactor and possibly break things that rely on the existing behavior. I'd like to discuss architecture. Should that be a new issue, if this one closes with PR #6851? Should there be a new keyword such as cf_strict?

mankoff (145117) · created 2022-07-19T02:35:30Z, updated 2022-07-19T03:20:51Z · CONTRIBUTOR · https://github.com/pydata/xarray/issues/2304#issuecomment-1188529343

I've run into this issue too, and the xarray decision to use float32 is causing problems. I recognize this is a generic floating-point representation issue, but it could be avoided with float64.

The data value is 1395. The scale is 0.0001.

```python
import numpy as np

val = int(1395)
scale = 0.0001
print(val * scale)                               # 0.1395
print(val * np.array(scale).astype(float))       # 0.1395
print(val * np.array(scale).astype(np.float16))  # 0.1395213...
print(val * np.array(scale).astype(np.float32))  # 0.13949999...
print(val * np.array(scale).astype(np.float64))  # 0.1395
```

Because we then apply *1E3 and round(), the difference between 0.1395 and 0.13949999 (or 139.5 and 139.49, which round to 140 and 139 respectively) ends up being quite large in the downstream product.
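
Concretely, a minimal sketch of that downstream step (the *1E3 and round() are from our pipeline, not from xarray):

```python
import numpy as np

val, scale = 1395, 0.0001
f32 = float(val * np.float32(scale)) * 1e3   # 139.49999...
f64 = float(val * np.float64(scale)) * 1e3   # 139.50000...
print(round(f32), round(f64))                # 139 140
```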

