github: issue_comments: 11 rows where author_association = "MEMBER" and issue = 343659822 sorted by updated

11 rows where author_association = "MEMBER" and issue = 343659822 sorted by updated_at descending



id	html_url	issue_url	node_id	user	created_at	updated_at	author_association	body	reactions	issue
1200314984	https://github.com/pydata/xarray/issues/2304#issuecomment-1200314984	https://api.github.com/repos/pydata/xarray/issues/2304	IC_kwDOAMm_X85Hi1po	shoyer 1217238	2022-07-30T23:55:04Z	2022-07-30T23:55:04Z	MEMBER	"the unpacked data should match the type of these attributes, which must both be of type float or both be of type double. An additional restriction in this case is that the variable containing the packed data must be of type byte, short or int. It is not advised to unpack an int into a float as there is a potential precision loss. I find this is ambiguous. Is `float` above referring to `float16` or `float32`? Is `double` referring to `float64`?" Yes, I'm pretty sure "float" means single precision (np.float32), given that "double" certainly means double precision (np.float64). "If so, then they do recommend `float64`, as requested by the OP, because the test data is `short` and the `scale_factor` is `float64` (a.k.a. `double`?)" Yes, I believe so. "The broader discussion here is about CF compliance. I find the spec ambiguous and xarray non-compliant. So many tests rely on the existing behavior that I am unsure how best to proceed to improve compliance. I worry it may be a major refactor, and possibly break things relying on the existing behavior. I'd like to discuss architecture. Should this be in a new issue, if this closes with PR #6851? Should there be a new keyword for `cf_strict` or something?" I think we can treat this as a bug fix and just go forward with it. Yes, some people are going to be surprised, but I don't think it's disruptive enough that we need to go to a major effort to preserve backwards compatibility. It should already be straightforward to work around by setting `decode_cf=False` when opening a file and then explicitly calling `xarray.decode_cf()`.	
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822
1189151229	https://github.com/pydata/xarray/issues/2304#issuecomment-1189151229	https://api.github.com/repos/pydata/xarray/issues/2304	IC_kwDOAMm_X85G4QH9	dcherian 2448579	2022-07-19T14:49:34Z	2022-07-19T14:49:34Z	MEMBER	We'd happily take a PR implementing the suggestion above following the CF conventions. "Looking at the dtype for `add_offset` and `scale_factor` does seem like a much cleaner way to handle this issue. I think we should give that a try!" IIUC, the change should be made here in `_choose_float_dtype`: https://github.com/pydata/xarray/blob/392a61484e80e6ccfd5774b68be51578077d4292/xarray/coding/variables.py#L266-L283	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822
462614107	https://github.com/pydata/xarray/issues/2304#issuecomment-462614107	https://api.github.com/repos/pydata/xarray/issues/2304	MDEyOklzc3VlQ29tbWVudDQ2MjYxNDEwNw==	shoyer 1217238	2019-02-12T04:46:18Z	2019-02-12T04:46:47Z	MEMBER	@magau thanks for pointing this out -- I think we simply missed this part of the CF conventions document! Looking at the dtype for `add_offset` and `scale_factor` does seem like a much cleaner way to handle this issue. I think we should give that a try! We will still need some fall-back choice for `CFMaskCoder` if neither an `add_offset` nor a `scale_factor` attribute is provided (due to xarray's representation of missing values as NaN), but this is a relatively uncommon situation.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822
412495621	https://github.com/pydata/xarray/issues/2304#issuecomment-412495621	https://api.github.com/repos/pydata/xarray/issues/2304	MDEyOklzc3VlQ29tbWVudDQxMjQ5NTYyMQ==	fmaussion 10050469	2018-08-13T12:04:10Z	2018-08-13T12:04:10Z	MEMBER	I think we are still talking about different things. In the example by @Thomas-Z above there is still a problem at the line `mask = np.isclose(df_nc['var'], df_xr['var'], rtol=0, atol=1e-6)` (commented "Comparing both dataframes with float32 precision (1e-6)"). As discussed several times above, this test is misleading: it should assert for `atol=0.01`, which is the real accuracy of the underlying data. For this purpose float32 is more than good enough. @shoyer said: "I would be happy to add options for whether to default to float32 or float64 precision." So we would welcome a PR in this direction! I don't think we need to change the default behavior though, as there is a slight possibility that some people are relying on the data being float32.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822
410807622	https://github.com/pydata/xarray/issues/2304#issuecomment-410807622	https://api.github.com/repos/pydata/xarray/issues/2304	MDEyOklzc3VlQ29tbWVudDQxMDgwNzYyMg==	shoyer 1217238	2018-08-06T18:33:06Z	2018-08-06T18:33:06Z	MEMBER	Please let us know if converting to float64 explicitly and rounding again does not solve this issue for you. On Mon, Aug 6, 2018 at 10:47 AM Thomas Zilio notifications@github.com wrote: "To explain the full context and why it became some kind of a problem to us: we're experimenting with the parquet format (via pyarrow) and we first did something like: netcdf file -> netcdf4 -> pandas -> pyarrow -> pandas (when read later on). We're now looking at xarray and the huge ease of access it offers to netcdf-like data, and we tried something similar: netcdf file -> xarray -> pandas -> pyarrow -> pandas (when read later on). Our problem appears when we're reading and comparing the data stored with these 2 approaches. The difference between the 2 was, sometimes, larger than what is expected/acceptable (10e-6 for float32 if I'm not mistaken). We're not constraining any type and are letting the system and modules decide how to encode what, and in the end we have significantly different values. There might be something wrong in our process, but it originates here with this float32/float64 choice, so we thought it might be a problem. Thanks for taking the time to look into this."	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822
410787443	https://github.com/pydata/xarray/issues/2304#issuecomment-410787443	https://api.github.com/repos/pydata/xarray/issues/2304	MDEyOklzc3VlQ29tbWVudDQxMDc4NzQ0Mw==	shoyer 1217238	2018-08-06T17:31:22Z	2018-08-06T17:31:22Z	MEMBER	Multiplying by 0.01 and converting float32 -> float64 are approximately equally expensive. The cost is dominated by the memory copy. On Mon, Aug 6, 2018 at 10:17 AM Ryan May notifications@github.com wrote: "Ah, ok, not scaling per se (i.e. * 0.01), but a second round of value conversion."	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822
410781556	https://github.com/pydata/xarray/issues/2304#issuecomment-410781556	https://api.github.com/repos/pydata/xarray/issues/2304	MDEyOklzc3VlQ29tbWVudDQxMDc4MTU1Ng==	shoyer 1217238	2018-08-06T17:13:27Z	2018-08-06T17:13:27Z	MEMBER	"I'm not following why the data are scaled twice." We automatically scale the data from int16 -> float32 upon reading it in xarray (if `decode_cf=True`). There's no way to turn that off and still get automatic scaling, so the best you can do is layer on int16 -> float32 -> float64, when you might prefer to only do int16 -> float64.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822
410777201	https://github.com/pydata/xarray/issues/2304#issuecomment-410777201	https://api.github.com/repos/pydata/xarray/issues/2304	MDEyOklzc3VlQ29tbWVudDQxMDc3NzIwMQ==	shoyer 1217238	2018-08-06T17:00:01Z	2018-08-06T17:00:01Z	MEMBER	"But since it's a downstream calculation issue, and does not impact the actual precision of what's being read from the file, what's wrong with saying 'Use `data.astype(np.float64)`'? It's completely identical to doing it internally to xarray." It's almost, but not quite, identical. The difference is that the data gets scaled twice. This adds twice the overhead for scaling the values (which, to be fair, is usually negligible compared to IO). Also, to get exactly equivalent numerics for further computation you would need to round again, e.g., `data.astype(np.float64).round(int(np.ceil(-np.log10(data.encoding['scale_factor']))))`. This starts to get a little messy :).	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822
410773312	https://github.com/pydata/xarray/issues/2304#issuecomment-410773312	https://api.github.com/repos/pydata/xarray/issues/2304	MDEyOklzc3VlQ29tbWVudDQxMDc3MzMxMg==	shoyer 1217238	2018-08-06T16:47:11Z	2018-08-06T16:47:22Z	MEMBER	"A float32 value has 24 bits of precision in the significand, which is more than enough to store the 16 bits in the original data; the exponent (8 bits) will more or less take care of the * 0.01." Right. The actual raw data is being stored as an integer `2194` (along with the scale factor of `0.01`). Both `21.939998626708984` (as float32) and `21.940000000000001` (as float64) are floating point approximations of the exact decimal number `21.94`. I would be happy to add options for whether to default to float32 or float64 precision. There are clearly tradeoffs here: float32 uses half the memory, while float64 has more precision for downstream computation. I don't think we can make a statement about which is better in general. The best we can do is make an educated guess about which will be more useful / less surprising for most and/or new users, and pick that as the default.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822
410680371	https://github.com/pydata/xarray/issues/2304#issuecomment-410680371	https://api.github.com/repos/pydata/xarray/issues/2304	MDEyOklzc3VlQ29tbWVudDQxMDY4MDM3MQ==	fmaussion 10050469	2018-08-06T11:41:38Z	2018-08-06T11:41:38Z	MEMBER	"As mentioned in the original issue the modification is straightforward. Any ideas if this could be integrated into xarray anytime soon?" Some people might prefer float32, so it is not as straightforward as it seems. It might be possible to add an option for this, but I didn't look into the details. "You'll have a float64 in the end but you won't get your precision back." Note that this is a false sense of precision, because in the example above the compression used is lossy, i.e. precision was lost at compression and the actual precision is now 0.01: `short agc_40hz(time, meas_ind) ; agc_40hz:_FillValue = 32767s ; agc_40hz:units = "dB" ; agc_40hz:scale_factor = 0.01 ;`	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822
407087615	https://github.com/pydata/xarray/issues/2304#issuecomment-407087615	https://api.github.com/repos/pydata/xarray/issues/2304	MDEyOklzc3VlQ29tbWVudDQwNzA4NzYxNQ==	shoyer 1217238	2018-07-23T14:57:20Z	2018-07-23T14:57:20Z	MEMBER	To clarify: why is it a problem for you to get floating point values like 21.939998626708984 instead of 21.940000000000001? Is it a loss of precision in some downstream calculation? Both numbers are accurate well within the precision indicated by the netCDF file (0.01). Note that it's very easy to later convert from float32 to float64, e.g., by writing `ds.astype(np.float64)`.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 343659822
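The precision argument running through these comments can be checked with plain NumPy. The sketch below assumes a packed `int16` value of `2194` with `scale_factor = 0.01`, which decodes to about 21.94 as in the values quoted in this thread; the helper `unpack` is hypothetical and is not an xarray function. It decodes in float32 and in float64, shows that widening the float32 result to float64 keeps the float32 rounding error, that rounding to the decimal resolution implied by `scale_factor` removes it, and that the two decodings disagree at `atol=1e-6` but agree at a tolerance matching the data's real accuracy (0.01).

```python
import numpy as np

# Assumed example: one packed int16 value with scale_factor 0.01 (about 21.94 decoded).
packed = np.array([2194], dtype=np.int16)
scale_factor = 0.01

def unpack(values, scale, dtype):
    """Hypothetical helper: decode packed integers as values * scale in `dtype`."""
    return values.astype(dtype) * dtype(scale)

f32 = unpack(packed, scale_factor, np.float32)  # single-precision decoding
f64 = unpack(packed, scale_factor, np.float64)  # double-precision decoding

# Widening float32 to float64 keeps the float32 rounding error ...
widened = f32.astype(np.float64)

# ... but rounding to the decimal resolution implied by scale_factor removes it.
# np.round needs an integer number of decimals, hence int() around np.ceil().
decimals = int(np.ceil(-np.log10(scale_factor)))
rounded = np.round(widened, decimals)

print(f32[0], f64[0], widened[0], rounded[0])
# Tolerance finer than the data's accuracy vs. tolerance matching it.
print(np.isclose(widened, f64, rtol=0, atol=1e-6))
print(np.isclose(widened, f64, rtol=0, atol=scale_factor))
```

The extra `astype` plus `round` is the "second round of value conversion" discussed above; it costs another pass over the array, which is why decoding directly to float64 was preferred by some commenters.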

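The CF rule quoted in the 2022 comments (unpacked data takes the type of `scale_factor`/`add_offset`; float is only appropriate for byte or short packed data) can be sketched as a small dtype-selection function. This is a hypothetical illustration, not xarray's `_choose_float_dtype`: the function name `choose_unpacked_dtype`, the promotion of mixed attribute types to float64, and the float32 fallback for small integers when neither attribute is present (the masking-only `CFMaskCoder` case mentioned in the comments) are all assumptions.

```python
import numpy as np

def choose_unpacked_dtype(packed_dtype, scale_factor=None, add_offset=None):
    """Hypothetical CF-style choice of the float dtype for unpacked data.

    - If scale_factor and/or add_offset are present, follow their dtype:
      float32 attributes on byte/short packed data give float32; anything
      else (double attributes, int packed data, mixed types) gives float64.
    - If neither is present (masking only), fall back to float32 for small
      integers and float64 otherwise (an assumed default).
    """
    packed_dtype = np.dtype(packed_dtype)
    attrs = [a for a in (scale_factor, add_offset) if a is not None]
    small_int = packed_dtype.kind in "iu" and packed_dtype.itemsize <= 2
    if not attrs:
        return np.dtype(np.float32) if small_int else np.dtype(np.float64)
    # Promote the attribute dtypes together; a plain Python float counts as float64.
    attr_dtype = np.result_type(*[np.asarray(a).dtype for a in attrs])
    if attr_dtype == np.float32 and small_int:
        return np.dtype(np.float32)
    # CF advises against unpacking int (32-bit) data into float, so use double.
    return np.dtype(np.float64)

# The case from this issue: int16 data with a double scale_factor.
print(choose_unpacked_dtype(np.int16, np.float64(0.01)))
```

Under this sketch the int16 + double `scale_factor` case from the issue decodes to float64, while files that store float32 attributes on short data keep the smaller float32 representation.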


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);



