issue_comments: 1248302788

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/7039#issuecomment-1248302788 https://api.github.com/repos/pydata/xarray/issues/7039 1248302788 IC_kwDOAMm_X85KZ5bE 1197350 2022-09-15T16:02:17Z 2022-09-15T16:02:17Z MEMBER

I am curious as to what exactly from the encoding introduces the noise (I still need to read through the documentation more thoroughly)?

The encoding says that your data should be encoded according to the following pseudocode formula:

    encoded = int((original - offset) / scale_factor)
    decoded = (scale_factor * float(encoded)) + offset

So the floating-point data are converted back and forth to a less precise type (integer) in order to save space. These numerical operations cannot preserve exact floating-point accuracy. That's just how numerical floating-point operations work. If you skip the encoding, then you just write the floating-point bytes directly to disk, with no loss of precision.
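
As a rough illustration of the round trip described above, here is a minimal NumPy sketch. The `encode` and `decode` helpers, the `int16` storage type, and the `scale_factor` and `offset` values are all assumptions chosen for the example; they are not taken from your dataset or from xarray's internals, and the truncating integer cast simply mirrors the `int(...)` in the pseudocode.

```python
import numpy as np


def encode(original, scale_factor, offset, dtype=np.int16):
    # Hypothetical helper: pack floats into integers following the
    # pseudocode formula. astype() truncates toward zero, like int().
    return ((original - offset) / scale_factor).astype(dtype)


def decode(encoded, scale_factor, offset):
    # Hypothetical helper: unpack the integers back into float64 values.
    return scale_factor * encoded.astype(np.float64) + offset


# Illustrative values only (assumed, not from the issue).
scale_factor, offset = 0.01, 50.0

rng = np.random.default_rng(0)
original = rng.uniform(0.0, 100.0, size=1000)

encoded = encode(original, scale_factor, offset)
decoded = decode(encoded, scale_factor, offset)

# The decoded values are close to the originals but not identical.
max_error = np.abs(decoded - original).max()
print(f"stored dtype: {encoded.dtype}, max round-trip error: {max_error:.3e}")
```

In this sketch the round-trip error is bounded by roughly one `scale_factor`, because everything finer than that step is discarded when the values are cast to integers.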

This sort of encoding is a crude form of lossy compression that is still unfortunately in use, even though there are much better algorithms available (and built into netcdf and zarr). Differences on the order of 10^-14 should not affect any real-world calculations.

However, this seems like a much, much smaller difference than the problem you originally reported. This suggests that the MRE does not actually reproduce the bug after all. How was the plot above (https://github.com/pydata/xarray/issues/7039#issue-1373352524) generated? From your actual MRE code? Or from your earlier example with real data?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  1373352524