
issue_comments: 573455048


html_url: https://github.com/pydata/xarray/issues/3686#issuecomment-573455048
issue_url: https://api.github.com/repos/pydata/xarray/issues/3686
id: 573455048
node_id: MDEyOklzc3VlQ29tbWVudDU3MzQ1NTA0OA==
user: 1197350
created_at: 2020-01-12T20:41:53Z
updated_at: 2020-01-12T20:41:53Z
author_association: MEMBER

Thanks for the useful issue @abarciauskas-bgse and valuable test @dmedv.

I believe this is fundamentally a Dask issue. In general, Dask's algorithms do not guarantee numerically identical results for different chunk sizes: roundoff errors accrue slightly differently depending on how the array is split up. These errors are usually acceptable to users. In the example above (290.13754 vs. 290.13757), the error is in the 8th significant digit, i.e. 1 part in 100,000,000. Since there are only 65,536 distinct 16-bit integers (the original data type in the netCDF file), this seems more than adequate precision to me.
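To illustrate the chunk-size effect without Dask at all, here is a minimal sketch (plain NumPy, made-up random data) of how summing the same float32 values in different groupings can shift the low-order bits, which is exactly what different Dask chunkings do:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1_000_000, dtype=np.float32)

# Summing everything at once vs. in two uneven "chunks" accumulates
# float32 roundoff differently; the results may differ in the last bits.
total_one_chunk = float(x.sum())
total_two_chunks = float(x[:300_000].sum() + x[300_000:].sum())

# Bitwise equality is not guaranteed, but the two results agree to
# high relative precision:
print(total_one_chunk, total_two_chunks)
assert np.isclose(total_one_chunk, total_two_chunks, rtol=1e-5)
```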

Calling .mean() on a dask array is not the same as a checksum. As with all numerical calculations, equality should be verified with a precision appropriate to the data type and algorithm, e.g. using assert_allclose.
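As a concrete sketch, using NumPy's `numpy.testing.assert_allclose` on the two example values above (treated here as hypothetical results):

```python
import numpy as np
from numpy.testing import assert_allclose

# The two means from the example above
a = 290.13754
b = 290.13757

assert a != b                     # bitwise equality fails...
assert_allclose(a, b, rtol=1e-6)  # ...but they agree within a 1e-6 relative tolerance
```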

There appears to be a second issue here related to fill values, but I haven't quite grasped whether we think there is a bug.

I think it would be nice if it were possible to control the mask application in open_dataset separately from scale/offset.

There may be a reason why these operations are coupled. Would have to look more closely at the code to know for sure.
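One possible reason the coupling is inherent: under CF conventions, the `_FillValue` comparison has to happen on the raw packed integers *before* `scale_factor`/`add_offset` are applied, so the two decoding steps naturally travel together (this is what xarray's `mask_and_scale` option toggles as a unit). A minimal NumPy sketch with made-up values:

```python
import numpy as np

# Hypothetical CF-style packed 16-bit variable and attributes
raw = np.array([100, 200, -32767], dtype=np.int16)
fill_value = np.int16(-32767)   # _FillValue, defined in the packed dtype
scale_factor = 0.01
add_offset = 280.0

# Masking is applied to the raw integers first; scale/offset only make
# sense on the unmasked values, which is why the steps are coupled.
masked = np.where(raw == fill_value, np.nan, raw.astype(np.float64))
decoded = masked * scale_factor + add_offset
print(decoded)  # 281.0, 282.0, and NaN for the fill value
```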

issue: 548475127