issue_comments: 573380688
html_url: https://github.com/pydata/xarray/issues/3686#issuecomment-573380688
issue_url: https://api.github.com/repos/pydata/xarray/issues/3686
id: 573380688
node_id: MDEyOklzc3VlQ29tbWVudDU3MzM4MDY4OA==
user: 3922329
created_at: 2020-01-12T04:18:43Z
updated_at: 2020-01-12T04:27:23Z
author_association: NONE

Actually, that's true not just for open_mfdataset, but even for open_dataset with a single file. I tried it with one of those files from PO.DAAC and got similar results: slightly different values depending on the chunking strategy.

Just a guess, but I think the problem here is that the calculations are done in floating-point arithmetic (probably float32), so rounding errors accumulate differently depending on the number of chunks.
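A tiny numpy illustration of what I mean (not something xarray does literally, just the same effect): summing the same float32 values in one pass versus averaging per-chunk means can give results that agree only to a few decimal places.

```python
import numpy as np

# One million float32 values; the exact numbers don't matter for the point.
data = np.random.default_rng(0).random(1_000_000).astype("float32")

full_mean = data.mean()                                   # mean over all values at once
chunk_means = [chunk.mean() for chunk in np.split(data, 10)]
mean_of_chunks = np.mean(chunk_means)                     # mean of equal-sized chunk means

# The two results may agree or differ in the last digits, purely from
# float32 rounding that depends on the order of accumulation.
print(full_mean, mean_of_chunks)
```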

Internally, the NetCDF file stores analysed_sst as int16 with scale and offset attributes, so the correct way to calculate the mean would be to do it on the original int16 values and then apply the scale and offset to the result. Automatic scaling is on by default (i.e. xarray replaces the packed values with scaled floats), but you can turn it off in open_dataset with the mask_and_scale=False option: http://xarray.pydata.org/en/stable/generated/xarray.open_dataset.html. I tried this and got identical results for the chunked and unchunked versions. You can pass this option to open_mfdataset as well through **kwargs.
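Roughly what I did, as a minimal sketch (the filename is a placeholder, scale_factor/add_offset are the standard CF attribute names, and fill-value handling is omitted for brevity):

```python
import xarray as xr

# With mask_and_scale=False, analysed_sst keeps its packed int16 values
# and the CF attributes stay on the variable instead of being applied.
ds = xr.open_dataset("ghrsst_example.nc", mask_and_scale=False)
raw = ds["analysed_sst"]

# Mean of the integer values (accumulated in float64), scaled once at the end.
packed_mean = raw.mean()
sst_mean = packed_mean * raw.attrs["scale_factor"] + raw.attrs["add_offset"]
print(float(sst_mean))

# The same option can be forwarded through open_mfdataset's **kwargs, e.g.:
# ds = xr.open_mfdataset("ghrsst_*.nc", combine="by_coords", mask_and_scale=False)
```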

I'm basically just starting to use xarray myself, so please someone correct me if any of the above is wrong.

reactions: none
issue: 548475127