html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/3686#issuecomment-576422784,https://api.github.com/repos/pydata/xarray/issues/3686,576422784,MDEyOklzc3VlQ29tbWVudDU3NjQyMjc4NA==,15016780,2020-01-20T20:35:47Z,2020-01-20T20:35:47Z,NONE,Closing as using `mask_and_scale=False` produced precise results,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,548475127
https://github.com/pydata/xarray/issues/3686#issuecomment-573458081,https://api.github.com/repos/pydata/xarray/issues/3686,573458081,MDEyOklzc3VlQ29tbWVudDU3MzQ1ODA4MQ==,15016780,2020-01-12T21:17:11Z,2020-01-12T21:17:11Z,NONE,"Thanks @rabernat. I would like to use [assert_allclose](http://xarray.pydata.org/en/stable/generated/xarray.testing.assert_allclose.html) to test the output, but at first pass it seems that might be prohibitively slow for large datasets. Do you recommend sampling or other good testing strategies (e.g. to assert that the xarray datasets are equal to some precision)?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,548475127
https://github.com/pydata/xarray/issues/3686#issuecomment-573455625,https://api.github.com/repos/pydata/xarray/issues/3686,573455625,MDEyOklzc3VlQ29tbWVudDU3MzQ1NTYyNQ==,3922329,2020-01-12T20:48:20Z,2020-01-12T20:51:01Z,NONE,"Actually, there is no need to separate them. One can simply do something like this to apply the mask:
```
ds.analysed_sst.where(ds.analysed_sst != fill_value).mean() * scale_factor + offset
```
It's not a bug, but if we set `mask_and_scale=False`, it's left up to us to apply the mask manually.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,548475127
https://github.com/pydata/xarray/issues/3686#issuecomment-573451230,https://api.github.com/repos/pydata/xarray/issues/3686,573451230,MDEyOklzc3VlQ29tbWVudDU3MzQ1MTIzMA==,3922329,2020-01-12T19:59:31Z,2020-01-12T20:25:16Z,NONE,"@abarciauskas-bgse Yes, indeed, I forgot about `_FillValue`. That would mess up the mean calculation with `mask_and_scale=False`. I think it would be nice if it were possible to control the mask application in `open_dataset` separately from scale/offset. ","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,548475127
https://github.com/pydata/xarray/issues/3686#issuecomment-573444233,https://api.github.com/repos/pydata/xarray/issues/3686,573444233,MDEyOklzc3VlQ29tbWVudDU3MzQ0NDIzMw==,15016780,2020-01-12T18:37:59Z,2020-01-12T18:37:59Z,NONE,"@dmedv Thanks for this; it all makes sense to me and I see the same results. However, I wasn't able to ""convert back"" using `scale_factor` and `add_offset`:
```
import xarray as xr
from netCDF4 import Dataset

# fileObjs: list of input NetCDF file paths (defined outside this snippet)
d = Dataset(fileObjs[0])
v = d.variables['analysed_sst']
print(""Result with mask_and_scale=True"")
ds_unchunked = xr.open_dataset(fileObjs[0])
print(ds_unchunked.analysed_sst.sel(lat=slice(20,50),lon=slice(-170,-110)).mean().values)
print(""Result with mask_and_scale=False"")
ds_unchunked = xr.open_dataset(fileObjs[0], mask_and_scale=False)
scaled = ds_unchunked.analysed_sst * v.scale_factor + v.add_offset
print(scaled.sel(lat=slice(20,50),lon=slice(-170,-110)).mean().values)
```
^^ That returns a different result than what I expect. I wonder if this is because the `_FillValue` is not masked when converting back.
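For reference, here is a rough, untested sketch of the same conversion with the fill value masked first (reusing `ds_unchunked` and `v` from the snippet above, and assuming the netCDF4 variable exposes a `_FillValue` attribute):
```
# mask the raw fill values before applying scale/offset manually
raw = ds_unchunked.analysed_sst
masked = raw.where(raw != v._FillValue)
scaled_masked = masked * v.scale_factor + v.add_offset
print(scaled_masked.sel(lat=slice(20,50),lon=slice(-170,-110)).mean().values)
```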
_However_, this led me to another seemingly related issue: https://github.com/pydata/xarray/issues/2304
Loss of precision seems to be the key here: coercing the `float32`s to `float64`s appears to get the same results from both chunked and unchunked versions - but still not the result I expect:
```
print(""results from unchunked dataset"")
ds_unchunked = xr.open_mfdataset(fileObjs, combine='by_coords')
ds_unchunked['analysed_sst'] = ds_unchunked['analysed_sst'].astype(np.float64)
print(ds_unchunked.analysed_sst[1,:,:].sel(lat=slice(20,50),lon=slice(-170,-110)).mean().values)
print(f""results from chunked dataset using {chunks}"")
ds_chunked = xr.open_mfdataset(fileObjs, chunks=chunks, combine='by_coords')
ds_chunked['analysed_sst'] = ds_chunked['analysed_sst'].astype(np.float64)
print(ds_chunked.analysed_sst[1,:,:].sel(lat=slice(20,50),lon=slice(-170,-110)).mean().values)
print(""results from chunked dataset using 'auto'"")
ds_chunked = xr.open_mfdataset(fileObjs, chunks={'time': 'auto', 'lat': 'auto', 'lon': 'auto'}, combine='by_coords')
ds_chunked['analysed_sst'] = ds_chunked['analysed_sst'].astype(np.float64)
print(ds_chunked.analysed_sst[1,:,:].sel(lat=slice(20,50),lon=slice(-170,-110)).mean().values)
```
returns:
```
results from unchunked dataset
290.1375818862207
results from chunked dataset using {'time': 1, 'lat': 1799, 'lon': 3600}
290.1375818862207
results from chunked dataset using 'auto'
290.1375818862207
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,548475127
https://github.com/pydata/xarray/issues/3686#issuecomment-573380688,https://api.github.com/repos/pydata/xarray/issues/3686,573380688,MDEyOklzc3VlQ29tbWVudDU3MzM4MDY4OA==,3922329,2020-01-12T04:18:43Z,2020-01-12T04:27:23Z,NONE,"Actually, that's true not just for `open_mfdataset`, but even for `open_dataset` with a single file. I've tried it with one of those files from PO.DAAC, and got similar results - slightly different values depending on the chunking strategy.
Just a guess, but I think the problem here is that the calculations are done in floating-point arithmetic (probably float32...), and you get accumulated precision errors depending on the number of chunks.
Internally, in the NetCDF file, `analysed_sst` values are stored as int16 with real-valued scale and offset attributes, so the correct way to calculate the mean would be to do it on the original int16 values and then apply the scale and offset to the result. Automatic scaling is on by default (i.e. it replaces the original array values with new scaled values), but you can turn it off in `open_dataset` with the `mask_and_scale=False` option: http://xarray.pydata.org/en/stable/generated/xarray.open_dataset.html I tried doing this and then got identical results with the chunked and unchunked versions. You can pass this option to `open_mfdataset` as well via `**kwargs`.
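Roughly something like this (an untested sketch; `fileObjs` is the file list from the original example, it assumes the `scale_factor`/`add_offset` attributes stay on the variable when decoding is disabled, and it does not handle `_FillValue`):
```
import xarray as xr

# keep the raw int16 values by disabling automatic mask/scale decoding
ds = xr.open_dataset(fileObjs[0], mask_and_scale=False)
raw_mean = ds.analysed_sst.sel(lat=slice(20,50), lon=slice(-170,-110)).mean()
# apply scale/offset from the variable's attributes to the aggregated result
print((raw_mean * ds.analysed_sst.attrs['scale_factor'] + ds.analysed_sst.attrs['add_offset']).values)
```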
I'm basically just starting to use xarray myself, so please someone correct me if any of the above is wrong.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,548475127