issue_comments
7 rows where issue = 548475127 sorted by updated_at descending
- Different data values from xarray open_mfdataset when using chunks · 7 comments
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
576422784 | https://github.com/pydata/xarray/issues/3686#issuecomment-576422784 | https://api.github.com/repos/pydata/xarray/issues/3686 | MDEyOklzc3VlQ29tbWVudDU3NjQyMjc4NA== | abarciauskas-bgse 15016780 | 2020-01-20T20:35:47Z | 2020-01-20T20:35:47Z | NONE | Closing as using |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Different data values from xarray open_mfdataset when using chunks 548475127 | |
573458081 | https://github.com/pydata/xarray/issues/3686#issuecomment-573458081 | https://api.github.com/repos/pydata/xarray/issues/3686 | MDEyOklzc3VlQ29tbWVudDU3MzQ1ODA4MQ== | abarciauskas-bgse 15016780 | 2020-01-12T21:17:11Z | 2020-01-12T21:17:11Z | NONE | Thanks @rabernat. I would like to use `assert_allclose` to test the output, but at first pass it seems that might be prohibitively slow for large datasets. Do you recommend sampling or other good testing strategies (e.g. to assert the xarray datasets are equal to some precision)? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Different data values from xarray open_mfdataset when using chunks 548475127 | |
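A minimal sketch of the tolerance-plus-sampling approach asked about above, using `xarray.testing.assert_allclose`; the dataset contents, stride, and tolerance below are illustrative assumptions, not values from this thread:

```
import numpy as np
import xarray as xr

# Two illustrative datasets standing in for the unchunked/chunked results.
rng = np.random.default_rng(0)
data = (270 + rng.random((4, 100, 100))).astype(np.float32)
ds_a = xr.Dataset({"analysed_sst": (("time", "lat", "lon"), data)})
ds_b = ds_a + np.float32(1e-4)  # tiny perturbation, standing in for roundoff

# Full comparison with an explicit relative tolerance...
xr.testing.assert_allclose(ds_a, ds_b, rtol=1e-5)

# ...or, to keep the check cheap on large datasets, compare a strided sample.
sample_a = ds_a.isel(lat=slice(None, None, 10), lon=slice(None, None, 10))
sample_b = ds_b.isel(lat=slice(None, None, 10), lon=slice(None, None, 10))
xr.testing.assert_allclose(sample_a, sample_b, rtol=1e-5)
```

Subsampling keeps the assertion fast while still catching systematic offsets; `rtol` (and `atol`) set the precision being asserted.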
573455625 | https://github.com/pydata/xarray/issues/3686#issuecomment-573455625 | https://api.github.com/repos/pydata/xarray/issues/3686 | MDEyOklzc3VlQ29tbWVudDU3MzQ1NTYyNQ== | dmedv 3922329 | 2020-01-12T20:48:20Z | 2020-01-12T20:51:01Z | NONE | Actually, there is no need to separate them. One can simply do something like this to apply the mask:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Different data values from xarray open_mfdataset when using chunks 548475127 | |
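One possible reading of "apply the mask" above, sketched with the `fileObjs` list and `analysed_sst` variable from this issue's example; it assumes the file is opened with `mask_and_scale=False`, so `_FillValue`, `scale_factor`, and `add_offset` remain in the variable's `.attrs`:

```
import xarray as xr

# Open without xarray's automatic masking/scaling; the CF packing
# attributes then stay on the raw (integer) variable.
ds = xr.open_dataset(fileObjs[0], mask_and_scale=False)  # fileObjs as in the issue's example
raw = ds["analysed_sst"]

# Mask the fill values first, then apply the scale/offset by hand.
masked = raw.where(raw != raw.attrs["_FillValue"])
scaled = masked * raw.attrs["scale_factor"] + raw.attrs["add_offset"]

print(scaled.sel(lat=slice(20, 50), lon=slice(-170, -110)).mean().values)
```

Masking before scaling matters: scaling a raw fill value first would turn it into a plausible-looking temperature instead of NaN.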
573455048 | https://github.com/pydata/xarray/issues/3686#issuecomment-573455048 | https://api.github.com/repos/pydata/xarray/issues/3686 | MDEyOklzc3VlQ29tbWVudDU3MzQ1NTA0OA== | rabernat 1197350 | 2020-01-12T20:41:53Z | 2020-01-12T20:41:53Z | MEMBER | Thanks for the useful issue @abarciauskas-bgse and valuable test @dmedv. I believe this is fundamentally a Dask issue. In general, Dask's algorithms do not guarantee numerically identical results for different chunk sizes. Roundoff errors accrue slightly differently based on how the array is split up. These errors are usually acceptable to users. For example, for 290.13754 vs. 290.13757 the error is in the 8th significant digit, about 1 part in 10,000,000. Since there are only 65,536 possible 16-bit integer values (the original data type in the netCDF file), this seems more than adequate precision to me. Calling … There appears to be a second issue here related to fill values, but I haven't quite grasped whether we think there is a bug.
There may be a reason why these operations are coupled. Would have to look more closely at the code to know for sure. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Different data values from xarray open_mfdataset when using chunks 548475127 | |
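A small standalone illustration of the chunk-dependent roundoff described above, on synthetic data rather than the MUR files: reducing the same float32 array under two different dask chunkings typically changes the last digits of the result.

```
import dask.array as da
import numpy as np

# One float32 array, reduced under two different chunkings.
values = np.random.default_rng(42).random(1_000_000).astype(np.float32)

mean_big_chunks = da.from_array(values, chunks=500_000).mean().compute()
mean_small_chunks = da.from_array(values, chunks=1_000).mean().compute()
reference = values.astype(np.float64).mean()  # double-precision reference

# Float32 addition is not associative, so splitting the reduction into
# different chunks generally shifts the roundoff by a few ulps.
print(mean_big_chunks, mean_small_chunks)
print(abs(mean_big_chunks - reference), abs(mean_small_chunks - reference))
```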
573451230 | https://github.com/pydata/xarray/issues/3686#issuecomment-573451230 | https://api.github.com/repos/pydata/xarray/issues/3686 | MDEyOklzc3VlQ29tbWVudDU3MzQ1MTIzMA== | dmedv 3922329 | 2020-01-12T19:59:31Z | 2020-01-12T20:25:16Z | NONE | @abarciauskas-bgse Yes, indeed, I forgot about |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Different data values from xarray open_mfdataset when using chunks 548475127 | |
573444233 | https://github.com/pydata/xarray/issues/3686#issuecomment-573444233 | https://api.github.com/repos/pydata/xarray/issues/3686 | MDEyOklzc3VlQ29tbWVudDU3MzQ0NDIzMw== | abarciauskas-bgse 15016780 | 2020-01-12T18:37:59Z | 2020-01-12T18:37:59Z | NONE | @dmedv Thanks for this, it all makes sense to me and I see the same results; however, I wasn't able to "convert back" using …:

```
d = Dataset(fileObjs[0])
v = d.variables['analysed_sst']

print("Result with mask_and_scale=True")
ds_unchunked = xr.open_dataset(fileObjs[0])
print(ds_unchunked.analysed_sst.sel(lat=slice(20,50),lon=slice(-170,-110)).mean().values)

print("Result with mask_and_scale=False")
ds_unchunked = xr.open_dataset(fileObjs[0], mask_and_scale=False)
scaled = ds_unchunked.analysed_sst * v.scale_factor + v.add_offset
scaled.sel(lat=slice(20,50),lon=slice(-170,-110)).mean().values
```

However this led me to another seemingly related issue: https://github.com/pydata/xarray/issues/2304

Loss of precision seems to be the key here, so coercing the `analysed_sst` values to `np.float64`:

```
print("results from unchunked dataset")
ds_unchunked = xr.open_mfdataset(fileObjs, combine='by_coords')
ds_unchunked['analysed_sst'] = ds_unchunked['analysed_sst'].astype(np.float64)
print(ds_unchunked.analysed_sst[1,:,:].sel(lat=slice(20,50),lon=slice(-170,-110)).mean().values)

print(f"results from chunked dataset using {chunks}")
ds_chunked = xr.open_mfdataset(fileObjs, chunks=chunks, combine='by_coords')
ds_chunked['analysed_sst'] = ds_chunked['analysed_sst'].astype(np.float64)
print(ds_chunked.analysed_sst[1,:,:].sel(lat=slice(20,50),lon=slice(-170,-110)).mean().values)

print("results from chunked dataset using 'auto'")
ds_chunked = xr.open_mfdataset(fileObjs, chunks={'time': 'auto', 'lat': 'auto', 'lon': 'auto'}, combine='by_coords')
ds_chunked['analysed_sst'] = ds_chunked['analysed_sst'].astype(np.float64)
print(ds_chunked.analysed_sst[1,:,:].sel(lat=slice(20,50),lon=slice(-170,-110)).mean().values)
```

returns:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Different data values from xarray open_mfdataset when using chunks 548475127 | |
573380688 | https://github.com/pydata/xarray/issues/3686#issuecomment-573380688 | https://api.github.com/repos/pydata/xarray/issues/3686 | MDEyOklzc3VlQ29tbWVudDU3MzM4MDY4OA== | dmedv 3922329 | 2020-01-12T04:18:43Z | 2020-01-12T04:27:23Z | NONE | Actually, that's true not just for … Just a guess, but I think the problem here is that the calculations are done in floating-point arithmetic (probably float32...), and you get accumulated precision errors depending on the number of chunks. Internally in the NetCDF file … I'm basically just starting to use xarray myself, so please someone correct me if any of the above is wrong. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Different data values from xarray open_mfdataset when using chunks 548475127 |
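A standalone sketch of the guess above about packed integers and single-precision accumulation (the scale/offset and value range are illustrative, not taken from the MUR files): decoding int16 data and averaging in float32 drifts slightly from the float64 result, consistent with the `astype(np.float64)` coercion earlier in the thread stabilising the means.

```
import numpy as np

# Illustrative packed data: int16 values with a CF-style scale/offset,
# loosely mimicking a packed sea-surface-temperature variable.
rng = np.random.default_rng(0)
packed = rng.integers(-32000, 32000, size=2_000_000, dtype=np.int16)
scale_factor, add_offset = 0.001, 298.15

decoded32 = packed.astype(np.float32) * np.float32(scale_factor) + np.float32(add_offset)
decoded64 = packed.astype(np.float64) * scale_factor + add_offset

# Single precision both limits the representable values and accumulates
# more roundoff in the reduction than double precision does.
print("float32 mean:", decoded32.mean())
print("float64 mean:", decoded64.mean())
print("float32 data, float64 accumulator:", decoded32.mean(dtype=np.float64))
```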