github: issue_comments: 14 rows where issue = 1373352524 sorted by updated

14 rows where issue = 1373352524 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at ▲	author_association	body	reactions	issue
1460185069	https://github.com/pydata/xarray/issues/7039#issuecomment-1460185069	https://api.github.com/repos/pydata/xarray/issues/7039	IC_kwDOAMm_X85XCKft	rabernat 1197350	2023-03-08T13:51:06Z	2023-03-08T13:51:06Z	MEMBER	Rather than using the scale_factor and add_offset approach, I would look into xbitinfo if you want to optimize your compression.	{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Encoding error when saving netcdf 1373352524
1460166694	https://github.com/pydata/xarray/issues/7039#issuecomment-1460166694	https://api.github.com/repos/pydata/xarray/issues/7039	IC_kwDOAMm_X85XCGAm	etsmith14 35741277	2023-03-08T13:37:10Z	2023-03-08T13:37:47Z	NONE	Thanks for that note. I have a bunch of variables, like precipitation type, where that would be totally fine. Definitely looking to save on disk space, so may try to recompute the scale_factor and add_offset on other variables as suggested.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Encoding error when saving netcdf 1373352524
1460160756	https://github.com/pydata/xarray/issues/7039#issuecomment-1460160756	https://api.github.com/repos/pydata/xarray/issues/7039	IC_kwDOAMm_X85XCEj0	veenstrajelmer 60435591	2023-03-08T13:32:02Z	2023-03-08T13:32:02Z	CONTRIBUTOR	Hi @etsmith14. The workaround I suggested loses accuracy, and depending on the variable that may not be acceptable. However, recomputing `scale_factor` and `add_offset` is possible: https://github.com/ArcticSnow/TopoPyScale/issues/60#issuecomment-1460022033 It is more complicated than dropping the `dtype`, but it does keep the file size small.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Encoding error when saving netcdf 1373352524
1460116369	https://github.com/pydata/xarray/issues/7039#issuecomment-1460116369	https://api.github.com/repos/pydata/xarray/issues/7039	IC_kwDOAMm_X85XB5uR	etsmith14 35741277	2023-03-08T13:00:21Z	2023-03-08T13:00:21Z	NONE	Thanks for the alternative @veenstrajelmer. I'll give it a try on my end.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Encoding error when saving netcdf 1373352524
1458582799	https://github.com/pydata/xarray/issues/7039#issuecomment-1458582799	https://api.github.com/repos/pydata/xarray/issues/7039	IC_kwDOAMm_X85W8DUP	veenstrajelmer 60435591	2023-03-07T17:47:20Z	2023-03-07T22:48:36Z	CONTRIBUTOR	@etsmith14: another workaround is removing the `scale_factor` instead of the `dtype`. This keeps the file size small. However, there are slight offsets between the source and destination datasets, which is to be expected since the original value for the msl variable was in the range of 0.1/0.11 and removing it defaults to 1. For your variable, the scale_factor might also be completely different. However, maybe the `scale_factor` (and `add_offset`) can be replaced by something that works for all ERA5 data instead of a value very specific to a single dataset/period.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Encoding error when saving netcdf 1373352524
1438244290	https://github.com/pydata/xarray/issues/7039#issuecomment-1438244290	https://api.github.com/repos/pydata/xarray/issues/7039	IC_kwDOAMm_X85Vud3C	veenstrajelmer 60435591	2023-02-21T10:36:48Z	2023-02-21T10:50:20Z	CONTRIBUTOR	I have been thinking about a desirable solution, but I am having some trouble with it. Besides removing dtype from the encoding (resulting in floats being written), one could also change the scale_factor to a higher value (e.g. 0.5). Writing this to int takes half the disk space of lifting the int restriction and writing to float32. Whatever you do, the data is altered at least slightly. Apparently, the data cannot be properly written to integers after reading it. This is a bit odd, I would say. Would that mean that the scaling and offset of ERA5 data are chosen so tightly that, when applied to another dataset/month, the data would fall outside the integer range? It would be great if this would "just work". At the moment, reading and writing ERA5 data with xarray apparently results in incorrect netcdf files. I expected xarray to work off the shelf with this type of data; it feels like xarray is designed for exactly this kind of thing.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Encoding error when saving netcdf 1373352524
1433334559	https://github.com/pydata/xarray/issues/7039#issuecomment-1433334559	https://api.github.com/repos/pydata/xarray/issues/7039	IC_kwDOAMm_X85VbvMf	etsmith14 35741277	2023-02-16T16:09:59Z	2023-02-16T16:09:59Z	NONE	Thanks for flagging the issue again. I've been using the same workaround of removing the dtype before writing to a zarr/netcdf. It's an extra step but has worked for me so far.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Encoding error when saving netcdf 1373352524
1432943981	https://github.com/pydata/xarray/issues/7039#issuecomment-1432943981	https://api.github.com/repos/pydata/xarray/issues/7039	IC_kwDOAMm_X85VaP1t	veenstrajelmer 60435591	2023-02-16T11:29:45Z	2023-02-16T11:29:45Z	CONTRIBUTOR	I have also encountered an issue when reading ERA5 data with open_mfdataset(), writing it with to_netcdf() and reading it again (https://github.com/Deltares/dfm_tools/issues/239). I was actually looking for a place to land this, and found your issue. My expectation is that this happens because the ERA5 data is saved as ints, but all files have different offsets/scaling factors. Upon opening with open_mfdataset(), the data is converted to floats and the offset/scaling factor of the first file is used. This is fine, but I think the issue occurs (as you also mention) because {'dtype': 'int16'} is in the encoding. The file is written as ints and this seems to mess up the data (all a theory). A workaround is to remove the dtype from the encoding for all variables in the file (or update it to float32), but that seems cumbersome.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Encoding error when saving netcdf 1373352524
1248429163	https://github.com/pydata/xarray/issues/7039#issuecomment-1248429163	https://api.github.com/repos/pydata/xarray/issues/7039	IC_kwDOAMm_X85KaYRr	etsmith14 35741277	2022-09-15T18:01:43Z	2022-09-19T22:20:15Z	NONE	One last observation. If I just remove dtype from the original encoding and apply it to the dataset before writing to a netcdf, it works fine. Otherwise, I have the issue if I leave dtype in.
```python
# This works
encoding = {'original_shape': (720, 109, 245),
            'missing_value': -32767,
            '_FillValue': -32767,
            'scale_factor': 0.0009673806360857793,
            'add_offset': 282.08577424425226}
```
```python
# This does not work
encoding = {'original_shape': (720, 109, 245),
            'missing_value': -32767,
            # the original form says it should be 'dtype': dtype('int16'), but this causes
            # an error for me, whereas this form works fine to change between data types
            'dtype': 'int16',
            '_FillValue': -32767,
            'scale_factor': 0.0009673806360857793,
            'add_offset': 282.08577424425226}
```
	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Encoding error when saving netcdf 1373352524
1248293772	https://github.com/pydata/xarray/issues/7039#issuecomment-1248293772	https://api.github.com/repos/pydata/xarray/issues/7039	IC_kwDOAMm_X85KZ3OM	etsmith14 35741277	2022-09-15T15:54:17Z	2022-09-19T22:19:36Z	NONE	That figure is basically what I am getting. Perhaps I designed the MRE poorly, however, I am curious as to what exactly from the encoding introduces the noise (I still need to read through the documentation more thoroughly)? If I don't apply the original encoding, I get a straight line at 0 for the difference plot. With that being said, if you are willing to try a test with the actual ERA5 data, I've attached it here via a box link. I went back and figured out I need at least several files to get large differences. Oddly enough, if I use only 2 files, the difference looks more like noise (+/- 0.0005). If I only open a single file, no difference. If I add a couple more files, the differences become quite large. Data: https://epri.box.com/s/spw9plf77lrjj1xz2spmwd34b5ls9dea
```python
import xarray as xr
import matplotlib.pyplot as plt

# Open original time series
ERA5_t2m = xr.open_mfdataset(r'...\Test\T2m_*' + '.nc')  # open 4 files

# Save time series as netcdf
ERA5_t2m.to_netcdf(r"...\Test\Phx_Temperature_to_netcdf.nc")  # save 4 files

# Open bad netcdf
ERA5_t2m_bad = xr.open_dataset(r'...\Test\Phx_Temperature_to_netcdf.nc')

# Lat and lon for Phx
lats = [33.35]
lons = [-112.86]

# Plot the difference between the same point from the two files
plt.plot(ERA5_t2m.t2m.sel(latitude=lats[0], longitude=lons[0], method='nearest')
         - ERA5_t2m_bad.t2m.sel(latitude=lats[0], longitude=lons[0], method='nearest'))
```
	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Encoding error when saving netcdf 1373352524
1248366193	https://github.com/pydata/xarray/issues/7039#issuecomment-1248366193	https://api.github.com/repos/pydata/xarray/issues/7039	IC_kwDOAMm_X85KaI5x	etsmith14 35741277	2022-09-15T16:55:08Z	2022-09-15T16:55:08Z	NONE	Thanks for the explanation. Makes a lot more sense now! All figures I've attached are from the real ERA5 data. The figure I attached in my most recent comment with the alternative MRE (with the ERA5 data) is what I get when I run that code with the data I provided in the test folder.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Encoding error when saving netcdf 1373352524
1248302788	https://github.com/pydata/xarray/issues/7039#issuecomment-1248302788	https://api.github.com/repos/pydata/xarray/issues/7039	IC_kwDOAMm_X85KZ5bE	rabernat 1197350	2022-09-15T16:02:17Z	2022-09-15T16:02:17Z	MEMBER	"I am curious as to what exactly from the encoding introduces the noise (I still need to read through the documentation more thoroughly)?" The encoding says that your data should be encoded according to the following pseudocode formulas: `encoded = int((original - offset) / scale_factor)` and `decoded = (scale_factor * float(encoded)) + offset`. So the floating-point data are converted back and forth to a less precise type (integer) in order to save space. These numerical operations cannot preserve exact floating-point accuracy. That's just how numerical floating-point operations work. If you skip the encoding, then you just write the floating-point bytes directly to disk, with no loss of precision. This sort of encoding is a crude form of lossy compression that is still unfortunately in use, even though there are much better algorithms available (and built into netcdf and zarr). Differences on the order of 10^-14 should not affect any real-world calculations. However, this seems like a much, much smaller difference than the problem you originally reported. This suggests that the MRE does not actually reproduce the bug after all. How was the plot above (https://github.com/pydata/xarray/issues/7039#issue-1373352524) generated? From your actual MRE code? Or from your earlier example with real data?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Encoding error when saving netcdf 1373352524
1248241823	https://github.com/pydata/xarray/issues/7039#issuecomment-1248241823	https://api.github.com/repos/pydata/xarray/issues/7039	IC_kwDOAMm_X85KZqif	rabernat 1197350	2022-09-15T15:12:34Z	2022-09-15T15:12:34Z	MEMBER	I'm puzzled that I was not able to reproduce this error. I modified the end slightly as follows
```python
# save dataset as netcdf
ds.to_netcdf("test.nc")

# load saved dataset
ds_test = xr.open_dataset('test.nc')

# verify that the two are equal within numerical precision
xr.testing.assert_allclose(ds, ds_test)

# plot
plt.plot(ds.t2m - ds_test.t2m)
```
In my case, the differences were just numerical noise (order 10^-14). I used the binder environment for this. I'm pretty stumped.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Encoding error when saving netcdf 1373352524
1248098918	https://github.com/pydata/xarray/issues/7039#issuecomment-1248098918	https://api.github.com/repos/pydata/xarray/issues/7039	IC_kwDOAMm_X85KZHpm	rabernat 1197350	2022-09-15T13:25:11Z	2022-09-15T13:25:11Z	MEMBER	Thanks so much for taking the time to write up this detailed bug report! 🙏	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Encoding error when saving netcdf 1373352524
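
The pseudocode formula in rabernat's comment (id 1248302788) can be made concrete with a small NumPy sketch. It reuses the `scale_factor` and `add_offset` values quoted in the thread. The helper names `encode`, `decode`, and `representable_range`, the synthetic temperature array, and the use of rounding instead of `int()` truncation are assumptions made for illustration; this is not xarray's actual implementation.

```python
import numpy as np

# Values quoted in the thread for the t2m variable.
scale_factor = 0.0009673806360857793
add_offset = 282.08577424425226


def encode(values, scale_factor, add_offset, dtype=np.int16):
    """Pack floats into integers (assumption: round to nearest)."""
    packed = np.round((values - add_offset) / scale_factor)
    return packed.astype(dtype)


def decode(packed, scale_factor, add_offset):
    """Unpack integers back into floats."""
    return packed.astype(np.float64) * scale_factor + add_offset


def representable_range(scale_factor, add_offset, dtype=np.int16):
    """Smallest and largest float that fit into the integer dtype."""
    info = np.iinfo(dtype)
    return (add_offset + scale_factor * info.min,
            add_offset + scale_factor * info.max)


# Synthetic temperatures in kelvin (hypothetical data, not ERA5).
temps = np.linspace(260.0, 310.0, 1001)

roundtrip = decode(encode(temps, scale_factor, add_offset),
                   scale_factor, add_offset)
max_error = np.abs(roundtrip - temps).max()
low, high = representable_range(scale_factor, add_offset)

print(f"max round-trip error: {max_error:.6f}")
print(f"representable range:  {low:.2f} .. {high:.2f}")
```

Under these assumptions the round-trip error stays within half a `scale_factor`, while values outside roughly 250.4 to 313.8 cannot be represented at all. That is consistent with the theory raised in the thread that a scale and offset tuned for one file may not fit data combined from several files.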

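
The two workarounds discussed in the comments (dropping `dtype` from the encoding, or recomputing `scale_factor` and `add_offset`) might be sketched as follows. The helper names `without_keys` and `compute_packing`, the sample values in `combined`, and the choice to leave -32767 free for the fill value are assumptions for illustration; the linked TopoPyScale comment may compute the packing differently.

```python
import numpy as np


def without_keys(encoding, keys=("dtype",)):
    """Return a copy of an encoding dict with the given keys removed."""
    return {k: v for k, v in encoding.items() if k not in keys}


def compute_packing(data_min, data_max, n_bits=16):
    """Return (scale_factor, add_offset) mapping [data_min, data_max]
    onto a signed integer range symmetric around zero.

    Assumption: -(2**(n_bits - 1) - 1), i.e. -32767 for int16, is kept
    free so it can serve as the fill value.
    """
    usable = 2 ** (n_bits - 1) - 2  # 32766 for int16
    if data_max == data_min:
        return 1.0, float(data_min)  # constant field: any scale works
    scale_factor = (data_max - data_min) / (2 * usable)
    add_offset = (data_max + data_min) / 2
    return scale_factor, add_offset


# Encoding as reported in the thread (original_shape omitted).
original_encoding = {
    "dtype": "int16",
    "_FillValue": -32767,
    "missing_value": -32767,
    "scale_factor": 0.0009673806360857793,
    "add_offset": 282.08577424425226,
}

# Workaround 1: drop dtype so the data would be written as floats.
float_encoding = without_keys(original_encoding)

# Workaround 2: recompute the packing from the combined data range.
combined = np.array([248.7, 275.2, 301.4, 318.9])  # hypothetical values
scale_factor, add_offset = compute_packing(combined.min(), combined.max())
repacked_encoding = {**original_encoding,
                     "scale_factor": scale_factor,
                     "add_offset": add_offset}

packed = np.round((combined - add_offset) / scale_factor).astype(np.int16)
print(float_encoding)
print(packed.min(), packed.max())
```

With xarray, a dictionary like `float_encoding` or `repacked_encoding` would typically be assigned to a variable's `.encoding` attribute, or passed per variable through the `encoding` argument of `to_netcdf()`, before writing.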

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
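
As a rough illustration of how the view above maps onto this schema, the following sketch builds the table in an in-memory SQLite database and runs the equivalent of "rows where issue = 1373352524 sorted by updated_at descending". Only three of the rows shown above are inserted, with most columns left empty; the exact SQL that Datasette generates is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
""")

# (id, user, updated_at, issue) taken from three rows of the table above.
rows = [
    (1248366193, 35741277, "2022-09-15T16:55:08Z", 1373352524),
    (1460185069, 1197350, "2023-03-08T13:51:06Z", 1373352524),
    (1248293772, 35741277, "2022-09-19T22:19:36Z", 1373352524),
]
conn.executemany(
    "INSERT INTO [issue_comments] ([id], [user], [updated_at], [issue]) "
    "VALUES (?, ?, ?, ?)",
    rows,
)

# ISO 8601 timestamps in a uniform format sort chronologically as text.
result = conn.execute(
    "SELECT [id], [updated_at] FROM [issue_comments] "
    "WHERE [issue] = ? ORDER BY [updated_at] DESC",
    (1373352524,),
).fetchall()
ids = [row[0] for row in result]
print(ids)
```

Comment 1248293772 sorts ahead of 1248366193 even though it was created earlier, because it was edited later; this is why the row order in the table does not follow the `id` column.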
