issues
2 rows where repo = 13221727, type = "issue" and user = 35741277 sorted by updated_at descending
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1373352524 | I_kwDOAMm_X85R27JM | 7039 | Encoding error when saving netcdf | etsmith14 35741277 | open | 0 |  |  | 14 | 2022-09-14T17:35:44Z | 2023-03-08T13:51:07Z |  | NONE |  |  |  | What happened?
When I select a single point (or a regional subset) from a netCDF file and save it as a new netCDF file, the newly saved file has an encoding issue that makes the data incorrect.
How the data should look: (figure not preserved)
How the newly saved file looks when opened: (figure not preserved)
Additional context copied from the original discussion:
I am trying to save a regional subset of a netCDF file as a netCDF file. I first open data with time, latitude, and longitude dimensions and then slice it by latitude and longitude to produce a smaller subset, which I save with the to_netcdf command. But when I open the new netCDF file, the time series is definitely wrong (see figures). The figure named 'correct' shows the temperature time series plotted directly from the original dataset; the figure named 'wrong' shows it plotted from the newly saved netCDF file (hopefully both figures attached properly). This happens when I select just a single point and save the data as netCDF, and it also happens when I save as a Zarr file. However, when I load a single netCDF file with open_dataset (instead of open_mfdataset) and save it as a new netCDF file, everything is correct, so the issue seems to come from open_mfdataset. I've also noticed that not all grid points are affected; only some grid points have this issue. This doesn't happen when I convert to a series and save as CSV, only when saving as netCDF or Zarr.
Link to the original discussion: https://github.com/pydata/xarray/discussions/7025#discussion-4385791
What did you expect to happen?
The data should have looked exactly the same.
Minimal Complete Verifiable Example
```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

# I took the encoding data from the original data and applied it to a dummy dataset to reproduce the issue
original_encoding = {
    'original_shape': (744, 109, 245),
    'missing_value': -32767,
    '_FillValue': -32767,
    'scale_factor': 0.0011997040993123216,
    'add_offset': 269.40377331689564,
}

# create dummy time index (one year of hourly steps)
times = pd.date_range(start='2000-01-01', freq='1H', periods=8760)

# create dataset
ds = xr.Dataset({
    't2m': xr.DataArray(
        data=np.random.random(8760),
        # dims/coords were cut off in the original paste; a 'time' dimension over `times` is assumed
        dims=['time'],
        coords={'time': times},
    )
})

# apply original encoding
ds.t2m.encoding = original_encoding

# save dataset as netcdf
ds.to_netcdf(r"...\test_ds2.nc")

# load saved dataset
ds_test = xr.open_dataset(r'...\test_ds2.nc')

# plot the difference between the two variables
plt.plot(ds.t2m - ds_test.t2m)
```
MVCE confirmation
Relevant log output
No response
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 165 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: ('English_United States', '1252')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2022.6.0
pandas: 1.4.1
numpy: 1.21.5
scipy: 1.8.0
netCDF4: 1.6.0
pydap: None
h5netcdf: 0.13.1
h5py: 3.6.0
Nio: None
zarr: 2.8.1
cftime: 1.6.0
nc_time_axis: 1.4.1
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.10.1
iris: 3.1.0
bottleneck: 1.3.5
dask: 2021.08.1
distributed: 2021.08.1
matplotlib: 3.5.1
cartopy: 0.18.0
seaborn: 0.11.1
numbagg: None
fsspec: 2022.8.2
cupy: None
pint: 0.19.2
sparse: None
flox: None
numpy_groupies: None
setuptools: 56.0.0
pip: 21.0.1
conda: None
pytest: None
IPython: 7.22.0
sphinx: 3.5.3
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7039/reactions", "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 1, "rocket": 0, "eyes": 0 } |
| | xarray 13221727 | issue |
1371711272 | I_kwDOAMm_X85Rwqco | 7029 | Writing datarray/dataset to netcdf results in bad values | etsmith14 35741277 | closed | 0 |  |  | 1 | 2022-09-13T16:25:31Z | 2022-09-14T17:04:16Z | 2022-09-14T17:04:16Z | NONE |  |  |  | What is your issue?
I am trying to save a regional subset of a netCDF file as a netCDF file. I first open data with time, latitude, and longitude dimensions and then slice it by latitude and longitude to produce a smaller subset, which I save with the to_netcdf command. But when I open the new netCDF file, the time series is definitely wrong (see figures). The figure named 'correct' shows the temperature time series plotted directly from the original dataset; the figure named 'wrong' shows it plotted from the newly saved netCDF file (hopefully both figures attached properly). This happens when I select just a single point and save the data as netCDF, and it also happens when I save as a Zarr file. However, when I load a single netCDF file with open_dataset (instead of open_mfdataset) and save it as a new netCDF file, everything is correct, so the issue seems to come from open_mfdataset. I've also noticed that not all grid points are affected; only some grid points have this issue. This doesn't happen when I convert to a series and save as CSV, only when saving as netCDF or Zarr.
```python
import matplotlib.pyplot as plt
import xarray as xr

lats = [33.35]
lons = [-112.86]

# load data from original files and keep a regional subset
ERA5_t2m = xr.open_mfdataset(r'E:\ERA5\Temperature\T2m_*' + '.nc', parallel=True).sel(
    latitude=slice(37.25, 31), longitude=slice(-115, -109)
)

# plot original data in degrees Fahrenheit (looks good)
plt.plot(((ERA5_t2m.t2m.sel(latitude=lats[0], longitude=lons[0], method='nearest') - 273.15) * 9 / 5) + 32)

# save regional subset as new netcdf
ERA5_t2m.t2m.to_netcdf(r"E:\Arizona_test\Arizona_Temperature.nc")

# open new netcdf regional subset
ERA5_t2m_AZ = xr.open_dataset(r'E:\Arizona_test\Arizona_Temperature.nc')

# plot same point from new netcdf (looks bad)
plt.plot(((ERA5_t2m_AZ.t2m.sel(latitude=lats[0], longitude=lons[0], method='nearest') - 273.15) * 9 / 5) + 32)
```
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7029/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
| completed | xarray 13221727 | issue |
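The MVCE in issue 7039 packs `t2m` with `scale_factor` and `add_offset`. Under CF packing, a stored integer p decodes to p * scale_factor + add_offset, so a fixed integer type can only hold a limited band of physical values. The sketch below assumes the packed type is int16 (suggested by the -32767 fill value, but not stated in either issue) and uses an unchecked, wrapping narrowing cast as a stand-in for whatever the real writer does on overflow. It is one plausible mechanism consistent with the symptoms (for example, if the encoding kept after open_mfdataset came from one file while other files needed different packing parameters), not a diagnosis taken from the issue thread.

```python
import numpy as np

# Packing parameters copied from original_encoding in issue 7039's MVCE.
SCALE_FACTOR = 0.0011997040993123216
ADD_OFFSET = 269.40377331689564
FILL_VALUE = -32767  # packed value reserved for missing data

# Assumption: the packed dtype is int16 (not stated in the issue).
INT16_MAX = np.iinfo(np.int16).max  # 32767


def representable_range():
    """Physical range (K) that int16 packing can hold without hitting the fill value."""
    low = ADD_OFFSET + SCALE_FACTOR * (FILL_VALUE + 1)
    high = ADD_OFFSET + SCALE_FACTOR * INT16_MAX
    return low, high


def pack_int16(values):
    """CF-style packing: round((x - add_offset) / scale_factor), then narrow to int16.

    Casting through int64 makes the narrowing wrap modulo 2**16, a stand-in for an
    unchecked overflow; a real writer may clip, warn, or behave differently.
    """
    scaled = np.round((np.asarray(values, dtype=np.float64) - ADD_OFFSET) / SCALE_FACTOR)
    return scaled.astype(np.int64).astype(np.int16)


def unpack(packed):
    """CF-style unpacking: packed * scale_factor + add_offset."""
    return packed.astype(np.float64) * SCALE_FACTOR + ADD_OFFSET


if __name__ == "__main__":
    low, high = representable_range()
    print(f"representable range: {low:.2f} K to {high:.2f} K")
    for kelvin in (300.0, 320.0):
        restored = unpack(pack_int16([kelvin]))[0]
        print(f"{kelvin} K -> {restored:.4f} K after round trip")
```

Under these assumptions, only values outside roughly 230 to 309 K are corrupted while the rest round-trip to within half a `scale_factor`, which would fit the report that only some grid points look wrong; whether that is what happened here depends on the per-file encodings, which the issues do not list.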
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo] ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);
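If inherited packing parameters are behind the bad values in issues 7029 and 7039, one possible workaround is to clear them from the variable's `encoding` before writing, so xarray stores the in-memory floating-point values unpacked. The helper name `drop_packing_keys` and its key list are hypothetical choices for this sketch; the rows above do not record which fix, if any, resolved either issue.

```python
# Hypothetical helper: the key list is an assumption covering CF packing
# attributes that can be carried over in .encoding after opening a file.
PACKING_KEYS = frozenset({"scale_factor", "add_offset", "dtype", "_FillValue", "missing_value"})


def drop_packing_keys(encoding: dict) -> dict:
    """Return a copy of an encoding dict without CF packing parameters."""
    return {key: value for key, value in encoding.items() if key not in PACKING_KEYS}


if __name__ == "__main__":
    # Encoding values taken from issue 7039's MVCE.
    original_encoding = {
        "original_shape": (744, 109, 245),
        "missing_value": -32767,
        "_FillValue": -32767,
        "scale_factor": 0.0011997040993123216,
        "add_offset": 269.40377331689564,
    }
    print(drop_packing_keys(original_encoding))  # {'original_shape': (744, 109, 245)}
```

Applied to the code in issue 7029, this would read `ERA5_t2m.t2m.encoding = drop_packing_keys(ERA5_t2m.t2m.encoding)` just before the `to_netcdf` call. Dropping the packing keys trades file size for correctness, since the values are then stored as floating-point numbers rather than packed integers.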