
2 rows where repo = 13221727, type = "issue" and user = 35741277 sorted by updated_at descending

id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at	closed_at	author_association	active_lock_reason	draft	pull_request	body	reactions	performed_via_github_app	state_reason	repo	type
1373352524	I_kwDOAMm_X85R27JM	7039	Encoding error when saving netcdf	etsmith14 35741277	open	0			14	2022-09-14T17:35:44Z	2023-03-08T13:51:07Z		NONE				What happened?
When I select a single point (or a regional subset) from a netcdf file and save it as a new netcdf file, the newly saved file has an encoding issue that causes the data to be incorrect.
How the data should look: (figure not included)
How the newly saved file looks when opened: (figure not included)
Additional context copied from the original discussion: I am trying to save a regional subset of a netcdf file as a netcdf file. I first open some data with dimensions of time, latitude, and longitude, and then slice that data by latitude and longitude to produce a smaller subset. I save that smaller subset with the to_netcdf command. But when I open the new netcdf, the timeseries is definitely wrong (see figures). The figure named 'correct' is what the temperature timeseries looks like when plotted directly from the original dataset. The figure named 'wrong' is what the temperature timeseries looks like when plotted from the newly saved netcdf (hopefully both figures attached properly). This happens when I select just a single point and save the data as a netcdf, and it also happens when I save as a zarr file. However, when I load a single netcdf with open_dataset (instead of open_mfdataset) and save it as a new netcdf, everything is correct, so the issue seems to be coming from open_mfdataset. I've also noticed that not all grid points are incorrect; only some grid points have this issue. This doesn't happen when I convert to a series and then save as a CSV; it only happens when saving as a netcdf or zarr.
Link to the original discussion: https://github.com/pydata/xarray/discussions/7025#discussion-4385791
What did you expect to happen? The data should have looked exactly the same.
Minimal Complete Verifiable Example
```python
import numpy as np
import xarray as xr
import pandas as pd
import matplotlib.pyplot as plt

# I took the encoding data from the original data and applied it to a dummy dataset to reproduce the issue
original_encoding = {
    'original_shape': (744, 109, 245),
    'missing_value': -32767,
    '_FillValue': -32767,
    'scale_factor': 0.0011997040993123216,
    'add_offset': 269.40377331689564,
}

# create a dummy hourly time index (one year)
times = pd.date_range(start='2000-01-01', freq='1H', periods=8760)

# create dataset
ds = xr.Dataset({
    't2m': xr.DataArray(
        data=np.random.random(8760),
        dims=['time'],
        coords={'time': times},
    ),
    'tmax': xr.DataArray(
        data=np.random.random(8760),
        dims=['time'],
        coords={'time': times},
    ),
})

# apply original encoding
ds.t2m.encoding = original_encoding

# save dataset as netcdf
ds.to_netcdf(r"...\test_ds2.nc")

# load saved dataset
ds_test = xr.open_dataset(r'...\test_ds2.nc')

# plot the difference between the two variables
plt.plot(ds.t2m - ds_test.t2m)
```
MVCE confirmation
[x] Minimal example: the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
[x] Complete example: the example is self-contained, including all data and the text of any traceback.
[ ] Verifiable example: the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
[ ] New issue: a search of GitHub Issues suggests this is not a duplicate.
Relevant log output: No response
Anything else we need to know? No response
Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 165 Stepping 5, GenuineIntel byteorder: little LC_ALL: None LANG: en LOCALE: ('English_United States', '1252') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 2022.6.0 pandas: 1.4.1 numpy: 1.21.5 scipy: 1.8.0 netCDF4: 1.6.0 pydap: None h5netcdf: 0.13.1 h5py: 3.6.0 Nio: None zarr: 2.8.1 cftime: 1.6.0 nc_time_axis: 1.4.1 PseudoNetCDF: None rasterio: None cfgrib: 0.9.10.1 iris: 3.1.0 bottleneck: 1.3.5 dask: 2021.08.1 distributed: 2021.08.1 matplotlib: 3.5.1 cartopy: 0.18.0 seaborn: 0.11.1 numbagg: None fsspec: 2022.8.2 cupy: None pint: 0.19.2 sparse: None flox: None numpy_groupies: None setuptools: 56.0.0 pip: 21.0.1 conda: None pytest: None IPython: 7.22.0 sphinx: 3.5.3	{ "url": "https://api.github.com/repos/pydata/xarray/issues/7039/reactions", "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 1, "rocket": 0, "eyes": 0 }			xarray 13221727	issue
1371711272	I_kwDOAMm_X85Rwqco	7029	Writing datarray/dataset to netcdf results in bad values	etsmith14 35741277	closed	0			1	2022-09-13T16:25:31Z	2022-09-14T17:04:16Z	2022-09-14T17:04:16Z	NONE				What is your issue?
I am trying to save a regional subset of a netcdf file as a netcdf file. I first open some data with dimensions of time, latitude, and longitude, and then slice that data by latitude and longitude to produce a smaller subset. I save that smaller subset with the to_netcdf command. But when I open the new netcdf, the timeseries is definitely wrong (see figures). The figure named 'correct' is what the temperature timeseries looks like when plotted directly from the original dataset. The figure named 'wrong' is what the temperature timeseries looks like when plotted from the newly saved netcdf (hopefully both figures attached properly). This happens when I select just a single point and save the data as a netcdf, and it also happens when I save as a zarr file. However, when I load a single netcdf with open_dataset (instead of open_mfdataset) and save it as a new netcdf, everything is correct, so the issue seems to be coming from open_mfdataset. I've also noticed that not all grid points are incorrect; only some grid points have this issue. This doesn't happen when I convert to a series and then save as a CSV; it only happens when saving as a netcdf or zarr.
```python
import xarray as xr
import matplotlib.pyplot as plt

lats = [33.35]
lons = [-112.86]

# load data from original files
ERA5_t2m = xr.open_mfdataset(
    r'E:\ERA5\Temperature\T2m_' + '.nc', parallel=True
).sel(latitude=slice(37.25, 31), longitude=slice(-115, -109))

# plot original data, converted from kelvin to Fahrenheit (looks good)
plt.plot((((ERA5_t2m.t2m.sel(latitude=lats[0], longitude=lons[0], method='nearest') - 273.15) * 9/5) + 32))

# save regional subset as new netcdf
ERA5_t2m.t2m.to_netcdf(r"E:\Arizona_test\Arizona_Temperature.nc")

# open new netcdf regional subset
ERA5_t2m_AZ = xr.open_dataset(r'E:\Arizona_test\Arizona_Temperature.nc')

# plot same point from new netcdf (looks bad)
plt.plot((((ERA5_t2m_AZ.t2m.sel(latitude=lats[0], longitude=lons[0], method='nearest') - 273.15) * 9/5) + 32))
```	{ "url": "https://api.github.com/repos/pydata/xarray/issues/7029/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed	xarray 13221727	issue
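The encoding quoted in issue 7039 packs `t2m` with a fixed `scale_factor` and `add_offset`. If, as assumed here and not confirmed in either thread, the variable is written back as 16-bit signed integers with -32767 reserved for `_FillValue`, only a limited band of temperatures can be stored, and values outside that band cannot be packed correctly. That would be consistent with the report that only some grid points go wrong. The pure-Python sketch below just does the CF packing arithmetic; `unpacked_range`, `pack` and `fits_int16` are hypothetical helpers, not xarray API, and the sample temperatures are illustrative values, not data from the issue.

```python
# Assumptions (not confirmed in the issue): t2m is stored as int16, -32767 is
# reserved for _FillValue, so valid packed values span -32766..32767.
INT16_MAX = 32767
FILL_VALUE = -32767
PACKED_MIN = FILL_VALUE + 1
PACKED_MAX = INT16_MAX


def unpacked_range(scale_factor, add_offset):
    """Physical values reachable via x = packed * scale_factor + add_offset."""
    return (add_offset + PACKED_MIN * scale_factor,
            add_offset + PACKED_MAX * scale_factor)


def pack(x, scale_factor, add_offset):
    """CF-style packing to an integer; no clipping, so it may not fit int16."""
    return round((x - add_offset) / scale_factor)


def fits_int16(x, scale_factor, add_offset):
    """True if x packs into the valid (non-fill) int16 range."""
    return PACKED_MIN <= pack(x, scale_factor, add_offset) <= PACKED_MAX


# scale_factor and add_offset copied from `original_encoding` in issue 7039.
SCALE = 0.0011997040993123216
OFFSET = 269.40377331689564

lo, hi = unpacked_range(SCALE, OFFSET)
print(f"representable range: {lo:.2f} K to {hi:.2f} K")

# Hypothetical temperatures in kelvin, chosen to fall inside and outside the range.
for t in (250.0, 300.0, 225.0, 315.0):
    print(t, pack(t, SCALE, OFFSET), fits_int16(t, SCALE, OFFSET))
```

If this is the mechanism, it would also fit the observation that the problem appears with `open_mfdataset` but not with `open_dataset`, assuming the combined variable keeps the packing parameters of one input file while its values come from all of them. A workaround worth trying (a suggestion, not a fix confirmed in the issue) is to drop `scale_factor`, `add_offset` and any integer `dtype` from each variable's `.encoding` before calling `to_netcdf`, so the data are written unpacked.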

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
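The filter this page displays ("repo = 13221727, type = "issue" and user = 35741277 sorted by updated_at descending") can be expressed as a plain SQL query against the schema above. A minimal sketch using Python's built-in `sqlite3` module, assuming a simplified copy of the `[issues]` table that keeps only the columns the query touches and holds the two rows shown here; the SQL that Datasette itself generates may differ, and `build_db` and `filtered_issues` are hypothetical helper names.

```python
import sqlite3

# Simplified copy of the [issues] table: only the columns this query needs.
# Assumption: foreign keys and the remaining columns are irrelevant here.
SCHEMA = """
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER,
   [state] TEXT,
   [updated_at] TEXT,
   [repo] INTEGER,
   [type] TEXT
);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);
"""

# The two rows listed on this page, reduced to the columns above.
ROWS = [
    (1373352524, 7039, "Encoding error when saving netcdf",
     35741277, "open", "2023-03-08T13:51:07Z", 13221727, "issue"),
    (1371711272, 7029, "Writing datarray/dataset to netcdf results in bad values",
     35741277, "closed", "2022-09-14T17:04:16Z", 13221727, "issue"),
]

# updated_at holds ISO-8601 strings in one fixed format, so text ordering
# matches chronological ordering.
QUERY = """
SELECT [number], [title], [state]
FROM [issues]
WHERE [repo] = ? AND [type] = ? AND [user] = ?
ORDER BY [updated_at] DESC
"""


def build_db():
    """Create an in-memory database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def filtered_issues(conn, repo, issue_type, user):
    """Return (number, title, state) tuples, most recently updated first."""
    return conn.execute(QUERY, (repo, issue_type, user)).fetchall()


for number, title, state in filtered_issues(build_db(), 13221727, "issue", 35741277):
    print(number, state, title)
```

The `[user]` index in the schema is the one a lookup like this would most likely use, since `user` is the most selective of the three filter columns for this page.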
