issues

2 rows where user = 971382 sorted by updated_at descending

type 2

  • issue 1
  • pull 1

state 2

  • closed 1
  • open 1

repo 1

  • xarray 2
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
343659822 MDU6SXNzdWUzNDM2NTk4MjI= 2304 float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray DevDaoud 971382 closed 0     32 2018-07-23T14:35:12Z 2024-03-15T16:31:06Z 2024-03-15T16:31:06Z NONE      

Code Sample :

Consider a netCDF file with the following variable:

    short agc_40hz(time, meas_ind) ;
        agc_40hz:_FillValue = 32767s ;
        agc_40hz:units = "dB" ;
        agc_40hz:scale_factor = 0.01 ;

Code:

```python
from netCDF4 import Dataset
import xarray as xr

d = Dataset("test.nc")
a = d.variables['agc_40hz'][:].flatten()[69]
## 21.940000000000001 'numpy.float64'
x = xr.open_dataset("test.nc")
b = x['agc_40hz'].values.flatten()[69]
## 21.939998626708984 'numpy.float32'
abs(a - b)
# 0.000001373291017
```
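The discrepancy in the sample above can probably be reproduced without a netCDF file. The following is a minimal sketch using NumPy only; it assumes the raw stored value at that index is 2194 (that is, 21.94 divided by the scale_factor of 0.01), which the sample does not state explicitly.

```python
import numpy as np

# Assumed raw stored value: 2194 is the int16 that scales to 21.94
# with scale_factor = 0.01 (an assumption, not read from test.nc).
raw = np.array([2194], dtype=np.int16)

# Scale in single precision, as a float32 promotion would.
decoded32 = raw.astype(np.float32) * np.float32(0.01)

# Scale in double precision, matching the float64 result from netCDF4.
decoded64 = raw.astype(np.float64) * np.float64(0.01)

print(decoded32.dtype, repr(float(decoded32[0])))
print(decoded64.dtype, repr(float(decoded64[0])))
print(abs(float(decoded64[0]) - float(decoded32[0])))
```

The difference printed on the last line should be on the order of one float32 rounding step near 21.94, which is the same magnitude as the error reported above.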

Problem description :

xarray behaves differently from netCDF4's Dataset.

When reading the dataset with xarray, we found that the decoded type was numpy.float32 instead of numpy.float64. This netCDF variable has an int16 dtype; when it is read with the netCDF4 library directly, it is automatically converted to numpy.float64. In our case, we lose precision when using xarray. We found two solutions for this:

First solution :

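This solution disables xarray's own mask-and-scale decoding and stops xarray from turning off netCDF4's automatic mask-and-scale (set_auto_maskandscale), so netCDF4 decodes the values itself.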

    d = Dataset("test.nc")
    a = d.variables['agc_40hz'][:].flatten()[69]
    ## 21.940000000000001 'numpy.float64'
    x = xr.open_dataset("test.nc", mask_and_scale=False, decode_times=False)
    b = x['agc_40hz'].values.flatten()[69]
    ## 21.940000000000001 'numpy.float64'
    abs(a - b)
    # 0.000000000000000

Modification in xarray/backends/netCDF4_.py line 241

```python
def _disable_auto_decode_variable(var):
    """Disable automatic decoding on a netCDF4.Variable.

    We handle these types of decoding ourselves.
    """
    pass
    # var.set_auto_maskandscale(False)
    # # only added in netCDF4-python v1.2.8
    # with suppress(AttributeError):
    #     var.set_auto_chartostring(False)
```

Second solution :

This solution promotes to numpy.float64 regardless of the integer type provided.

    d = Dataset("test.nc")
    a = d.variables['agc_40hz'][:].flatten()[69]
    ## 21.940000000000001 'numpy.float64'
    x = xr.open_dataset("test.nc")
    b = x['agc_40hz'].values.flatten()[69]
    ## 21.940000000000001 'numpy.float64'
    abs(a - b)
    # 0.000000000000000

Modification in xarray/core/dtypes.py line 85

```python
import numpy as np


def maybe_promote(dtype):
    """Simpler equivalent of pandas.core.common._maybe_promote

    Parameters
    ----------
    dtype : np.dtype

    Returns
    -------
    dtype : Promoted dtype that can hold missing values.
    fill_value : Valid missing value for the promoted dtype.
    """
    # N.B. these casting rules should match pandas
    if np.issubdtype(dtype, np.floating):
        fill_value = np.nan
    elif np.issubdtype(dtype, np.integer):
        #########################
        # OLD CODE BEGIN
        #########################
        # if dtype.itemsize <= 2:
        #     dtype = np.float32
        # else:
        #     dtype = np.float64
        #########################
        # OLD CODE END
        #########################

        #########################
        # NEW CODE BEGIN
        #########################
        dtype = np.float64  # whether it's int16 or int32 we use float64
        #########################
        # NEW CODE END
        #########################
        fill_value = np.nan
    elif np.issubdtype(dtype, np.complexfloating):
        fill_value = np.nan + np.nan * 1j
    elif np.issubdtype(dtype, np.datetime64):
        fill_value = np.datetime64('NaT')
    elif np.issubdtype(dtype, np.timedelta64):
        fill_value = np.timedelta64('NaT')
    else:
        dtype = object
        fill_value = np.nan
    return np.dtype(dtype), fill_value
```

Solution 2 would work well for us, but at this point we do not know whether this modification would introduce side effects. Is there another way to avoid this problem?
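One workaround that would not require patching xarray might be to read the raw stored values (for example with mask_and_scale=False on an unmodified xarray) and apply the mask and scale manually in float64. The following is only a sketch under that assumption: `decode_as_float64` is a hypothetical helper, not part of xarray or netCDF4, and the raw array here is made up for illustration.

```python
import numpy as np


def decode_as_float64(raw, scale_factor=1.0, add_offset=0.0, fill_value=None):
    """Hypothetical helper: mask and scale raw integers in double precision."""
    raw = np.asarray(raw)
    data = raw.astype(np.float64)
    if fill_value is not None:
        # Replace the fill value with NaN before scaling.
        data[raw == fill_value] = np.nan
    return data * scale_factor + add_offset


# Made-up raw values using the attributes of agc_40hz shown above.
raw = np.array([2194, 32767], dtype=np.int16)
decoded = decode_as_float64(raw, scale_factor=0.01, fill_value=32767)
print(decoded.dtype, decoded)
```

This keeps the decoding explicit at the cost of bypassing xarray's automatic handling of the variable's attributes.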

Expected Output

In our case, we expect the variable to be decoded as numpy.float64, as netCDF4 does.

Output of xr.show_versions()

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.5.2.final.0
    python-bits: 64
    OS: Linux
    OS-release: 4.15.0-23-generic
    machine: x86_64
    processor: x86_64
    byteorder: little
    LC_ALL: None
    LANG: en_US.UTF-8
    LOCALE: en_US.UTF-8
    xarray: 0.10.8
    pandas: 0.23.3
    numpy: 1.15.0rc2
    scipy: 1.1.0
    netCDF4: 1.4.0
    h5netcdf: None
    h5py: None
    Nio: None
    zarr: None
    bottleneck: None
    cyordereddict: None
    dask: 0.18.1
    distributed: 1.22.0
    matplotlib: 2.2.2
    cartopy: None
    seaborn: None
    setuptools: 40.0.0
    pip: 10.0.1
    conda: None
    pytest: 3.6.3
    IPython: 6.4.0
    sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2304/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
407746874 MDExOlB1bGxSZXF1ZXN0MjUxMTUwNTk2 2751 Ability to force float64 instead of float32 issue #2304 DevDaoud 971382 open 0     16 2019-02-07T15:10:58Z 2022-06-09T14:50:17Z   FIRST_TIMER   0 pydata/xarray/pulls/2751

Adds the ability to promote to float64 instead of float32 when dealing with integer variables that have a scale_factor. A new parameter, force_promote_float64 (False by default), is added to open_dataset and open_dataarray; it enables this behaviour when True.

  • [x] Closes #2304
  • [x] Tests added
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
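The dtype choice that such a flag controls could look roughly like the sketch below. This is not the code from the pull request: `choose_float_dtype` is a hypothetical helper that only mirrors the itemsize rule quoted in issue #2304, with the flag name taken from the description above.

```python
import numpy as np


def choose_float_dtype(dtype, force_promote_float64=False):
    """Hypothetical helper: pick the float dtype for a scaled integer variable."""
    dtype = np.dtype(dtype)
    if not np.issubdtype(dtype, np.integer):
        raise TypeError("expected an integer dtype, got %s" % dtype)
    # Rule from the issue: integers of at most 2 bytes go to float32
    # unless the caller forces float64.
    if force_promote_float64 or dtype.itemsize > 2:
        return np.dtype(np.float64)
    return np.dtype(np.float32)


print(choose_float_dtype(np.int16))
print(choose_float_dtype(np.int16, force_promote_float64=True))
print(choose_float_dtype(np.int32))
```

Keeping the default at False would preserve the existing float32 behaviour for users who rely on the smaller memory footprint.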
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2751/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · Queries took 44.147ms · About: xarray-datasette