home / github / issues

Menu
  • GraphQL API
  • Search all tables

issues: 819379617

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
819379617 MDU6SXNzdWU4MTkzNzk2MTc= 4981 Allows numpy.bool_ attrs when writing with h5netcdf engine 31376402 closed 0     5 2021-03-01T22:53:04Z 2021-03-12T14:19:14Z 2021-03-12T14:19:14Z CONTRIBUTOR      

What happened:

Round trip dataset using h5netcdf + invalid_netcdf=True fails for bool attribute due to lack of support in xarray for numpy.bool_ (note that is not the same as numpy.bool)

What you expected to happen:

Attributes of numpy.bool_ type should be supported because the h5netcdf has no issue with it.

Minimal Complete Verifiable Example:

```python import h5netcdf import numpy as np import xarray as xr

dset = xr.Dataset( coords={"x0": np.array([1, 2, 3])}, data_vars={"y0": ("x0", np.array([5, 6, 7]))} ) dset.attrs["my_bool_attr"] = True print(type(dset.attrs["my_bool_attr"]))

print("\n", dset) ```

``` <class 'bool'>

<xarray.Dataset> Dimensions: (x0: 3) Coordinates: * x0 (x0) int64 1 2 3 Data variables: y0 (x0) int64 5 6 7 Attributes: my_bool_attr: True ```

```python file_name = "my_dset.nc" dset.to_netcdf(file_name, engine="h5netcdf", invalid_netcdf=True)

loaded_dset = xr.load_dataset(file_name, engine="h5netcdf") print(type(loaded_dset.attrs["my_bool_attr"]))

print("\n", loaded_dset)

This should be working

file_name2 = "file_numpy_bool.nc" loaded_dset.to_netcdf(file_name2, engine="h5netcdf", invalid_netcdf=True) loaded_dset2 = xr.load_dataset(file_name2, engine="h5netcdf")

print("\n", loaded_dset) ```

Raises:

``` <class 'numpy.bool_'>

<xarray.Dataset> Dimensions: (x0: 3) Coordinates: * x0 (x0) int64 1 2 3 Data variables: y0 (x0) int64 5 6 7 Attributes: my_bool_attr: True


TypeError Traceback (most recent call last) <ipython-input-6-f74d2909a4a1> in <module> 6 # This should be working 7 file_name2 = "file_numpy_bool.nc" ----> 8 loaded_dset.to_netcdf(file_name2, engine="h5netcdf", invalid_netcdf=True) 9 loaded_dset2 = xr.load_dataset(file_name2, engine="h5netcdf") 10

/usr/local/anaconda3/envs/dclab/lib/python3.7/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 1697 unlimited_dims=unlimited_dims, 1698 compute=compute, -> 1699 invalid_netcdf=invalid_netcdf, 1700 ) 1701

/usr/local/anaconda3/envs/dclab/lib/python3.7/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf) 1057 # validate Dataset keys, DataArray names, and attr keys/values 1058 _validate_dataset_names(dataset) -> 1059 _validate_attrs(dataset) 1060 1061 try:

/usr/local/anaconda3/envs/dclab/lib/python3.7/site-packages/xarray/backends/api.py in _validate_attrs(dataset) 229 # Check attrs on the dataset itself 230 for k, v in dataset.attrs.items(): --> 231 check_attr(k, v) 232 233 # Check attrs on each variable within the dataset

/usr/local/anaconda3/envs/dclab/lib/python3.7/site-packages/xarray/backends/api.py in check_attr(name, value) 221 if not isinstance(value, (str, Number, np.ndarray, np.number, list, tuple)): 222 raise TypeError( --> 223 f"Invalid value for attr {name!r}: {value!r} must be a number, " 224 "a string, an ndarray or a list/tuple of " 225 "numbers/strings for serialization to netCDF "

TypeError: Invalid value for attr 'my_bool_attr': True must be a number, a string, an ndarray or a list/tuple of numbers/strings for serialization to netCDF files ```

Anything else we need to know?:

Xarray is awesome!! Thank you, guys! ❤

We are adopting it fully for a quantum computing experimental framework (Quantify). This issue is very critical for us at the moment 😭

For reference, the following works in h5netcdf:

```python with h5netcdf.File('mydata.nc', 'w', invalid_netcdf=True) as f: f.dimensions = {'x': 5} v = f.create_variable('hello', ('x',), float) v.attrs['foo'] = True print(v.attrs) print(type(v.attrs["foo"]))

with h5netcdf.File('mydata.nc', 'r') as file: v = file['hello'] print(v.attrs) print(type(v.attrs["foo"]))

with h5netcdf.File('mydata.nc', 'w', invalid_netcdf=True) as f: f.dimensions = {'x': 5} v = f.create_variable('hello', ('x',), float) v.attrs['foo'] = np.bool_(True) print(v.attrs) print(type(v.attrs["foo"])) ```

<class 'h5netcdf.attrs.Attributes'> foo: True <class 'numpy.bool_'> <class 'h5netcdf.attrs.Attributes'> foo: True <class 'numpy.bool_'> <class 'h5netcdf.attrs.Attributes'> foo: True <class 'numpy.bool_'>

How to fix:

In /xarray/backends/api.py:221:

python if not isinstance(value, (str, Number, np.ndarray, np.number, list, tuple)): raise TypeError( f"Invalid value for attr {name!r}: {value!r} must be a number, " "a string, an ndarray or a list/tuple of " "numbers/strings for serialization to netCDF " "files" ) add np.bool_:

python if not isinstance(value, (str, Number, np.ndarray, np.number, np.bool_, list, tuple)): raise TypeError( f"Invalid value for attr {name!r}: {value!r} must be a number, " "a string, an ndarray or a list/tuple of " "numbers/strings for serialization to netCDF " "files" )

I did a quick test (xarray-0.17.1.dev3+g48378c4b) and it seems to work, but the tests (without any changes to the code) fail on my local clone of the repo so it would be more difficult to go through a PR for this.

Considering that it is a single line of code, could this be deployed relatively easily into the master branch by some maintainer? That would be highly appreciated! 🥺

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:05:27) [Clang 9.0.1 ] python-bits: 64 OS: Darwin OS-release: 18.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: None xarray: 0.17.0 pandas: 1.0.1 numpy: 1.19.3 scipy: 1.5.4 netCDF4: None pydap: None h5netcdf: 0.10.0 h5py: 2.10.0 Nio: None zarr: 2.6.1 cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.1.3 cartopy: None seaborn: 0.11.0 numbagg: None pint: None setuptools: 45.2.0.post20200209 pip: 20.0.2 conda: None pytest: 5.3.5 IPython: 7.12.0 sphinx: 3.2.1
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4981/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed 13221727 issue

Links from other tables

  • 1 row from issues_id in issues_labels
  • 5 rows from issue in issue_comments
Powered by Datasette · Queries took 0.792ms · About: xarray-datasette