issues: 1225191984

id: 1225191984
node_id: I_kwDOAMm_X85JBvIw
number: 6570
title: h5netcdf-engine now reads attributes with array length 1 as scalar
user: 16100116
state: closed
locked: 0
comments: 1
created_at: 2022-05-04T10:34:06Z
updated_at: 2023-09-19T01:02:24Z
closed_at: 2023-09-19T01:02:24Z
author_association: NONE
assignee, milestone, active_lock_reason, draft, pull_request: (empty)
body, reactions, performed_via_github_app (empty), state_reason, repo, type: given below, in that order

What is your issue?

The h5netcdf engine for reading NetCDF4 files was recently changed (https://github.com/h5netcdf/h5netcdf/pull/151) so that, when reading attributes, any 1D array/list of length 1 is turned into a scalar item. The change happened with version 0.14.0.

The issue is that the xarray documentation still describes the old h5netcdf-behaviour on https://docs.xarray.dev/en/stable/user-guide/io.html?highlight=attributes%20h5netcdf#netcdf

Could we mention this also on https://docs.xarray.dev/en/stable/generated/xarray.open_dataset.html#xarray.open_dataset under the engine argument, or just make sure it links to the above page?

I initially looked under https://docs.xarray.dev/en/stable/user-guide/io.html?highlight=string#string-encoding because my issue was with a string array/list, but it may be too much to mention there if this is a general change that affects attributes of all types.

As explained on the h5netcdf issue tracker, the reason for squeezing length-1 array attributes to scalars is compatibility with the other NetCDF4 engine, or with NetCDF in general (and opinions may vary on how good that is, compared with fully using the features available in HDF5). Interestingly, when writing, an attribute given as a Python list of length 1 does produce an array of length 1 in the HDF5/NetCDF4 file; the array dimension is dropped only when reading.

Adding the `invalid_netcdf=True` argument when loading does not change the behaviour. Maybe it could be used to generally allow 1-length attribute arrays? As it stands, I think every usage of array attributes will need a conversion such as `list_in_recent_version = attribute if isinstance(attribute, list) else [attribute]` or `always_list = list(attribute if isinstance(attribute, (list, np.ndarray)) else [attribute])` to support both old and new versions. Otherwise, iterating over a string attribute will cause surprises by iterating over its characters instead of doing a single iteration that yields the whole string (as in older versions).
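To make the kind of conversion meant here concrete, the following is a minimal sketch of a helper that normalises an attribute to a list regardless of which h5netcdf version read it. The name `attr_as_list` is hypothetical and not part of xarray or h5netcdf, and a plain Python list stands in for the length-1 ndarray that older versions return.

```python
def attr_as_list(attribute):
    """Return an attribute value as a list (hypothetical helper, not a library API)."""
    # Strings and bytes are iterable, but each must count as a single item.
    if isinstance(attribute, (str, bytes)):
        return [attribute]
    try:
        # Lists, tuples and 1D arrays are expanded item by item.
        return list(attribute)
    except TypeError:
        # Non-iterable scalars (int, float, ...) are wrapped in a list.
        return [attribute]


new_style = "abc"    # what h5netcdf >= 0.14.0 returns for a length-1 string array
old_style = ["abc"]  # stand-in for the length-1 ndarray returned by older versions

print(list(new_style))          # the pitfall: ['a', 'b', 'c']
print(attr_as_list(new_style))  # ['abc']
print(attr_as_list(old_style))  # ['abc']
```

Checking for `str` and `bytes` first is the important part: without it, a squeezed string attribute would be split into its characters.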

Minimal example

This serves to clarify what happens. The issue is not about reverting to the old behaviour (although I liked it), just about updating the xarray documentation.

```
import xarray as xr
import numpy as np

ds = xr.Dataset()
ds['stuff'] = xr.DataArray(np.random.randn(2), dims='x')
ds['stuff'].attrs['strings_0D_one'] = 'abc'
ds['stuff'].attrs['strings_1D_two'] = ['abc', 'def']
ds['stuff'].attrs['strings_1D_one'] = ['abc']

path = 'demo.nc'
ds.to_netcdf(path, engine='h5netcdf', format='netCDF4')
ds2 = xr.load_dataset(path, engine='h5netcdf')

print(type(ds2['stuff'].attrs['strings_0D_one']).__name__, repr(ds2['stuff'].attrs['strings_0D_one']))
print(type(ds2['stuff'].attrs['strings_1D_two']).__name__, repr(ds2['stuff'].attrs['strings_1D_two']))
print(type(ds2['stuff'].attrs['strings_1D_one']).__name__, repr(ds2['stuff'].attrs['strings_1D_one']))
```

With h5netcdf: 0.12.0 (python: 3.7.9, OS: Windows, OS-release: 10, libhdf5: 1.10.4, xarray: 0.20.1, pandas: 1.3.4, numpy: 1.21.5, netCDF4: None, h5netcdf: 0.12.0, h5py: 2.10.0) the printouts are:

    str 'abc'
    ndarray array(['abc', 'def'], dtype=object)
    ndarray array(['abc'], dtype=object)

With h5netcdf: 1.0.0 (python: 3.8.11, OS: Linux, OS-release: 3.10.0-1160.49.1.el7.x86_64, libhdf5: 1.10.4, xarray: 0.20.1, pandas: 1.4.2, numpy: 1.21.2, netCDF4: None, h5netcdf: 1.0.0, h5py: 2.10.0) the printouts are:

    str 'abc'
    list ['abc', 'def']
    str 'abc'

I have tested that direct reading by h5py.File gives str, ndarray, ndarray so the change is not in the writing or h5py.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6570/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
