issue_comments: 1477447875
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/pydata/xarray/issues/7652#issuecomment-1477447875 | https://api.github.com/repos/pydata/xarray/issues/7652 | 1477447875 | IC_kwDOAMm_X85YEBDD | 5821660 | 2023-03-21T08:37:48Z | 2023-03-21T08:37:48Z | MEMBER | @basnijholt The string issue is more of a netCDF/numpy issue with VLEN types. XRef: https://unidata.github.io/netcdf4-python/#dealing-with-strings
And numpy will create a VLEN string array if no dtype is given, like in your case. At least the netCDF4 and h5netcdf backends are consistent in their writing (creating similar hdf5-files) and reading back (object-dtype):

plain netCDF4

```python
import netCDF4 as nc
import numpy as np

data = np.array([["a", "b"], ["c", "d"]], dtype="<U1")
print(f"source dtype: {data.dtype.str}\n", )
auto = False
with nc.Dataset("test-plain-netcdf4.nc", mode="w") as ds:
    print("Write NC-File")
    ds.set_auto_maskandscale(auto)
    ds.set_auto_chartostring(auto)
    ds.createDimension("x", size=2)
    ds.createDimension("y", size=2)
    var = ds.createVariable("da", data.dtype.str, dimensions=("x", "y"))
    var[:] = data
    print("Variable\n")
    print(var)
    print(var.dtype)
    print("\nContents\n")
    print(var[:])
    print(var[:].dtype)

with nc.Dataset("test-plain-netcdf4.nc") as ds:
    print("\nRead NC-File")
    ds.set_auto_maskandscale(auto)
    ds.set_auto_chartostring(auto)
    da = ds["da"]
    print("Variable\n")
    print(da)
    print(da.dtype)
    da = ds["da"][:]
    print("\nContents\n")
    print(da)
    print(da.dtype)
```

```python
source dtype: <U1

Write NC-File
Variable

<class 'netCDF4._netCDF4.Variable'>
vlen da(x, y)
vlen data type: <class 'str'>
unlimited dimensions:
current shape = (2, 2)
<class 'str'>

Contents

[['a' 'b']
 ['c' 'd']]
object

Read NC-File
Variable

<class 'netCDF4._netCDF4.Variable'>
vlen da(x, y)
vlen data type: <class 'str'>
unlimited dimensions:
current shape = (2, 2)
<class 'str'>

Contents

[['a' 'b']
 ['c' 'd']]
object
```

```bash
netcdf test-plain-netcdf4 {
dimensions:
    x = 2 ;
    y = 2 ;
variables:
    string da(x, y) ;
data:

 da =
  "a", "b",
  "c", "d" ;
}
HDF5 "test-plain-netcdf4.nc" {
DATASET "da" {
   DATATYPE H5T_STRING {
      STRSIZE H5T_VARIABLE;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_UTF8;
      CTYPE H5T_C_S1;
   }
   DATASPACE SIMPLE { ( 2, 2 ) / ( 2, 2 ) }
   DATA {
   (0,0): "a", "b",
   (1,0): "c", "d"
   }
   ATTRIBUTE "DIMENSION_LIST" {
      DATATYPE H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
      DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): (), ()
      }
   }
   ATTRIBUTE "_Netcdf4Coordinates" {
      DATATYPE H5T_STD_I32LE
      DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 0, 1
      }
   }
}
}
```

plain h5netcdf

```python
import h5netcdf.legacyapi as h5nc
import h5py
import numpy as np

data = np.array([["a", "b"], ["c", "d"]], dtype="<U1")
print(f"source dtype: {data.dtype.str}\n", )
with h5nc.Dataset("test-plain-h5netcdf.nc", mode="w") as ds:
    print("Write NC-File")
    ds.createDimension("x", 2)
    ds.createDimension("y", 2)
    dtype = h5py.string_dtype()
    print("Source dtype:", dtype)
    var = ds.createVariable("da", dtype, dimensions=("x", "y"))
    var[:] = data
    print("Variable\n")
    print(var)
    print(var.dtype)
    print("\nContents\n")
    print(var[:])
    print(var[:].dtype)

with h5nc.Dataset("test-plain-h5netcdf.nc") as ds:
    print("\nRead NC-File")
    da = ds["da"]
    print("Variable\n")
    print(da)
    print(da.dtype)
    da = ds["da"][:]
    print("\nContents\n")
    print(da)
    print(da.dtype)
```

```python
source dtype: <U1

Write NC-File
Source dtype: object
Variable

<h5netcdf.legacyapi.Variable '/da': dimensions ('x', 'y'), shape (2, 2), dtype <class 'str'>>
Attributes:
<class 'str'>

Contents

[['a' 'b']
 ['c' 'd']]
object

Read NC-File
Variable

<h5netcdf.legacyapi.Variable '/da': dimensions ('x', 'y'), shape (2, 2), dtype <class 'str'>>
Attributes:
<class 'str'>

Contents

[['a' 'b']
 ['c' 'd']]
object
```

```bash
netcdf test-plain-h5netcdf {
dimensions:
    x = 2 ;
    y = 2 ;
variables:
    string da(x, y) ;
data:

 da =
  "a", "b",
  "c", "d" ;
}
HDF5 "test-plain-h5netcdf.nc" {
DATASET "da" {
   DATATYPE H5T_STRING {
      STRSIZE H5T_VARIABLE;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_UTF8;
      CTYPE H5T_C_S1;
   }
   DATASPACE SIMPLE { ( 2, 2 ) / ( 2, 2 ) }
   DATA {
   (0,0): "a", "b",
   (1,0): "c", "d"
   }
   ATTRIBUTE "DIMENSION_LIST" {
      DATATYPE H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
      DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): (), ()
      }
   }
   ATTRIBUTE "_Netcdf4Coordinates" {
      DATATYPE H5T_STD_I32LE
      DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 0, 1
      }
   }
   ATTRIBUTE "_Netcdf4Dimid" {
      DATATYPE H5T_STD_I32LE
      DATASPACE SCALAR
      DATA {
      (0): 0
      }
   }
}
}
```

Both get written out as:
If you use fixed-length strings (e.g.

```python
import numpy as np
import xarray as xr

# Make an xarray with an array of fixed-length strings
data = np.array([["a", "b"], ["c", "d"]], dtype="|S1")
da = xr.DataArray(
    data=data,
    dims=["x", "y"],
    coords={"x": [0, 1], "y": [0, 1]},
)
da.to_netcdf("test.nc", mode='w')

# Load the xarray back in
da_loaded = xr.load_dataarray("test.nc")
assert da.dtype == da_loaded.dtype, "Dtypes don't match"
```

Versions

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0 | packaged by conda-forge | (main, Jan 14 2023, 12:27:40) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.14.21-150400.24.46-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.1

xarray: 2023.2.0
pandas: 1.5.3
numpy: 1.24.2
scipy: 1.10.1
netCDF4: 1.6.3
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.6
cfgrib: None
iris: None
bottleneck: None
dask: 2023.3.1
distributed: 2023.3.1
matplotlib: 3.7.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.3.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.6.0
pip: 23.0.1
conda: None
pytest: None
mypy: None
IPython: 8.11.0
sphinx: None
``` |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1632718954 |
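
The dtype mismatch discussed in the comment can be reproduced without any file I/O. The following is only a rough sketch: it assumes the read-back array is object-dtype, as in the outputs shown in the comment, and simulates it with `astype(object)` instead of writing a netCDF file. The names `loaded` and `restored` are illustrative and do not appear in the original comment.

```python
import numpy as np

# Fixed-width unicode source array, same values as in the comment.
data = np.array([["a", "b"], ["c", "d"]], dtype="<U1")

# Assumption: stand-in for a VLEN string variable read back from disk,
# which arrives as an object-dtype array of Python str.
loaded = data.astype(object)

# The values survive, but the dtype does not.
values_match = loaded.tolist() == data.tolist()
dtypes_match = bool(loaded.dtype == data.dtype)

# Casting back to str lets numpy choose the smallest fixed width that fits.
restored = loaded.astype(str)

print(values_match, dtypes_match, restored.dtype.kind)
```

Whether casting back is appropriate depends on the data; for strings of varying length the restored width is that of the longest element, which may differ from the original dtype.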