html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/7652#issuecomment-1481120920,https://api.github.com/repos/pydata/xarray/issues/7652,1481120920,IC_kwDOAMm_X85YSByY,5821660,2023-03-23T12:36:17Z,2023-03-23T12:36:17Z,MEMBER,"> @kmuehlbauer this is amazing!
>
> It would be very valuable to add this list of limitations to the documentation: https://docs.xarray.dev/en/stable/user-guide/io.html#netcdf

I've added a bit to this over at #7654.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1480030763,https://api.github.com/repos/pydata/xarray/issues/7652,1480030763,IC_kwDOAMm_X85YN3or,6897215,2023-03-22T18:04:29Z,2023-03-22T18:04:29Z,NONE,"@kmuehlbauer, great! I can confirm that both problems are indeed fixed on my end when using `h5netcdf` and the code from your PR :tada:","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 1, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1479735582,https://api.github.com/repos/pydata/xarray/issues/7652,1479735582,IC_kwDOAMm_X85YMvke,2448579,2023-03-22T15:03:59Z,2023-03-22T15:04:07Z,MEMBER,"@kmuehlbauer this is amazing!
It would be very valuable to add this list of limitations to the documentation: https://docs.xarray.dev/en/stable/user-guide/io.html#netcdf","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1479080036,https://api.github.com/repos/pydata/xarray/issues/7652,1479080036,IC_kwDOAMm_X85YKPhk,5821660,2023-03-22T08:12:28Z,2023-03-22T08:58:47Z,MEMBER,"OK, I've finally gotten to the bottom of this, so I'm writing my findings here:

**`int64` -> `int32`**

This works with the `h5netcdf`/`netcdf4` backends in any case. The cast is a feature[^1] of the `NETCDF3` format, which is used if `scipy` is installed and `netCDF4` is not (with engine=None). That format has no notion of `int64`, so the data is silently cast to `int32` on write.

[^1]: https://docs.unidata.ucar.edu/nug/current/md_types.html

**string dtype -> `object`**

This works with the changes applied in #7654 for the `h5netcdf`/`netcdf4` backends in normal cases (writing out as VLEN string). Again, the `NETCDF3` format has no notion of strings, so all strings have to be converted to an internal NC_CHAR representation. I'm not sure it will do any good to try to make this one work. My suggestion: just use the `NETCDF4` format with one of the capable backends.","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1479002634,https://api.github.com/repos/pydata/xarray/issues/7652,1479002634,IC_kwDOAMm_X85YJ8oK,5821660,2023-03-22T06:49:42Z,2023-03-22T06:49:42Z,MEMBER,"Great, much appreciated, thanks!
Let's iterate over there then.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1478911024,https://api.github.com/repos/pydata/xarray/issues/7652,1478911024,IC_kwDOAMm_X85YJmQw,6897215,2023-03-22T04:43:46Z,2023-03-22T04:44:17Z,NONE,"Thanks a lot @kmuehlbauer! I replied in your PR :smile:

> I can confirm that this fixes
> - https://github.com/pydata/xarray/issues/7652#issuecomment-1476956975 (`bool` -> `int8`)
>
> But **not** `int64` -> `int32`, and string -> `O`
> - https://github.com/pydata/xarray/issues/7652#issuecomment-1476967312
> - https://github.com/pydata/xarray/issues/7652#issue-1632718954","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1477606313,https://api.github.com/repos/pydata/xarray/issues/7652,1477606313,IC_kwDOAMm_X85YEnup,5821660,2023-03-21T10:36:43Z,2023-03-21T13:43:04Z,MEMBER,"> A similar problem where saving, loading, saving, loading changes the dtype `bool` -> `int8`:

@basnijholt I'd appreciate it if you could test #7654 for that particular case.

Update: added another commit which handles the vlen string case.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1477467808,https://api.github.com/repos/pydata/xarray/issues/7652,1477467808,IC_kwDOAMm_X85YEF6g,5821660,2023-03-21T08:55:37Z,2023-03-21T10:09:12Z,MEMBER,"> A similar problem where saving, loading, saving, loading changes the dtype `bool` -> `int8`:

That's an issue with the netCDF file format, too: it has no bool dtype.
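
Because there is no bool type on disk, a bool variable has to be stored as a small integer type together with a marker that tells the reader to convert it back. Below is a toy model in plain Python of why such a variable can survive one save/load cycle but not two if decoding records the on-disk `int8` as the encoding dtype; `encode`, `decode` and `roundtrip_twice` are hypothetical helpers for illustration, not xarray's implementation.

```python
# Toy model (hypothetical helpers, not xarray's implementation) of storing a
# bool variable in a format without a boolean type: the writer stores int8
# plus a dtype='bool' attribute, the reader uses that attribute to restore bools.


def encode(values, dtype, encoding):
    '''Writer: return (stored_values, stored_dtype, attrs) for int/bool data.'''
    target = encoding.get('dtype', dtype)
    if target == 'bool':
        # No bool on disk: store as int8 and write a marker attribute.
        return [int(v) for v in values], 'int8', {'dtype': 'bool'}
    # An explicitly requested integer dtype is used as-is, without a marker.
    return [int(v) for v in values], target, {}


def decode(stored, stored_dtype, attrs, keep_disk_dtype):
    '''Reader: return (values, dtype, encoding).

    keep_disk_dtype=True records the on-disk int8 as the encoding dtype;
    keep_disk_dtype=False records bool instead.
    '''
    if attrs.get('dtype') == 'bool':
        encoding = {'dtype': stored_dtype if keep_disk_dtype else 'bool'}
        return [bool(v) for v in stored], 'bool', encoding
    return list(stored), stored_dtype, {'dtype': stored_dtype}


def roundtrip_twice(keep_disk_dtype):
    '''Save and load [True] twice, reusing the loaded encoding each time.'''
    values, dtype, encoding = [True], 'bool', {}
    for _ in range(2):
        stored, disk_dtype, attrs = encode(values, dtype, encoding)
        values, dtype, encoding = decode(stored, disk_dtype, attrs, keep_disk_dtype)
    return dtype
```

In this model `roundtrip_twice(True)` ends with `int8`, like the report quoted above, while `roundtrip_twice(False)` keeps `bool`.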
XRef: https://github.com/pydata/xarray/issues/1500

```python
import netCDF4 as nc
import numpy as np

data = np.array([True], dtype=bool)
with nc.Dataset(""test-bool-netcdf4.nc"", mode=""w"") as ds:
    ds.createDimension(""x"", size=1)
    var = ds.createVariable(""da"", data.dtype.str, dimensions=(""x""))
    var[:] = data
```

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[42], line 4
      2 with nc.Dataset(""test-bool-netcdf4.nc"", mode=""w"") as ds:
      3     ds.createDimension(""x"", size=1)
----> 4     var = ds.createVariable(""da"", data.dtype.str, dimensions=(""x""))
      5     var[:] = data

File src/netCDF4/_netCDF4.pyx:2945, in netCDF4._netCDF4.Dataset.createVariable()

File src/netCDF4/_netCDF4.pyx:4121, in netCDF4._netCDF4.Variable.__init__()

TypeError: illegal primitive data type, must be one of dict_keys(['S1', 'i1', 'u1', 'i2', 'u2', 'i4', 'u4', 'i8', 'u8', 'f4', 'f8']), got bool
```

Update: Xarray forwards the information to the file by adding a `dtype` attribute. It looks like this information is not correctly propagated back to `.encoding` when saving/loading/saving/loading. I'd consider that one a bug.

Reason: while decoding, the `.encoding` dtype is set to `original_dtype` (`int8` in our case), but it should either be removed or explicitly set to `bool`.

https://github.com/pydata/xarray/blob/f1ff956ff67f3c053a2514d93d35929059e17b07/xarray/conventions.py#L400-L404","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1477473412,https://api.github.com/repos/pydata/xarray/issues/7652,1477473412,IC_kwDOAMm_X85YEHSE,5821660,2023-03-21T08:58:51Z,2023-03-21T08:58:51Z,MEMBER,"> Another fun one where `int64` -> `int32`:

Can't reproduce this one with my environment. See above for details.
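
For context: when the file ends up in the `NETCDF3` format (with engine=None this happens when `scipy` is installed but `netCDF4` is not), there is no 64-bit integer type on disk, so `int64` data has to be narrowed to `int32` on write. A minimal stdlib sketch of such a narrowing with a range check follows; `NC3_INT_RANGES`, `NC3_COERCIONS` and `coerce_for_nc3` are hypothetical names for illustration, not xarray's API.

```python
# Hypothetical sketch (not xarray's code): narrowing integer data for a
# NETCDF3 (classic) file, which only has 8-, 16- and 32-bit signed integers.

# Value ranges of the NETCDF3 integer types (NC_BYTE, NC_SHORT, NC_INT).
NC3_INT_RANGES = {
    'int8': (-2**7, 2**7 - 1),
    'int16': (-2**15, 2**15 - 1),
    'int32': (-2**31, 2**31 - 1),
}

# Assumed mapping for a dtype that NETCDF3 cannot store directly.
NC3_COERCIONS = {'int64': 'int32'}


def coerce_for_nc3(values, dtype):
    '''Return (values, on_disk_dtype); raise if a value does not fit.'''
    target = NC3_COERCIONS.get(dtype, dtype)
    if target not in NC3_INT_RANGES:
        # Dtypes outside this toy table (e.g. float64) pass through unchanged.
        return list(values), target
    low, high = NC3_INT_RANGES[target]
    ints = [int(v) for v in values]
    for v in ints:
        if not low <= v <= high:
            raise ValueError(f'{v} does not fit into {target}')
    return ints, target
```

With this sketch, `coerce_for_nc3([1, 2, 3], 'int64')` returns `([1, 2, 3], 'int32')`: the values survive but the dtype changes. A value such as `2**40` raises `ValueError`; the sketch deliberately refuses rather than wrapping around.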
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1477447875,https://api.github.com/repos/pydata/xarray/issues/7652,1477447875,IC_kwDOAMm_X85YEBDD,5821660,2023-03-21T08:37:48Z,2023-03-21T08:37:48Z,MEMBER,"@basnijholt The string issue is somewhat of a netCDF/numpy issue with VLEN types.

XRef: https://unidata.github.io/netcdf4-python/#dealing-with-strings

> The most flexible way to store arrays of strings is with the [Variable-length (vlen) string data type](https://unidata.github.io/netcdf4-python/#variable-length-vlen-data-type). However, this requires the use of the NETCDF4 data model, and the vlen type does not map very well numpy arrays (you have to use numpy arrays of dtype=object, which are arrays of arbitrary python objects).

And numpy will create a VLEN string array if no dtype is given, like in your case.

At least the netCDF4 and h5netcdf backends are consistent in writing (they create similar HDF5 files) and in reading back (object dtype):
plain netCDF4

```python
import netCDF4 as nc
import numpy as np
data = np.array([[""a"", ""b""], [""c"", ""d""]], dtype="" vlen da(x, y) vlen data type: unlimited dimensions: current shape = (2, 2) Contents [['a' 'b'] ['c' 'd']] object Read NC-File Variable vlen da(x, y) vlen data type: unlimited dimensions: current shape = (2, 2) Contents [['a' 'b'] ['c' 'd']] object
```

```bash
netcdf test-plain-netcdf4 {
dimensions:
    x = 2 ;
    y = 2 ;
variables:
    string da(x, y) ;
data:

 da =
  ""a"", ""b"",
  ""c"", ""d"" ;
}
HDF5 ""test-plain-netcdf4.nc"" {
DATASET ""da"" {
   DATATYPE H5T_STRING {
      STRSIZE H5T_VARIABLE;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_UTF8;
      CTYPE H5T_C_S1;
   }
   DATASPACE SIMPLE { ( 2, 2 ) / ( 2, 2 ) }
   DATA {
   (0,0): ""a"", ""b"",
   (1,0): ""c"", ""d""
   }
   ATTRIBUTE ""DIMENSION_LIST"" {
      DATATYPE H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
      DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): (), ()
      }
   }
   ATTRIBUTE ""_Netcdf4Coordinates"" {
      DATATYPE H5T_STD_I32LE
      DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 0, 1
      }
   }
}
}
```
plain h5netcdf

```python
import h5netcdf.legacyapi as h5nc
import h5py
data = np.array([[""a"", ""b""], [""c"", ""d""]], dtype=""> Attributes: Contents [['a' 'b'] ['c' 'd']] object Read NC-File Variable > Attributes: Contents [['a' 'b'] ['c' 'd']] object
```

```bash
netcdf test-plain-h5netcdf {
dimensions:
    x = 2 ;
    y = 2 ;
variables:
    string da(x, y) ;
data:

 da =
  ""a"", ""b"",
  ""c"", ""d"" ;
}
HDF5 ""test-plain-h5netcdf.nc"" {
DATASET ""da"" {
   DATATYPE H5T_STRING {
      STRSIZE H5T_VARIABLE;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_UTF8;
      CTYPE H5T_C_S1;
   }
   DATASPACE SIMPLE { ( 2, 2 ) / ( 2, 2 ) }
   DATA {
   (0,0): ""a"", ""b"",
   (1,0): ""c"", ""d""
   }
   ATTRIBUTE ""DIMENSION_LIST"" {
      DATATYPE H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
      DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): (), ()
      }
   }
   ATTRIBUTE ""_Netcdf4Coordinates"" {
      DATATYPE H5T_STD_I32LE
      DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 0, 1
      }
   }
   ATTRIBUTE ""_Netcdf4Dimid"" {
      DATATYPE H5T_STD_I32LE
      DATASPACE SCALAR
      DATA {
      (0): 0
      }
   }
}
}
```
Both get written out as:

```
DATATYPE H5T_STRING {
   STRSIZE H5T_VARIABLE;
   STRPAD H5T_STR_NULLTERM;
   CSET H5T_CSET_UTF8;
   CTYPE H5T_C_S1;
}
```

If you use fixed-length strings (e.g. `|S1`), the dtype is preserved during the roundtrip:

```python
import numpy as np
import xarray as xr

# Make an xarray with an array of fixed-length strings
data = np.array([[""a"", ""b""], [""c"", ""d""]], dtype=""|S1"")
da = xr.DataArray(
    data=data,
    dims=[""x"", ""y""],
    coords={""x"": [0, 1], ""y"": [0, 1]},
)
da.to_netcdf(""test.nc"", mode='w')

# Load the xarray back in
da_loaded = xr.load_dataarray(""test.nc"")
assert da.dtype == da_loaded.dtype, ""Dtypes don't match""
```
Versions

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0 | packaged by conda-forge | (main, Jan 14 2023, 12:27:40) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.14.21-150400.24.46-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.1

xarray: 2023.2.0
pandas: 1.5.3
numpy: 1.24.2
scipy: 1.10.1
netCDF4: 1.6.3
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.6
cfgrib: None
iris: None
bottleneck: None
dask: 2023.3.1
distributed: 2023.3.1
matplotlib: 3.7.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.3.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.6.0
pip: 23.0.1
conda: None
pytest: None
mypy: None
IPython: 8.11.0
sphinx: None
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1476967312,https://api.github.com/repos/pydata/xarray/issues/7652,1476967312,IC_kwDOAMm_X85YCLuQ,6897215,2023-03-20T21:34:30Z,2023-03-20T21:34:30Z,NONE,"Another fun one where `int64` -> `int32`:

```python
import xarray as xr

da = xr.DataArray(data=[1], dims=[""x""], coords={""x"": [0]})
da.to_netcdf(""test.nc"", mode=""w"")
da2 = xr.load_dataarray(""test.nc"")
da.dtype, da2.dtype
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1476956975,https://api.github.com/repos/pydata/xarray/issues/7652,1476956975,IC_kwDOAMm_X85YCJMv,6897215,2023-03-20T21:25:59Z,2023-03-20T21:34:24Z,NONE,"A similar problem where saving, loading, saving, loading changes the dtype `bool` -> `int8`:

```python
import xarray as xr

da1 = xr.DataArray(data=[True], dims=[""x""], coords={""x"": [0]})
da1.to_netcdf(""test.nc"", mode=""w"")
da2 = xr.load_dataarray(""test.nc"")
da2.to_netcdf(""test.nc"", mode=""w"")
da3 = xr.load_dataarray(""test.nc"")
assert da1.dtype == da3.dtype, ""Dtypes don't match""
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1476826827,https://api.github.com/repos/pydata/xarray/issues/7652,1476826827,IC_kwDOAMm_X85YBpbL,6654709,2023-03-20T19:37:02Z,2023-03-20T19:37:40Z,NONE,"It seems that the ""string"" information is stored in the `encoding` of the loaded dataset.
It is set in this block: https://github.com/pydata/xarray/blob/f1ff956ff67f3c053a2514d93d35929059e17b07/xarray/backends/h5netcdf_.py#L208-L210, but this encoding is not ""applied"" to the dtype of the dataset's variable.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954