html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/7652#issuecomment-1481120920,https://api.github.com/repos/pydata/xarray/issues/7652,1481120920,IC_kwDOAMm_X85YSByY,5821660,2023-03-23T12:36:17Z,2023-03-23T12:36:17Z,MEMBER,"> @kmuehlbauer this is amazing!
>
> It would be very valuable to add this list of limitations to the documentation: https://docs.xarray.dev/en/stable/user-guide/io.html#netcdf
I've added a bit to this over at #7654.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1480030763,https://api.github.com/repos/pydata/xarray/issues/7652,1480030763,IC_kwDOAMm_X85YN3or,6897215,2023-03-22T18:04:29Z,2023-03-22T18:04:29Z,NONE,"@kmuehlbauer, great!
I can confirm that both problems are indeed fixed on my end when using `h5netcdf` and the code from your PR :tada:","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 1, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1479735582,https://api.github.com/repos/pydata/xarray/issues/7652,1479735582,IC_kwDOAMm_X85YMvke,2448579,2023-03-22T15:03:59Z,2023-03-22T15:04:07Z,MEMBER,"@kmuehlbauer this is amazing!
It would be very valuable to add this list of limitations to the documentation: https://docs.xarray.dev/en/stable/user-guide/io.html#netcdf","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1479080036,https://api.github.com/repos/pydata/xarray/issues/7652,1479080036,IC_kwDOAMm_X85YKPhk,5821660,2023-03-22T08:12:28Z,2023-03-22T08:58:47Z,MEMBER,"OK, I've finally gotten to the bottom of this, so I'm writing my findings here:
**`int64` -> `int32`**
This works with `h5netcdf`/`netcdf4`-backends in any case. The cast is a feature[^1] of the `NETCDF3` format, which will be used if `scipy` is installed and `netCDF4` is not installed (in the case of `engine=None`). It has no notion of `int64`, so this is silently cast to `int32` on write.
[^1]: https://docs.unidata.ucar.edu/nug/current/md_types.html
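
A rough way to check beforehand whether an `int64` array would survive that cast unchanged is sketched below. It uses only numpy; `survives_int32_cast` is a hypothetical helper name, not part of xarray or any backend, and it says nothing about how a given backend reacts to values that do not fit.

```python
import numpy as np

def survives_int32_cast(values):
    # True if every int64 value is unchanged after casting to int32 and back
    arr = np.asarray(values, dtype='int64')
    return bool(np.array_equal(arr.astype('int32').astype('int64'), arr))

small = survives_int32_cast([1, 2, 3])   # values fit into 32 bits
large = survives_int32_cast([2**40])     # value needs more than 32 bits
print(small, large)
```

If the check fails, writing in `NETCDF4` format avoids the narrowing altogether.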
**`<U1` -> object**
This works with the changes applied in #7654 for `h5netcdf`/`netcdf4`-backends in normal cases (writing out as VLEN string). Again, `NETCDF3` format does not have a notion of string so all strings have to be converted to an internal NC_CHAR representation. I'm not sure it will do any good to try to make this one work.
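
For illustration, a fixed-width byte string array can be viewed as an array of single characters with plain numpy, which is roughly the shape a character-array representation takes (one extra trailing dimension for the string width). This is a sketch under that assumption, not the code xarray or the backends actually run; `to_char_array` is a hypothetical name.

```python
import numpy as np

def to_char_array(strings):
    # View each fixed-width byte string as a row of single characters,
    # adding a trailing axis whose length is the string width.
    arr = np.ascontiguousarray(strings)
    if arr.dtype.kind != 'S':
        raise TypeError('expected a fixed-width bytes array')
    width = arr.dtype.itemsize
    return arr.view('S1').reshape(arr.shape + (width,))

words = np.array([b'ab', b'cd'], dtype='S2')
chars = to_char_array(words)
print(chars.shape)
```

The string dtype itself is gone after such a conversion, which is why a reader has to reassemble the strings rather than getting them back for free.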
My suggestion would be, just use `NETCDF4`-format and one of the capable backends. ","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1479002634,https://api.github.com/repos/pydata/xarray/issues/7652,1479002634,IC_kwDOAMm_X85YJ8oK,5821660,2023-03-22T06:49:42Z,2023-03-22T06:49:42Z,MEMBER,"Great, much appreciated, thanks! Let's iterate over there then.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1478911024,https://api.github.com/repos/pydata/xarray/issues/7652,1478911024,IC_kwDOAMm_X85YJmQw,6897215,2023-03-22T04:43:46Z,2023-03-22T04:44:17Z,NONE,"Thanks a lot @kmuehlbauer!
I replied in your PR :smile:
> I can confirm that this fixes
> - https://github.com/pydata/xarray/issues/7652#issuecomment-1476956975 (`bool` -> `int8`)
>
> But **not** `int64` -> `int32`, and `<U1` -> `O`
> - https://github.com/pydata/xarray/issues/7652#issuecomment-1476967312
> - https://github.com/pydata/xarray/issues/7652#issue-1632718954","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1477606313,https://api.github.com/repos/pydata/xarray/issues/7652,1477606313,IC_kwDOAMm_X85YEnup,5821660,2023-03-21T10:36:43Z,2023-03-21T13:43:04Z,MEMBER,"> A similar problem where saving, loading, saving, loading changes the dtype `bool` -> `int8`:
@basnijholt I'd appreciate it if you could test #7654 for that particular case.
Update: added another commit which handles the vlen string case.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1477467808,https://api.github.com/repos/pydata/xarray/issues/7652,1477467808,IC_kwDOAMm_X85YEF6g,5821660,2023-03-21T08:55:37Z,2023-03-21T10:09:12Z,MEMBER,"> A similar problem where saving, loading, saving, loading changes the dtype `bool` -> `int8`:
That's an issue with the netCDF file format, too: it has no bool dtype.
XRef: https://github.com/pydata/xarray/issues/1500
```python
import netCDF4 as nc
import numpy as np
# netCDF has no boolean type, so creating a bool variable fails
data = np.array([True], dtype=bool)
with nc.Dataset(""test-bool-netcdf4.nc"", mode=""w"") as ds:
    ds.createDimension(""x"", size=1)
    var = ds.createVariable(""da"", data.dtype.str, dimensions=(""x""))
    var[:] = data
```
```
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[42], line 4
2 with nc.Dataset(""test-bool-netcdf4.nc"", mode=""w"") as ds:
3 ds.createDimension(""x"", size=1)
----> 4 var = ds.createVariable(""da"", data.dtype.str, dimensions=(""x""))
5 var[:] = data
File src/netCDF4/_netCDF4.pyx:2945, in netCDF4._netCDF4.Dataset.createVariable()
File src/netCDF4/_netCDF4.pyx:4121, in netCDF4._netCDF4.Variable.__init__()
TypeError: illegal primitive data type, must be one of dict_keys(['S1', 'i1', 'u1', 'i2', 'u2', 'i4', 'u4', 'i8', 'u8', 'f4', 'f8']), got bool
```
Update:
Xarray forwards the information to the file by adding a `dtype` attribute. It looks like this information is not correctly propagated back to `.encoding` in the saving/loading/saving/loading case. I'd consider that a bug.
Reason:
While decoding, the `.encoding` dtype is set to `original_dtype` (`int8` in our case), but it should either be removed or explicitly set to `bool`.
https://github.com/pydata/xarray/blob/f1ff956ff67f3c053a2514d93d35929059e17b07/xarray/conventions.py#L400-L404
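
The intended round trip can be sketched with plain numpy and dicts. This is only an illustration of the idea, assuming a `dtype` attribute marks a boolean variable; the helper names are hypothetical and this is not the code in xarray's `conventions.py`.

```python
import numpy as np

def encode_bool(data):
    # Store booleans as int8 and remember the original dtype in an attribute
    data = np.asarray(data)
    if data.dtype == bool:
        return data.astype('int8'), {'dtype': 'bool'}
    return data, {}

def decode_bool(data, attrs):
    # Restore bool when the attribute says so, and record bool (not int8)
    # in the encoding so that a second save marks the variable again
    attrs = dict(attrs)
    encoding = {}
    if attrs.pop('dtype', None) == 'bool':
        data = np.asarray(data).astype(bool)
        encoding['dtype'] = 'bool'
    return data, attrs, encoding

def roundtrip(data):
    stored, attrs = encode_bool(data)
    decoded, _, _ = decode_bool(stored, attrs)
    return decoded

original = np.array([True, False])
twice = roundtrip(roundtrip(original))
print(twice.dtype)
```

The point of the sketch is the second save: as long as the decoded variable is `bool` again and nothing in the encoding forces `int8`, the marker attribute gets written again and the dtype survives.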
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1477473412,https://api.github.com/repos/pydata/xarray/issues/7652,1477473412,IC_kwDOAMm_X85YEHSE,5821660,2023-03-21T08:58:51Z,2023-03-21T08:58:51Z,MEMBER,"> Another fun one where `int64` -> `int32`:
Can't reproduce this one with my environment. See above for details.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1477447875,https://api.github.com/repos/pydata/xarray/issues/7652,1477447875,IC_kwDOAMm_X85YEBDD,5821660,2023-03-21T08:37:48Z,2023-03-21T08:37:48Z,MEMBER,"@basnijholt The string issue is somewhat of a netcdf/numpy issue with VLEN types.
XRef: https://unidata.github.io/netcdf4-python/#dealing-with-strings
> The most flexible way to store arrays of strings is with the [Variable-length (vlen) string data type](https://unidata.github.io/netcdf4-python/#variable-length-vlen-data-type). However, this requires the use of the NETCDF4 data model, and the vlen type does not map very well numpy arrays (you have to use numpy arrays of dtype=object, which are arrays of arbitrary python objects).
And numpy will create a VLEN string array if no dtype is given, like in your case.
At least netCDF4 and h5netcdf backends are consistent in their writing (creating similar hdf5-files) and reading back (object-dtype):
plain netCDF4
```python
import netCDF4 as nc
import numpy as np
data = np.array([[""a"", ""b""], [""c"", ""d""]], dtype=""
vlen da(x, y)
vlen data type:
unlimited dimensions:
current shape = (2, 2)
Contents
[['a' 'b']
['c' 'd']]
object
Read NC-File
Variable
vlen da(x, y)
vlen data type:
unlimited dimensions:
current shape = (2, 2)
Contents
[['a' 'b']
['c' 'd']]
object
```
```bash
netcdf test-plain-netcdf4 {
dimensions:
x = 2 ;
y = 2 ;
variables:
string da(x, y) ;
data:
da =
""a"", ""b"",
""c"", ""d"" ;
}
HDF5 ""test-plain-netcdf4.nc"" {
DATASET ""da"" {
DATATYPE H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
}
DATASPACE SIMPLE { ( 2, 2 ) / ( 2, 2 ) }
DATA {
(0,0): ""a"", ""b"",
(1,0): ""c"", ""d""
}
ATTRIBUTE ""DIMENSION_LIST"" {
DATATYPE H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
DATA {
(0): (), ()
}
}
ATTRIBUTE ""_Netcdf4Coordinates"" {
DATATYPE H5T_STD_I32LE
DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
DATA {
(0): 0, 1
}
}
}
}
```
plain h5netcdf
```python
import h5netcdf.legacyapi as h5nc
import h5py
data = np.array([[""a"", ""b""], [""c"", ""d""]], dtype="">
Attributes:
Contents
[['a' 'b']
['c' 'd']]
object
Read NC-File
Variable
>
Attributes:
Contents
[['a' 'b']
['c' 'd']]
object
```
```bash
netcdf test-plain-h5netcdf {
dimensions:
x = 2 ;
y = 2 ;
variables:
string da(x, y) ;
data:
da =
""a"", ""b"",
""c"", ""d"" ;
}
HDF5 ""test-plain-h5netcdf.nc"" {
DATASET ""da"" {
DATATYPE H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
}
DATASPACE SIMPLE { ( 2, 2 ) / ( 2, 2 ) }
DATA {
(0,0): ""a"", ""b"",
(1,0): ""c"", ""d""
}
ATTRIBUTE ""DIMENSION_LIST"" {
DATATYPE H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
DATA {
(0): (), ()
}
}
ATTRIBUTE ""_Netcdf4Coordinates"" {
DATATYPE H5T_STD_I32LE
DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
DATA {
(0): 0, 1
}
}
ATTRIBUTE ""_Netcdf4Dimid"" {
DATATYPE H5T_STD_I32LE
DATASPACE SCALAR
DATA {
(0): 0
}
}
}
}
```
Both get written out as:
```
DATATYPE H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
}
```
If you use fixed-length strings (e.g. `|S1`), the dtype is preserved during the roundtrip:
```python
import numpy as np
import xarray as xr
# Make an xarray with an array of fixed-length strings
data = np.array([[""a"", ""b""], [""c"", ""d""]], dtype=""|S1"")
da = xr.DataArray(
    data=data,
    dims=[""x"", ""y""],
    coords={""x"": [0, 1], ""y"": [0, 1]},
)
da.to_netcdf(""test.nc"", mode='w')
# Load the xarray back in
da_loaded = xr.load_dataarray(""test.nc"")
assert da.dtype == da_loaded.dtype, ""Dtypes don't match""
```
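
To tell in advance which kind of string array you are holding, the numpy dtype kind is enough. A small sketch using only numpy (the variable names are illustrative):

```python
import numpy as np

fixed_bytes = np.array([[b'a', b'b'], [b'c', b'd']])   # fixed-width bytes
fixed_unicode = np.array([['a', 'b'], ['c', 'd']])     # fixed-width unicode
as_object = fixed_unicode.astype(object)               # arbitrary Python objects

# 'S' = bytes, 'U' = unicode, 'O' = object
kinds = (fixed_bytes.dtype.kind, fixed_unicode.dtype.kind, as_object.dtype.kind)
print(kinds)
```

Only the `S` kind corresponds to the `|S1` case above; the `O` kind is what comes back when VLEN strings are read.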
Versions
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0 | packaged by conda-forge | (main, Jan 14 2023, 12:27:40) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.14.21-150400.24.46-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.1
xarray: 2023.2.0
pandas: 1.5.3
numpy: 1.24.2
scipy: 1.10.1
netCDF4: 1.6.3
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.6
cfgrib: None
iris: None
bottleneck: None
dask: 2023.3.1
distributed: 2023.3.1
matplotlib: 3.7.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.3.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.6.0
pip: 23.0.1
conda: None
pytest: None
mypy: None
IPython: 8.11.0
sphinx: None
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1476967312,https://api.github.com/repos/pydata/xarray/issues/7652,1476967312,IC_kwDOAMm_X85YCLuQ,6897215,2023-03-20T21:34:30Z,2023-03-20T21:34:30Z,NONE,"Another fun one where `int64` -> `int32`:
```python
import xarray as xr
da = xr.DataArray(data=[1], dims=[""x""], coords={""x"": [0]})
da.to_netcdf(""test.nc"", mode=""w"")
da2 = xr.load_dataarray(""test.nc"")
da.dtype, da2.dtype
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1476956975,https://api.github.com/repos/pydata/xarray/issues/7652,1476956975,IC_kwDOAMm_X85YCJMv,6897215,2023-03-20T21:25:59Z,2023-03-20T21:34:24Z,NONE,"A similar problem where saving, loading, saving, loading changes the dtype `bool` -> `int8`:
```python
import xarray as xr
da1 = xr.DataArray(data=[True], dims=[""x""], coords={""x"": [0]})
da1.to_netcdf(""test.nc"", mode=""w"")
da2 = xr.load_dataarray(""test.nc"")
da2.to_netcdf(""test.nc"", mode=""w"")
da3 = xr.load_dataarray(""test.nc"")
assert da1.dtype == da3.dtype, ""Dtypes don't match""
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954
https://github.com/pydata/xarray/issues/7652#issuecomment-1476826827,https://api.github.com/repos/pydata/xarray/issues/7652,1476826827,IC_kwDOAMm_X85YBpbL,6654709,2023-03-20T19:37:02Z,2023-03-20T19:37:40Z,NONE,"It seems that the ""string"" information is stored in the `encoding` of the loaded dataset, set in this block:
https://github.com/pydata/xarray/blob/f1ff956ff67f3c053a2514d93d35929059e17b07/xarray/backends/h5netcdf_.py#L208-L210
but this encoding is not ""applied"" to the dtype of the dataset's variable.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1632718954