html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2059#issuecomment-730267306,https://api.github.com/repos/pydata/xarray/issues/2059,730267306,MDEyOklzc3VlQ29tbWVudDczMDI2NzMwNg==,5821660,2020-11-19T10:08:16Z,2020-11-19T10:08:16Z,MEMBER,@NowanIlfideme The h5py 3 changes with regard to strings are also tracked in #4570,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,314444743
https://github.com/pydata/xarray/issues/2059#issuecomment-730263703,https://api.github.com/repos/pydata/xarray/issues/2059,730263703,MDEyOklzc3VlQ29tbWVudDczMDI2MzcwMw==,2067093,2020-11-19T10:02:35Z,2020-11-19T10:02:35Z,NONE,"This may be relevant here, maybe not, but it appears the HDF5 backend is also at odds with all the above serialization.
Our internal project's dependencies changed, and that moved the `h5py` version from 2.10 to 3.1; apparently there was a breaking change that meant unicode strings were either encoded or decoded as `bytes`. Thankfully we had a test for that, but figuring out what was wrong was difficult.
Essentially, netCDF4 files that were round-tripped to a BytesIO (via an HDF5 backend) had unicode strings converted to bytes. I'm not sure whether it was the encoding or decoding part, likely decoding, judging by the docs:
https://docs.h5py.org/en/stable/strings.html
https://docs.h5py.org/en/stable/whatsnew/3.0.html#breaking-changes-deprecations
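A minimal sketch of the kind of user-side workaround this implies, assuming the strings come back as UTF-8 encoded `bytes`. The helper name `ensure_str` is made up for illustration and is not part of xarray or h5py, and the round trip is only simulated with `str.encode` rather than a real file:

```python
def ensure_str(value, encoding='utf-8'):
    # Hypothetical helper (not an xarray or h5py function):
    # decode bytes to str, recurse into lists/tuples, pass anything else through.
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, (list, tuple)):
        return type(value)(ensure_str(v, encoding) for v in value)
    return value


original = ['temperature', 'Grüße']
# Simulate a backend that hands unicode strings back as UTF-8 bytes.
round_tripped = [s.encode('utf-8') for s in original]
restored = ensure_str(round_tripped)
```

This only papers over the symptom on the reading side; it does not tell you whether the change happened during encoding or decoding.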
This might require even more special-casing to achieve consistent behavior for xarray users who don't really want to go into backend details (like me 😋).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,314444743
https://github.com/pydata/xarray/issues/2059#issuecomment-412319738,https://api.github.com/repos/pydata/xarray/issues/2059,412319738,MDEyOklzc3VlQ29tbWVudDQxMjMxOTczOA==,1217238,2018-08-12T05:27:10Z,2018-08-12T05:27:10Z,MEMBER,"> Is it possible to preserve dtype when persisting xarray Datasets/DataArrays to disk?
Unfortunately, there is a frustrating disconnect between string data types in NumPy and netCDF.
This could be done in principle, but it would require adding our own xarray-specific convention on top of netCDF. I'm not sure this would be worth it -- we already end up converting np.unicode_ to object dtypes in many operations because we need a string dtype that can support missing values.
For reading data from disk, we use object dtype because we don't know the length of the longest string until we actually read the data, so this would be incompatible with lazy loading.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,314444743
https://github.com/pydata/xarray/issues/2059#issuecomment-412095066,https://api.github.com/repos/pydata/xarray/issues/2059,412095066,MDEyOklzc3VlQ29tbWVudDQxMjA5NTA2Ng==,32801740,2018-08-10T14:14:12Z,2018-08-10T14:14:12Z,CONTRIBUTOR,"Currently, the dtype does not seem to roundtrip faithfully.
When I write `np.unicode_ / str` to file, it gets transformed to `object` when I subsequently read it from disk. I am using xarray 0.10.8 with Python 3 on Windows.
This can be reproduced by inserting the following lines in the script above (and adjusting the print statement accordingly):
```python
# Re-open the file written above and record the dtype that comes back
with xr.open_dataset(filename) as ds:
    read_dtype = ds['data'].dtype
```
which gives:
| Python version | NetCDF version | NumPy datatype | NetCDF datatype | NumPy datatype (read) |
| -- | -- | -- | -- | -- |
| Python 3 | NETCDF3 | np.string_ / bytes | NC_CHAR | \|S3 |
| Python 3 | NETCDF4 | np.string_ / bytes | NC_CHAR | \|S3 |
| Python 3 | NETCDF3 | np.unicode_ / str | NC_CHAR with UTF-8 encoding | object |
| Python 3 | NETCDF4 | np.unicode_ / str | NC_STRING | object |
| Python 3 | NETCDF3 | object bytes/bytes | NC_CHAR | \|S3 |
| Python 3 | NETCDF4 | object bytes/bytes | NC_CHAR | \|S3 |
| Python 3 | NETCDF3 | object unicode/str | NC_CHAR with UTF-8 encoding | object |
| Python 3 | NETCDF4 | object unicode/str | NC_STRING | object |
Also, `object bytes/bytes` does not seem to round-trip cleanly: it appears to be converted to `np.string_ / bytes`.
Is it possible to preserve dtype when persisting xarray Datasets/DataArrays to disk?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,314444743
https://github.com/pydata/xarray/issues/2059#issuecomment-381620236,https://api.github.com/repos/pydata/xarray/issues/2059,381620236,MDEyOklzc3VlQ29tbWVudDM4MTYyMDIzNg==,10050469,2018-04-16T14:33:20Z,2018-04-16T14:33:20Z,MEMBER,"Thanks a lot Stephan for writing that up!
> The counter-argument is that it may not be worth changing this at this late point, given that we will be sunsetting Python 2 support by year's end.
This would be my personal opinion here. If you feel like this is something you'd like to provide before the last py2-compatible xarray comes out, then I'm fine with it, but it shouldn't have top priority... ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,314444743