html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/53#issuecomment-42261949,https://api.github.com/repos/pydata/xarray/issues/53,42261949,MDEyOklzc3VlQ29tbWVudDQyMjYxOTQ5,1217238,2014-05-06T02:37:31Z,2014-05-06T02:37:31Z,MEMBER,"A bit more context: NetCDF3 (as a file format), which is all that SciPy supports, supports neither Unicode nor 64-bit integers. It really is a relic.
On Mon, May 5, 2014 at 6:17 PM, Thomas Kluyver notifications@github.com
wrote:
> Outstanding issue: What to do with unicode data (the native str type on Python 3 is unicode). The SciPy netcdf module doesn't attempt to handle unicode, so I wrote some code to encode unicode to bytes before storing it in scipy. That means, however, that the roundtrip doesn't work, because loading the data again gives bytes, not str.
> The netCDF4 module appears to handle this by just decoding any string data it finds to unicode, without considering what it's intended to be.
> Options:
> - Decode all string data to unicode on load. This is consistent with what netCDF already does.
> - Store the fact that the data was encoded in an attribute in the file. If that attribute is present, decode it again on load.
> - Don't decode any data, and change all the scipy roundtrip tests to expect that data comes back as bytes.
> I'm not familiar enough with netCDF and how it's used to know what makes sense here.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,28940534
https://github.com/pydata/xarray/issues/53#issuecomment-42261453,https://api.github.com/repos/pydata/xarray/issues/53,42261453,MDEyOklzc3VlQ29tbWVudDQyMjYxNDUz,1217238,2014-05-06T02:22:48Z,2014-05-06T02:22:48Z,MEMBER,"First of all, thank you for tackling this!
Your first suggestion (decoding all string data to unicode) sounds like the right choice to me. Ideally this can be done lazily (without needing to load all array data from disk when opening a file), but honestly I'm not too concerned about the performance of partially loading NetCDF3 files from disk with the SciPy library, given that NetCDF3 files are already limited to less than 2 GB.
Let me give you a little bit of context:
The SciPy NetCDF module only works with an obsolete file format (NetCDF3; the current version, based on HDF5, is NetCDF4). The main reason we support it is that it serves as a (somewhat non-ideal) wire format: SciPy can read and write file-like objects without files actually existing on disk, which is not possible with the NetCDF4 library.
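
To illustrate what decoding in a lazy fashion could mean here, below is a minimal sketch, not actual xray or SciPy code. It assumes the raw string data behaves like an indexable sequence of bytes objects; `LazyDecodedArray` is a hypothetical name, and UTF-8 is an assumed encoding.

```python
class LazyDecodedArray:
    # Hypothetical wrapper: keeps the raw bytes untouched and decodes
    # only the elements that are actually requested.
    def __init__(self, raw, encoding='utf-8'):
        self._raw = raw            # assumed: indexable sequence of bytes
        self._encoding = encoding  # assumed encoding, not read from the file

    def __len__(self):
        return len(self._raw)

    def __getitem__(self, key):
        item = self._raw[key]
        if isinstance(item, bytes):
            # Single element: decode just this one value.
            return item.decode(self._encoding)
        # Slice: decode only the selected values.
        return [b.decode(self._encoding) for b in item]


raw = ['temperature'.encode('utf-8'), 'pressure'.encode('utf-8')]
names = LazyDecodedArray(raw)
first = names[0]       # decodes one element only
both = names[0:2]      # decodes the requested slice
```

Nothing is decoded when the wrapper is created, so opening a file would not force all string data to be read.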
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,28940534
https://github.com/pydata/xarray/issues/53#issuecomment-42258925,https://api.github.com/repos/pydata/xarray/issues/53,42258925,MDEyOklzc3VlQ29tbWVudDQyMjU4OTI1,327925,2014-05-06T01:17:16Z,2014-05-06T01:17:16Z,MEMBER,"Outstanding issue: What to do with unicode data (the native str type on Python 3 is unicode). The SciPy netcdf module doesn't attempt to handle unicode, so I wrote some code to encode unicode to bytes before storing it in scipy. That means, however, that the roundtrip doesn't work, because loading the data again gives bytes, not str.
The netCDF4 module appears to handle this by just decoding any string data it finds to unicode, without considering what it's intended to be.
Options:
- Decode all string data to unicode on load. This is consistent with what the netCDF4 module already does.
- Store the fact that the data was encoded in an attribute in the file. If that attribute is present, decode it again on load.
- Don't decode any data, and change all the scipy roundtrip tests to expect that data comes back as bytes.
I'm not familiar enough with netCDF and how it's used to know what makes sense here.
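
For concreteness, here is a rough sketch of the second option (recording the encoding in an attribute and decoding again on load). It is only an illustration under stated assumptions: variable data is modelled as a plain list and attributes as a dict, and the helper names and the `_encoded_as` attribute key are hypothetical, not part of xray, SciPy or netCDF4.

```python
# Hypothetical attribute key used to mark data that was encoded on write.
ENCODED_FLAG = '_encoded_as'


def encode_for_storage(values, attrs, encoding='utf-8'):
    # If every value is a str, encode to bytes and record the encoding
    # in a copy of the attributes; otherwise leave the data unchanged.
    if all(isinstance(v, str) for v in values):
        new_attrs = dict(attrs)
        new_attrs[ENCODED_FLAG] = encoding
        return [v.encode(encoding) for v in values], new_attrs
    return list(values), dict(attrs)


def decode_on_load(values, attrs):
    # Decode only when the marker attribute is present, and drop the
    # marker so the loaded attributes match what was originally stored.
    new_attrs = dict(attrs)
    encoding = new_attrs.pop(ENCODED_FLAG, None)
    if encoding is None:
        return list(values), new_attrs
    return [v.decode(encoding) for v in values], new_attrs


stored, stored_attrs = encode_for_storage(['north', 'south'], {'units': 'none'})
loaded, loaded_attrs = decode_on_load(stored, stored_attrs)
```

With this approach the roundtrip gives back str, while data that was written as bytes in the first place (no marker attribute) still comes back as bytes.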
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,28940534
https://github.com/pydata/xarray/issues/53#issuecomment-42229648,https://api.github.com/repos/pydata/xarray/issues/53,42229648,MDEyOklzc3VlQ29tbWVudDQyMjI5NjQ4,327925,2014-05-05T19:43:42Z,2014-05-05T19:43:42Z,MEMBER,"Work in progress: https://github.com/takluyver/xray/tree/py3
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,28940534