issue_comments: 379294800

html_url: https://github.com/pydata/xarray/issues/2040#issuecomment-379294800
issue_url: https://api.github.com/repos/pydata/xarray/issues/2040
id: 379294800
node_id: MDEyOklzc3VlQ29tbWVudDM3OTI5NDgwMA==
user: 1217238
created_at: 2018-04-06T15:47:24Z
updated_at: 2018-04-06T15:47:24Z
author_association: MEMBER
body:

The main reason for preferring variable length strings was that netCDF4-python always properly decoded them as unicode strings, even on Python 3. Basically, it was required to properly round-trip strings to a netCDF file on Python 3.

However, that requirement no longer holds, now that we specify an encoding when writing fixed length strings (https://github.com/pydata/xarray/pull/1648). So we could potentially revisit the default behavior.

I'll admit I'm also a little surprised by how large the storage overhead turns out to be for variable length datatypes. The HDF5 docs claim it's 32 bytes per element, which would be about 10 MB for your dataset. And apparently it interacts poorly with compression, too.
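
For a rough sense of scale, here is a minimal sketch of the arithmetic behind that estimate. It assumes the 32 bytes per element figure quoted above and nothing else; the function names are hypothetical, and the element count is back-calculated from the "about 10 MB" figure rather than taken from the actual dataset.

```python
# Per-element overhead for variable length strings, as quoted in the comment
# above (attributed there to the HDF5 docs; not independently verified here).
VLEN_OVERHEAD_BYTES = 32


def vlen_overhead_bytes(n_elements, per_element=VLEN_OVERHEAD_BYTES):
    """Estimated bookkeeping overhead, in bytes, for n_elements strings."""
    return n_elements * per_element


def elements_for_overhead(total_bytes, per_element=VLEN_OVERHEAD_BYTES):
    """Number of elements whose overhead adds up to total_bytes."""
    return total_bytes // per_element


# Assumption: "about 10 MB" is taken as 10 * 10**6 bytes.
n_strings = elements_for_overhead(10 * 10**6)
overhead_mb = vlen_overhead_bytes(n_strings) / 10**6
print(n_strings, overhead_mb)
```

This only counts the fixed per-element cost. It says nothing about the string payload itself or about how compression behaves, which is the part that seems to be surprising here.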

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: 311578894