issue_comments: 303541907

html_url: https://github.com/pydata/xarray/pull/1421#issuecomment-303541907
issue_url: https://api.github.com/repos/pydata/xarray/issues/1421
id: 303541907
node_id: MDEyOklzc3VlQ29tbWVudDMwMzU0MTkwNw==
user: 1217238
created_at: 2017-05-23T21:46:39Z
updated_at: 2017-05-23T21:48:18Z
author_association: MEMBER

Thanks for giving this a shot!

> I added allow_object kwarg (rather than allow_pickle, no reason to firmly attach pickle to the api, could use something else for other backends).

I'm having a hard time imagining any other format for serializing arbitrary Python objects. Pickle is pretty standard, though we might switch the argument for to_netcdf to pickle_protocol to allow indicating the pickle version (which would default to None, meaning "don't pickle").

One additional reason for favoring allow_pickle is that it's the argument used by np.save and np.load.
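For reference, here is a small sketch of how allow_pickle behaves in NumPy. The array contents and the in-memory buffer are only for illustration; nothing here is xarray code.

```python
import io

import numpy as np

# An object-dtype array: its elements are arbitrary Python objects,
# so NumPy can only store it by pickling.
arr = np.array([{"a": 1}, "text", None], dtype=object)

buf = io.BytesIO()
np.save(buf, arr, allow_pickle=True)

# Loading with allow_pickle=False refuses object arrays.
buf.seek(0)
try:
    np.load(buf, allow_pickle=False)
    refused = False
except ValueError:
    refused = True

# Loading with allow_pickle=True unpickles the objects.
buf.seek(0)
restored = np.load(buf, allow_pickle=True)
```

The same opt-in pattern is what an allow_pickle argument on the xarray side would mirror.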

> NetCDF4DataStore handles this independently from the cf_encoder/decoder. The dtype support made it hard to decouple, plus I think object serialization is a backend dependent issue.

Yes, this is a little tricky. The current design is not great here. Ideally, though, we would still keep all of the encoding/decoding logic separate from the datastores. I need to think about this a little more.


One other concern is how to represent this data on disk in netCDF/HDF5 variables. Ideally, we would have a format that could work -- at least in principle -- with h5netcdf/h5py as well as netCDF4-python.

Annoyingly, these libraries currently have incompatible dtype support:

  • netCDF4-python supports variable length types with custom names. It does not support HDF5's opaque type, though the netCDF-C libraries do support opaque types, so adding them to netCDF4-python would be relatively straightforward.
  • h5py supports variable length types, but not their name field. It maps np.void to HDF5's opaque type, which would be a pretty sensible storage type if netCDF4-Python supported it.
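To illustrate why np.void looks like a sensible container, here is a rough sketch that wraps a pickled payload in a NumPy void scalar and recovers it. It only uses NumPy and pickle; writing the value to an HDF5 opaque type through h5py is not shown, and the sample object and protocol number are arbitrary.

```python
import pickle

import numpy as np

obj = {"units": "m", "values": [1, 2, 3]}

# Pickle to raw bytes, then wrap them as an opaque, fixed-size void scalar.
payload = pickle.dumps(obj, protocol=2)
opaque = np.void(payload)

# The bytes come back unchanged, so the object can be unpickled again.
roundtrip = pickle.loads(opaque.tobytes())
```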

So if we want something that works with both, we'll need to add an additional metadata field, in the form of an attribute, to indicate how to do the decoding. Maybe something like _PickleProtocol, which would store the version of the pickle protocol used to write the data?
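A minimal sketch of that idea, using only the standard library. The helper names encode_objects and decode_objects are hypothetical, the attribute dictionary stands in for variable attributes on disk, and _PickleProtocol is just the name floated above, not an existing convention.

```python
import pickle

# Attribute name proposed in this comment; an assumption, not an established convention.
PROTOCOL_ATTR = "_PickleProtocol"


def encode_objects(values, pickle_protocol=None):
    """Pickle each value and record the protocol in the attributes.

    pickle_protocol=None means "don't pickle", so object values are rejected.
    """
    if pickle_protocol is None:
        raise TypeError("object values require an explicit pickle_protocol")
    payloads = [pickle.dumps(v, protocol=pickle_protocol) for v in values]
    return payloads, {PROTOCOL_ATTR: pickle_protocol}


def decode_objects(payloads, attrs, allow_pickle=False):
    """Unpickle payloads only if the attribute is present and the caller opts in."""
    if PROTOCOL_ATTR not in attrs:
        return list(payloads)  # plain bytes, nothing to decode
    if not allow_pickle:
        raise ValueError("data is pickled; pass allow_pickle=True to load it")
    if attrs[PROTOCOL_ATTR] > pickle.HIGHEST_PROTOCOL:
        raise ValueError("data was written with an unsupported pickle protocol")
    return [pickle.loads(p) for p in payloads]
```

Because the attribute travels with the variable, the same decoding step would apply whether the bytes were stored as a variable length type or as an opaque type.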


I have some inline comments I'll add below.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: 230566456