issue_comments

7 rows where user = 2941720 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
319958229 https://github.com/pydata/xarray/pull/1421#issuecomment-319958229 https://api.github.com/repos/pydata/xarray/issues/1421 MDEyOklzc3VlQ29tbWVudDMxOTk1ODIyOQ== lewisacidic 2941720 2017-08-03T12:44:16Z 2017-08-03T12:44:16Z CONTRIBUTOR

Hi all, sorry for the lack of communication on this. I'm writing up my PhD thesis at the moment, and my deadline is increasingly looming. Once I've handed in I'll finish this off, and am more than happy to help with several other things I've encountered (I'll create the issues now). Once again, sorry for the delay!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Adding arbitrary object serialization 230566456
310178801 https://github.com/pydata/xarray/pull/1421#issuecomment-310178801 https://api.github.com/repos/pydata/xarray/issues/1421 MDEyOklzc3VlQ29tbWVudDMxMDE3ODgwMQ== lewisacidic 2941720 2017-06-21T19:19:30Z 2017-06-21T19:19:30Z CONTRIBUTOR

Sorry for the holdup. I made progress 3 weekends ago, but didn't get a fully working example, and I've been swamped the whole of this month. This weekend I'll be free to finish this off.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Adding arbitrary object serialization 230566456
303984724 https://github.com/pydata/xarray/pull/1421#issuecomment-303984724 https://api.github.com/repos/pydata/xarray/issues/1421 MDEyOklzc3VlQ29tbWVudDMwMzk4NDcyNA== lewisacidic 2941720 2017-05-25T11:07:36Z 2017-05-25T11:07:36Z CONTRIBUTOR

Sounds great, thanks for the feedback! I'll probably not get to this until the weekend, but I'll have another crack at it then.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Adding arbitrary object serialization 230566456
303580887 https://github.com/pydata/xarray/pull/1421#issuecomment-303580887 https://api.github.com/repos/pydata/xarray/issues/1421 MDEyOklzc3VlQ29tbWVudDMwMzU4MDg4Nw== lewisacidic 2941720 2017-05-24T00:35:25Z 2017-05-24T00:38:37Z CONTRIBUTOR

The pickle protocol is stored in the second byte of any pickle, btw; I just had a look.

For protocol 2+ anyway...
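A minimal check of that claim with the standard-library pickle module: for protocol 2 and above, the stream starts with the PROTO opcode (0x80) followed by one byte giving the protocol number.

```python
import pickle

# Dump with an explicit protocol, then inspect the first two bytes.
data = pickle.dumps({"a": 1}, protocol=4)
assert data[0] == 0x80   # PROTO opcode, present for protocol 2+
print(data[1])           # -> 4, the protocol version
```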

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Adding arbitrary object serialization 230566456
303580375 https://github.com/pydata/xarray/pull/1421#issuecomment-303580375 https://api.github.com/repos/pydata/xarray/issues/1421 MDEyOklzc3VlQ29tbWVudDMwMzU4MDM3NQ== lewisacidic 2941720 2017-05-24T00:31:31Z 2017-05-24T00:31:31Z CONTRIBUTOR

> Thanks for giving this a shot!

No problem, I really want this feature so I can use xarray for a cheminformatics library I'm working on! Hopefully we can work out the best way to do this whilst keeping everything as nice and organised as it was before I touched the code...

> I'm having a hard time imagining any other serialization formats for serializing arbitrary Python objects. pickle is pretty standard, though we might switch the argument for to_netcdf to pickle_protocol to allow indicating the pickle version (which would default to None, for don't pickle).

I couldn't think of others either; it made sense as a keyword for cf_encoder, where no pickling currently happens, which is why I changed it originally. allow_pickle definitely makes sense; I've used np.save loads but didn't know that keyword existed. Perhaps two keywords, allow_pickle and pickle_protocol, so it's more explicit and we can set a default protocol?
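For reference, this is roughly how the allow_pickle flag behaves in numpy, which the two-keyword proposal above is modelled on (the proposed to_netcdf keywords themselves are not existing xarray arguments):

```python
import numpy as np

# Object arrays round-trip through .npy files only when pickling is allowed;
# np.load refuses object arrays by default for safety.
arr = np.array([{"a": 1}, {"b": 2}], dtype=object)
np.save("objs.npy", arr, allow_pickle=True)
loaded = np.load("objs.npy", allow_pickle=True)
print(loaded[0])  # {'a': 1}
```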

> Yes, this is a little tricky. The current design is not great here. Ideally, though, we would still keep all of the encoding/decoding logic separate from the datastores. I need to think about this a little more.

Yeah, it definitely isn't great; I wanted a working example and that's what I managed before sleeping! I'll keep looking through the code to familiarize myself a bit more with it, and I'd be interested to see what you suggest!


> One other concern is how to represent this data on disk in netCDF/HDF5 variables. Ideally, we would have a format that could work -- at least in principle -- with h5netcdf/h5py as well as netCDF4-python.

Yeah, that would definitely be good.

> Annoyingly, these libraries currently have incompatible dtype support:
>
> • netCDF4-python supports variable-length types with a custom name. It does not support HDF5's opaque type, though the netCDF-C libraries do support opaque types, so adding them to netCDF4-python would be relatively straightforward.
> • h5py supports variable-length types, but not their name field. It maps np.void to HDF5's opaque type, which would be a pretty sensible storage type if netCDF4-python supported it. So if we want something that works with both, we'll need to add an additional metadata field in the form of an attribute to indicate how to do decoding. Maybe something like _PickleProtocol, which would store the version of the pickle protocol used to write the data?

The np.void type was pretty much designed for this sort of thing from what I can see, I was pretty surprised that netCDF4-python didn't have something similar, hence the strange np.uint8 stuff. We could do the same for h5py, using h5py.special_dtype, although as you say there is no name so it can't work the same for h5py as it does now. Pickle works out the protocol automatically (no protocol keyword for load or loads), so we wouldn't really need to save the protocol as an attribute, although it would be a way to work out which variables to unpickle once saved, if we went this route. It seems a shame not to use np.void though, so perhaps it makes sense to add the opaque types to netCDF4-python and forget the np.uint8 trick.
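A rough sketch of the np.uint8 trick mentioned above, with the proposed _PickleProtocol attribute attached so a reader can tell which variables to unpickle. The attribute name and on-disk layout come from the suggestions in this thread and are illustrative only, not an agreed format:

```python
import pickle
import numpy as np
import netCDF4

# Store the pickled bytes of an arbitrary object as a 1-D uint8 variable.
obj = {"smiles": "c1ccccc1", "charge": 0}
payload = np.frombuffer(pickle.dumps(obj, protocol=2), dtype=np.uint8)

with netCDF4.Dataset("pickled.nc", "w") as nc:
    nc.createDimension("obj_bytes", payload.size)
    var = nc.createVariable("obj", np.uint8, ("obj_bytes",))
    var[:] = payload
    var.setncattr("_PickleProtocol", 2)  # proposed marker attribute

with netCDF4.Dataset("pickled.nc") as nc:
    raw = np.asarray(nc.variables["obj"][:], dtype=np.uint8)
    print(pickle.loads(raw.tobytes()))  # pickle infers the protocol itself
```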

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Adding arbitrary object serialization 230566456
302909333 https://github.com/pydata/xarray/issues/1415#issuecomment-302909333 https://api.github.com/repos/pydata/xarray/issues/1415 MDEyOklzc3VlQ29tbWVudDMwMjkwOTMzMw== lewisacidic 2941720 2017-05-21T01:36:06Z 2017-05-21T01:36:06Z CONTRIBUTOR

Yeah, looking at it, it's probably not a thing for them. I thought they might implement something like:

```python
# hypothetical calls: the variable-length str type exists, but an arbitrary
# `object` dtype like the second line does not exist in netCDF4-python
strs = nc.createVariable('strs', str, ('strs_dim',))
objs = nc.createVariable('objs', object, ('objs_dim',))
```

But I see that the str datatype is a netCDF spec type.
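For context, the variable-length string support that does exist in netCDF4-python looks roughly like this (filename and dimension name are arbitrary):

```python
import netCDF4

with netCDF4.Dataset("strings.nc", "w") as nc:
    nc.createDimension("strs_dim", None)             # unlimited dimension
    strs = nc.createVariable("strs", str, ("strs_dim",))
    strs[0] = "an arbitrary Python string"           # VLEN string element

with netCDF4.Dataset("strings.nc") as nc:
    print(nc.variables["strs"][0])
```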

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Save arbitrary Python objects to netCDF 230158616
302896115 https://github.com/pydata/xarray/issues/1415#issuecomment-302896115 https://api.github.com/repos/pydata/xarray/issues/1415 MDEyOklzc3VlQ29tbWVudDMwMjg5NjExNQ== lewisacidic 2941720 2017-05-20T20:15:39Z 2017-05-20T20:15:39Z CONTRIBUTOR

I would certainly be interested in giving this a try, although I'm not exactly sure what would go where yet. It seems like this might be more appropriate in the netCDF4-python library - should I start an issue over there?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Save arbitrary Python objects to netCDF 230158616

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
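Given that schema, the view above ("7 rows where user = 2941720 sorted by updated_at descending") can be reproduced directly against the underlying SQLite database; the database filename here is an assumption:

```python
import sqlite3

conn = sqlite3.connect("github.db")  # assumed name of the Datasette SQLite file
rows = conn.execute(
    'select id, created_at, body from issue_comments '
    'where "user" = ? order by updated_at desc',
    (2941720,),
).fetchall()
for comment_id, created_at, body in rows:
    print(comment_id, created_at, body[:60])
```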