github: issue_comments: 8 rows where issue = 230566456 sorted by updated

8 rows where issue = 230566456 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at ▲	author_association	body	reactions	issue
319958229	https://github.com/pydata/xarray/pull/1421#issuecomment-319958229	https://api.github.com/repos/pydata/xarray/issues/1421	MDEyOklzc3VlQ29tbWVudDMxOTk1ODIyOQ==	lewisacidic 2941720	2017-08-03T12:44:16Z	2017-08-03T12:44:16Z	CONTRIBUTOR	Hi all, sorry for the lack of communication on this. I'm writing up my PhD thesis at the moment, and my deadline is increasingly looming. Once I've handed in I'll finish this off, and am more than happy to help with several other things I've encountered (I'll create the issues now). Once again, sorry for the delay!	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Adding arbitrary object serialization 230566456
310178801	https://github.com/pydata/xarray/pull/1421#issuecomment-310178801	https://api.github.com/repos/pydata/xarray/issues/1421	MDEyOklzc3VlQ29tbWVudDMxMDE3ODgwMQ==	lewisacidic 2941720	2017-06-21T19:19:30Z	2017-06-21T19:19:30Z	CONTRIBUTOR	Sorry for the holdup. I made progress 3 weekends ago, but didn't get a fully working example, and I've been swamped the whole of this month. This weekend I'll be free to finish this off.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Adding arbitrary object serialization 230566456
303984724	https://github.com/pydata/xarray/pull/1421#issuecomment-303984724	https://api.github.com/repos/pydata/xarray/issues/1421	MDEyOklzc3VlQ29tbWVudDMwMzk4NDcyNA==	lewisacidic 2941720	2017-05-25T11:07:36Z	2017-05-25T11:07:36Z	CONTRIBUTOR	Sounds great, thanks for the feedback! I'll probably not get to this until the weekend, but I'll have another crack at it then.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Adding arbitrary object serialization 230566456
303606683	https://github.com/pydata/xarray/pull/1421#issuecomment-303606683	https://api.github.com/repos/pydata/xarray/issues/1421	MDEyOklzc3VlQ29tbWVudDMwMzYwNjY4Mw==	shoyer 1217238	2017-05-24T03:18:16Z	2017-05-24T03:18:16Z	MEMBER	How about something like the following: In `encode_cf_variable`, create a new variable with pickle encoded data (if appropriate). This looks something like: ```python def encode_cf_variable(var, allow_pickle=False): ... if var.dtype == object: if allow_pickle: var = maybe_encode_pickle(var) else: raise TypeError return var def maybe_encode_pickle(var): if var.dtype == object: attrs = var.attrs.copy() safe_setitem(attrs, '_FileFormat', 'python-pickle') protocol = var.encoding.pop('pickle_protocol', 2) data = utils.encode_pickle(var.values, protocol=protocol) var = Variable(var.dims, data, attrs, var.encoding) return var ``` This reuses the `encoding` parameter for setting the pickle protocol version, which is already what we use for similar variable specific encoding details. In the netCDF backends, add a check for variables with `dtype == object` and a `_FileFormat` attribute. If this is the case, call a `create_vlen_int8_dtype` method to create an appropriate dtype using backend specific methods (the base class implementation should raise an error), and proceed with writing the data in the usual way. For decoding, reverse the process. Convert custom vlen dtypes to `dtype=object` in the appropriate backend specific array wrapper type, but don't decode data. If `allow_pickle=True` and `var.attrs['_FileFormat'] == 'python-pickle'`, then `decode_cf_variable` should do the unpickling, moving `_FileFormat` from `attrs` to `encoding`. For bonus points, generalize handling of vlen types with `encoding`. Something like `encoding={'dtype': {'vlen': np.int8}}` could indicate that a special vlen with `np.int8` data should be created to encode this variable's data. (This is inspired by h5py's API for special types.)	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Adding arbitrary object serialization 230566456
303583082	https://github.com/pydata/xarray/pull/1421#issuecomment-303583082	https://api.github.com/repos/pydata/xarray/issues/1421	MDEyOklzc3VlQ29tbWVudDMwMzU4MzA4Mg==	shoyer 1217238	2017-05-24T00:52:20Z	2017-05-24T00:52:20Z	MEMBER	Pickle works out the protocol automatically (no protocol keyword for load or loads), so we wouldn't really need to save the protocol as an attribute, although it would be a way to work out which variables to unpickle once saved, if we went this route. I think we do want some sort of marker attribute, but I agree that it doesn't need to include the pickle version. Maybe the attribute `_FileFormat = 'python-pickle'` would make sense? This would have the advantage of being obvious to anyone inspecting the netCDF file with standard tools (not xarray). It seems a shame not to use np.void though, so perhaps it makes sense to add the opaque types to netCDF4-python and forget the np.uint8 trick. I think netCDF actually maps `np.int8` -> `NC_BYTE`, so that's at least some justification for this choice: http://www.unidata.ucar.edu/software/netcdf/docs/data_type.html Certainly handling opaque types in netCDF4-python would be nice, though I don't think it should be a blocker for this. I suspect the reason this isn't done is that NumPy maps `bytes` -> `np.string_` even on Python 3. Thus `np.void` is used far less often than it should be. Also the `repr` for `np.void` has been pretty poor, though that's being worked on currently in https://github.com/numpy/numpy/pull/8981.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Adding arbitrary object serialization 230566456
303580887	https://github.com/pydata/xarray/pull/1421#issuecomment-303580887	https://api.github.com/repos/pydata/xarray/issues/1421	MDEyOklzc3VlQ29tbWVudDMwMzU4MDg4Nw==	lewisacidic 2941720	2017-05-24T00:35:25Z	2017-05-24T00:38:37Z	CONTRIBUTOR	The pickle protocol is the second byte of any pickle btw, just had a look. For protocol 2+ anyway...	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Adding arbitrary object serialization 230566456
303580375	https://github.com/pydata/xarray/pull/1421#issuecomment-303580375	https://api.github.com/repos/pydata/xarray/issues/1421	MDEyOklzc3VlQ29tbWVudDMwMzU4MDM3NQ==	lewisacidic 2941720	2017-05-24T00:31:31Z	2017-05-24T00:31:31Z	CONTRIBUTOR	Thanks for giving this a shot! No problem, I really want this feature so I can use xarray for a cheminformatics library I'm working on! Hopefully we can work out the best way to do this whilst keeping everything as nice and organised as it was before I touched the code... I'm having a hard time imagining any other serialization formats for serializing arbitrary Python objects. pickle is pretty standard, though we might switch the argument for to_netcdf to pickle_protocol to allow indicating the pickle version (which would default to None, for don't pickle). I couldn't think of others either - it made sense as a keyword for `cf_encoder` where no pickling currently happens which is why I changed it originally. `allow_pickle` definitely makes sense, I've used `np.save` loads but didn't know that keyword existed. Perhaps two kws, `allow_pickle` and `pickle_protocol`, so it's more explicit, and so we can set a default protocol? Yes, this is a little tricky. The current design is not great here. Ideally, though, we would still keep all of the encoding/decoding logic separate from the datastores. I need to think about this a little more. Yeah, it definitely isn't great, I wanted a working example and that's what I managed to do before sleeping! I'll keep looking through the code to familiarize myself a bit more with it - I would be interested to see what you suggest! One other concern is how to represent this data on disk in netCDF/HDF5 variables. Ideally, we would have a format that could work -- at least in principle -- with h5netcdf/h5py as well as netCDF4-python. Yeah, that would definitely be good. Annoyingly, these libraries currently have incompatible dtype support: netCDF4-python supports variable length types with custom name. It does not support HDF5's opaque type, though the netCDF-C libraries do support opaque types, so adding them to netCDF4-Python would be relatively straightforward. h5py supports variable length types, but not their name field. It maps np.void to HDF5's opaque type, which would be a pretty sensible storage type if netCDF4-Python supported it. So if we want something that works with both, we'll need to add some additional metadata field in the form of an attribute to indicate how to do decoding. Maybe something like _PickleProtocol, which would store the version of the pickle protocol used to write the data? The np.void type was pretty much designed for this sort of thing from what I can see, I was pretty surprised that netCDF4-python didn't have something similar, hence the strange `np.uint8` stuff. We could do the same for h5py, using `h5py.special_dtype`, although as you say there is no name so it can't work the same for h5py as it does now. Pickle works out the protocol automatically (no `protocol` keyword for `load` or `loads`), so we wouldn't really need to save the protocol as an attribute, although it would be a way to work out which variables to unpickle once saved, if we went this route. It seems a shame not to use np.void though, so perhaps it makes sense to add the opaque types to netCDF4-python and forget the `np.uint8` trick.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Adding arbitrary object serialization 230566456
303541907	https://github.com/pydata/xarray/pull/1421#issuecomment-303541907	https://api.github.com/repos/pydata/xarray/issues/1421	MDEyOklzc3VlQ29tbWVudDMwMzU0MTkwNw==	shoyer 1217238	2017-05-23T21:46:39Z	2017-05-23T21:48:18Z	MEMBER	Thanks for giving this a shot! I added allow_object kwarg (rather than allow_pickle, no reason to firmly attach pickle to the api, could use something else for other backends). I'm having a hard time imagining any other serialization formats for serializing arbitrary Python objects. `pickle` is pretty standard, though we might switch the argument for `to_netcdf` to `pickle_protocol` to allow indicating the pickle version (which would default to `None`, for don't pickle). One additional reason for favoring `allow_pickle` is that it's the argument used by `np.save` and `np.load`. NetCDF4DataStore handles this independently from the cf_encoder/decoder. The dtype support made it hard to decouple, plus I think object serialization is a backend dependent issue. Yes, this is a little tricky. The current design is not great here. Ideally, though, we would still keep all of the encoding/decoding logic separate from the datastores. I need to think about this a little more. One other concern is how to represent this data on disk in netCDF/HDF5 variables. Ideally, we would have a format that could work -- at least in principle -- with `h5netcdf`/`h5py` as well as `netCDF4-python`. Annoyingly, these libraries currently have incompatible dtype support: netCDF4-python supports variable length types with custom `name`. It does not support HDF5's opaque type, though the netCDF-C libraries do support opaque types, so adding them to netCDF4-Python would be relatively straightforward. h5py supports variable length types, but not their `name` field. It maps `np.void` to HDF5's opaque type, which would be a pretty sensible storage type if netCDF4-Python supported it. So if we want something that works with both, we'll need to add some additional metadata field in the form of an attribute to indicate how to do decoding. Maybe something like `_PickleProtocol`, which would store the version of the pickle protocol used to write the data? I have some inline comments I'll add below.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Adding arbitrary object serialization 230566456
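
The marker-attribute scheme discussed in the comments above can be sketched with the standard library alone. This is an illustrative sketch, not the xarray implementation: `encode_pickle`, `decode_pickle`, and the plain `dict`/`list` stand-ins for a variable's attributes and data are hypothetical names chosen here, and the sketch simply drops the marker on decode rather than moving it into `encoding` as proposed in the thread. It also checks the observation that, for protocol 2 and later, the second byte of a pickle is the protocol number.

```python
import pickle

# Marker attribute suggested in the thread; the helper names below are hypothetical.
MARKER_KEY = "_FileFormat"
MARKER_VALUE = "python-pickle"


def encode_pickle(values, attrs, protocol=2):
    """Pickle each object and return (bytes list, attrs tagged with the marker)."""
    if MARKER_KEY in attrs:
        raise ValueError("attribute %r is already set" % MARKER_KEY)
    new_attrs = dict(attrs)
    new_attrs[MARKER_KEY] = MARKER_VALUE
    data = [pickle.dumps(value, protocol=protocol) for value in values]
    return data, new_attrs


def decode_pickle(data, attrs, allow_pickle=False):
    """Unpickle only when the marker is present and the caller opted in."""
    if attrs.get(MARKER_KEY) != MARKER_VALUE:
        return list(data), dict(attrs)  # not pickled: pass through unchanged
    if not allow_pickle:
        raise TypeError("data is pickled; pass allow_pickle=True to decode it")
    new_attrs = {k: v for k, v in attrs.items() if k != MARKER_KEY}
    return [pickle.loads(blob) for blob in data], new_attrs


values = [{"smiles": "CCO"}, ("a", 1), None]
encoded, enc_attrs = encode_pickle(values, {"units": "none"})

# For protocol >= 2 a pickle starts with the PROTO opcode (0x80), then the version.
proto_opcode, proto_version = encoded[0][0], encoded[0][1]

decoded, dec_attrs = decode_pickle(encoded, enc_attrs, allow_pickle=True)
```

Requiring an explicit `allow_pickle=True` on decode mirrors the `np.load` convention mentioned in the thread, since unpickling data from an untrusted file can execute arbitrary code.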

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
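
As a rough illustration of how the view on this page maps onto the schema, the following sketch loads the schema into an in-memory SQLite database and runs the equivalent of "rows where issue = 230566456 sorted by updated_at descending". Only the `id`, `updated_at`, and `issue` columns are filled in; three rows reuse values from the table above, and the fourth row (issue `999`) is made up to show the filter. The exact SQL that the page itself runs is an assumption.

```python
import sqlite3

SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
"""

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys by default, so the missing
# [users] and [issues] tables do not prevent this from running.
conn.executescript(SCHEMA)

sample_rows = [
    (319958229, "2017-08-03T12:44:16Z", 230566456),
    (303541907, "2017-05-23T21:48:18Z", 230566456),
    (303580887, "2017-05-24T00:38:37Z", 230566456),
    (1, "2020-01-01T00:00:00Z", 999),  # made-up row for a different issue
]
conn.executemany(
    "INSERT INTO issue_comments (id, updated_at, issue) VALUES (?, ?, ?)",
    sample_rows,
)

# ISO 8601 timestamps stored as TEXT sort correctly as plain strings.
result = conn.execute(
    "SELECT id, updated_at FROM issue_comments "
    "WHERE issue = ? ORDER BY updated_at DESC",
    (230566456,),
).fetchall()
ids = [row[0] for row in result]
```

The `idx_issue_comments_issue` index is what keeps the `WHERE issue = ?` lookup cheap as the table grows.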