html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1536#issuecomment-381679096,https://api.github.com/repos/pydata/xarray/issues/1536,381679096,MDEyOklzc3VlQ29tbWVudDM4MTY3OTA5Ng==,1217238,2018-04-16T17:09:06Z,2018-04-16T17:09:06Z,MEMBER,"@crusaderky That would work for me, too. No strong preference from my side. In the worst case, we would be stuck maintaining the extra encoding `compression='gzip'` indefinitely, but that's not a big deal.
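As a rough sketch of such a translation layer (the name `to_h5py_encoding` is hypothetical, and this is not h5netcdf's actual code), the netCDF4-python keywords `zlib`/`complevel` can be mapped onto h5py's `compression`/`compression_opts`, bearing in mind that h5py spells zlib compression as `'gzip'`:

```python
def to_h5py_encoding(encoding):
    # Hypothetical helper: accept either netCDF4-python style keys
    # (zlib, complevel) or h5py style keys (compression, compression_opts)
    # and return a new dict that uses h5py style keys only.
    out = dict(encoding)
    zlib = out.pop('zlib', None)
    complevel = out.pop('complevel', None)
    if zlib:
        # h5py calls the zlib/deflate filter 'gzip'.
        if out.setdefault('compression', 'gzip') != 'gzip':
            raise ValueError(
                'zlib=True conflicts with compression=%r' % (out['compression'],))
        if complevel is not None:
            if out.setdefault('compression_opts', complevel) != complevel:
                raise ValueError('complevel conflicts with compression_opts')
    return out
```

`shuffle` and `fletcher32` have the same name in both libraries, so they pass through unchanged; a `complevel` given without `zlib=True` is simply dropped in this sketch.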
Take [a look at h5netcdf](https://github.com/shoyer/h5netcdf/blob/master/h5netcdf/legacyapi.py) for a reference on what that translation layer should do.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466
https://github.com/pydata/xarray/issues/1536#issuecomment-381504579,https://api.github.com/repos/pydata/xarray/issues/1536,381504579,MDEyOklzc3VlQ29tbWVudDM4MTUwNDU3OQ==,6213168,2018-04-16T07:26:13Z,2018-04-16T07:26:13Z,MEMBER,"@shoyer almost finished. However, when implementing it I realised that, instead of writing a new engine h5netcdf-new, I could more simply reimplement the existing h5netcdf engine to use the new API, and then accept (through a trivial translation layer) both the netCDF4-python encoding (`zlib=True`) and the h5py one (`compression='gzip'`). Let me know your thoughts.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466
https://github.com/pydata/xarray/issues/1536#issuecomment-377197301,https://api.github.com/repos/pydata/xarray/issues/1536,377197301,MDEyOklzc3VlQ29tbWVudDM3NzE5NzMwMQ==,6213168,2018-03-29T10:47:08Z,2018-03-29T10:47:08Z,MEMBER,@shoyer new non-functioning public API prototype - please confirm this is what you had in mind,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466
https://github.com/pydata/xarray/issues/1536#issuecomment-373769617,https://api.github.com/repos/pydata/xarray/issues/1536,373769617,MDEyOklzc3VlQ29tbWVudDM3Mzc2OTYxNw==,1217238,2018-03-16T16:31:07Z,2018-03-16T16:31:07Z,MEMBER,"If using custom compression filters now results in valid netCDF4 files, then I'd rather we still called this `to_netcdf()` rather than defining our own custom HDF5 variant -- even if you can only read the files with netCDF-C or h5netcdf. We should just be careful to document the portability concerns (which are going to be a concern with custom filters, regardless).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466
https://github.com/pydata/xarray/issues/1536#issuecomment-373566794,https://api.github.com/repos/pydata/xarray/issues/1536,373566794,MDEyOklzc3VlQ29tbWVudDM3MzU2Njc5NA==,6213168,2018-03-16T00:38:50Z,2018-03-16T00:38:50Z,MEMBER,@shoyer ping - could you give feedback on the API prototype?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466
https://github.com/pydata/xarray/issues/1536#issuecomment-366096878,https://api.github.com/repos/pydata/xarray/issues/1536,366096878,MDEyOklzc3VlQ29tbWVudDM2NjA5Njg3OA==,6213168,2018-02-15T23:28:08Z,2018-02-15T23:28:08Z,MEMBER,"@shoyer, see if you like the public API prototype linked above","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466
https://github.com/pydata/xarray/issues/1536#issuecomment-365841787,https://api.github.com/repos/pydata/xarray/issues/1536,365841787,MDEyOklzc3VlQ29tbWVudDM2NTg0MTc4Nw==,1217238,2018-02-15T07:03:04Z,2018-02-15T07:03:04Z,MEMBER,"@crusaderky In case adding this to the netCDF4 library doesn't work out:
> I'm not sure I understood your latest comment - are you implying that to_hdf5 should internally use the h5netcdf module? I understand the rationale but it sounds a bit counter-intuitive to me?
Yes, I would suggest that `to_hdf5()` use h5netcdf, but with `invalid_netcdf=True`.
> Also, to allow for non-zlib compression we need to either tap into the new h5netcdf API, or into h5py directly - so I'm afraid to_hdf5 can't be a simple wrapper around to_netcdf.
Yes, this is unfortunately true.
> new method Dataset.to_hdf5 - starts as a copy-paste of to_netcdf, including the backend functions underneath
Yes
> new unit tests, starting as a copy-paste of all unit tests for to_netcdf
Yes
> change open_dataset and open_mfdataset:
> add new possible value for the engine field, ""hdf5""
> if engine is None and file name terminates with .nc, use the current algorithm to choose default engine
> if engine is None and file name terminates with .h5, use h5py
> if engine is not None, ignore file extension
I think this is a little easier than that. h5netcdf will always be able to *read* invalid netCDF files, so we can just continue to use `engine='h5netcdf'`.
As for picking the default engine, see https://github.com/pydata/xarray/pull/1682, which is pretty close, though I need to think a little bit harder about the API to make sure it's right.
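For illustration only, the extension-based selection from the quoted list, combined with the point that `engine='h5netcdf'` is enough for reading, might look like this (`guess_engine` and its `default` placeholder are hypothetical, not xarray's actual logic):

```python
import os


def guess_engine(filename, engine=None, default='netcdf4'):
    # Hypothetical helper sketching the proposal discussed above.
    if engine is not None:
        # An explicit engine always wins; the file extension is ignored.
        return engine
    ext = os.path.splitext(filename)[1].lower()
    if ext == '.h5':
        # h5netcdf can read HDF5 files that are not valid netCDF,
        # so no separate 'hdf5' engine is needed for reading.
        return 'h5netcdf'
    # Otherwise defer to the existing default-engine choice
    # (represented here by a placeholder argument).
    return default
```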
> add to high level documentation and tutorials
Yes","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466
https://github.com/pydata/xarray/issues/1536#issuecomment-365457702,https://api.github.com/repos/pydata/xarray/issues/1536,365457702,MDEyOklzc3VlQ29tbWVudDM2NTQ1NzcwMg==,6213168,2018-02-14T00:50:21Z,2018-02-14T00:50:21Z,MEMBER,"@DennisHeimbigner also, does this mean that h5netcdf should be changed to remove non-gzip compression algorithms from the list of features that requires invalid_netcdf=True?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466
https://github.com/pydata/xarray/issues/1536#issuecomment-365450097,https://api.github.com/repos/pydata/xarray/issues/1536,365450097,MDEyOklzc3VlQ29tbWVudDM2NTQ1MDA5Nw==,6213168,2018-02-14T00:11:42Z,2018-02-14T00:11:42Z,MEMBER,@DennisHeimbigner looks like it's not exposed through netcdf4-python though?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466
https://github.com/pydata/xarray/issues/1536#issuecomment-365410944,https://api.github.com/repos/pydata/xarray/issues/1536,365410944,MDEyOklzc3VlQ29tbWVudDM2NTQxMDk0NA==,6213168,2018-02-13T21:31:15Z,2018-02-13T21:32:43Z,MEMBER,"@shoyer I'm starting to work on this.
I'm not sure I understood your latest comment - are you implying that ``to_hdf5`` should internally use the h5netcdf module? I understand the rationale but it sounds a bit counter-intuitive to me?
Also, to allow for non-zlib compression we need to either tap into the new h5netcdf API, or into h5py directly - so I'm afraid ``to_hdf5`` can't be a simple wrapper around ``to_netcdf``.
Could you help me compile a shopping list?
- new method ``Dataset.to_hdf5`` - starts as a copy-paste of ``to_netcdf``, including the backend functions underneath
- new unit tests, starting as a copy-paste of all unit tests for ``to_netcdf``
- change ``open_dataset`` and ``open_mfdataset``:
  - add new possible value for the engine field, ""hdf5""
  - if engine is None and file name terminates with ``.nc``, use the current algorithm to choose default engine
  - if engine is None and file name terminates with ``.h5``, use h5py
  - if engine is not None, ignore file extension
- add to high level documentation and tutorials
- Other?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466
https://github.com/pydata/xarray/issues/1536#issuecomment-326037069,https://api.github.com/repos/pydata/xarray/issues/1536,326037069,MDEyOklzc3VlQ29tbWVudDMyNjAzNzA2OQ==,1217238,2017-08-30T15:58:35Z,2017-08-30T15:58:35Z,MEMBER,I just released a new version of h5netcdf (0.4.0). It adds an `invalid_netcdf` argument to the file constructors. So the right way to build this new backend (if we still want to go this way) would be to require `h5netcdf>=v0.4` and set `invalid_netcdf=True` when called from `to_hdf5()`.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466
https://github.com/pydata/xarray/issues/1536#issuecomment-325712523,https://api.github.com/repos/pydata/xarray/issues/1536,325712523,MDEyOklzc3VlQ29tbWVudDMyNTcxMjUyMw==,1217238,2017-08-29T16:05:14Z,2017-08-29T16:05:14Z,MEMBER,I'm adding a loud warning about this (will eventually be an error) to h5netcdf.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466
https://github.com/pydata/xarray/issues/1536#issuecomment-325555913,https://api.github.com/repos/pydata/xarray/issues/1536,325555913,MDEyOklzc3VlQ29tbWVudDMyNTU1NTkxMw==,1217238,2017-08-29T05:02:49Z,2017-08-29T05:02:49Z,MEMBER,"> Please, please, please don't write out ""netCDF"" files that don't conform to the spec.
Of course not. I understand the issue here.
I'll issue a fix for h5netcdf to disable this unless explicitly opted into, but we'll also need a fix for xarray to support the users who are currently using it to save data with complex values -- probably by adding a `to_hdf5()` method.
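A minimal sketch of what such an explicit opt-in could look like, assuming a boolean flag (the name `invalid_netcdf` is borrowed from the h5netcdf 0.4.0 release mentioned elsewhere in this thread; `check_dtype` and the exception class are hypothetical and not h5netcdf's actual code):

```python
class CompatibilityError(Exception):
    # Hypothetical exception for features that break netCDF compatibility.
    pass


def check_dtype(dtype_name, invalid_netcdf=False):
    # Hypothetical guard: complex dtypes have no netCDF4 equivalent,
    # so writing them is refused unless the caller explicitly opts in.
    if dtype_name.startswith('complex') and not invalid_netcdf:
        raise CompatibilityError(
            'dtype %s is not valid netCDF; pass invalid_netcdf=True '
            'to write it anyway' % dtype_name)
    return dtype_name
```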
Here is the NetCDF-C issue I opened on reading these sorts of HDF5 enums: https://github.com/Unidata/netcdf-c/issues/267.
> You're not actually saying the netCDF-c library should check for this custom format, are you?
No.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466
https://github.com/pydata/xarray/issues/1536#issuecomment-325516877,https://api.github.com/repos/pydata/xarray/issues/1536,325516877,MDEyOklzc3VlQ29tbWVudDMyNTUxNjg3Nw==,1217238,2017-08-29T00:08:38Z,2017-08-29T00:08:38Z,MEMBER,"> But these are still considered netCDF files, not HDF5 files? As in, they declare attributes that say ""this is a netCDF file""?
Yes, I suppose so (and this should be fixed). h5netcdf currently writes the `_NCProperties` attribute to all files, though it uses a [custom format](https://github.com/shoyer/h5netcdf/blob/master/h5netcdf/core.py#L25) that could be detected.
I hadn't really thought about this because the convention for marking HDF5 files as netCDF files is very recent and not actually enforced by any software (to my knowledge).
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466
https://github.com/pydata/xarray/issues/1536#issuecomment-325514854,https://api.github.com/repos/pydata/xarray/issues/1536,325514854,MDEyOklzc3VlQ29tbWVudDMyNTUxNDg1NA==,1217238,2017-08-28T23:54:31Z,2017-08-28T23:54:31Z,MEMBER,"@dopplershift No, I don't think so. NetCDF-C only supports zlib compression (and doesn't support h5py's handling of complex variables, either, which use an HDF5 enumerated type).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466
https://github.com/pydata/xarray/issues/1536#issuecomment-325512111,https://api.github.com/repos/pydata/xarray/issues/1536,325512111,MDEyOklzc3VlQ29tbWVudDMyNTUxMjExMQ==,1217238,2017-08-28T23:35:42Z,2017-08-28T23:35:42Z,MEMBER,"h5netcdf already produces (slightly) incompatible netCDF files for some edge cases (e.g., complex numbers). This should probably be fixed, either by disabling these features or requiring an explicit opt-in, but nobody has gotten around to writing a fix yet (see https://github.com/shoyer/h5netcdf/issues/28).
In practice, many of our users seem to be pretty happy making use of these new features. LZF compression would just be another one.
I like @jhamman's idea of adding a dedicated `to_hdf5()` method that handles encoding with h5netcdf's new API. This would basically be a clone of `to_netcdf()`. In practice, I guess we would implement this with another engine for h5netcdf.
@petacube zstandard is great, but it's not in h5py yet! I think we'll need `zarr` for that (see https://github.com/pydata/xarray/pull/1528)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466
https://github.com/pydata/xarray/issues/1536#issuecomment-325510075,https://api.github.com/repos/pydata/xarray/issues/1536,325510075,MDEyOklzc3VlQ29tbWVudDMyNTUxMDA3NQ==,2443309,2017-08-28T23:22:13Z,2017-08-28T23:22:13Z,MEMBER,"This is an interesting idea. I think something similar was discussed in #66.
The main problem I see is that current netCDF libraries don't support `LZF` so I think we're really talking about adding an HDF5 backend. In practice, this could use the new API of `h5netcdf` or `h5py` directly. Either way, my initial thought is that we want a `to_hdf5()` method on the dataset.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466