html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/issues/1536#issuecomment-381679096,https://api.github.com/repos/pydata/xarray/issues/1536,381679096,MDEyOklzc3VlQ29tbWVudDM4MTY3OTA5Ng==,1217238,2018-04-16T17:09:06Z,2018-04-16T17:09:06Z,MEMBER,"@crusaderky That would work for me, too. No strong preference from my side. In the worst case, we would be stuck maintaining the extra encoding `compression='zlib'` indefinitely, but that's not a big deal. Take [a look at h5netcdf](https://github.com/shoyer/h5netcdf/blob/master/h5netcdf/legacyapi.py) for a reference on what that translation layer should do.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-381504579,https://api.github.com/repos/pydata/xarray/issues/1536,381504579,MDEyOklzc3VlQ29tbWVudDM4MTUwNDU3OQ==,6213168,2018-04-16T07:26:13Z,2018-04-16T07:26:13Z,MEMBER,"@shoyer almost finished. However when implementing it I realised that, instead of writing a new engine h5netcdf-new, I could more simply reimplement the already existing h5netcdf to use the new API, and then accept (through a trivial translation layer) both the NetCDF4-python encoding (gzip=True) and the h5py one (compression=zlib). Let me know your thoughts. 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-377197301,https://api.github.com/repos/pydata/xarray/issues/1536,377197301,MDEyOklzc3VlQ29tbWVudDM3NzE5NzMwMQ==,6213168,2018-03-29T10:47:08Z,2018-03-29T10:47:08Z,MEMBER,@shoyer new non-functioning public API prototype - please confirm this is what you had in mind,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-373769617,https://api.github.com/repos/pydata/xarray/issues/1536,373769617,MDEyOklzc3VlQ29tbWVudDM3Mzc2OTYxNw==,1217238,2018-03-16T16:31:07Z,2018-03-16T16:31:07Z,MEMBER,"If using custom compression filters now results in valid netCDF4 files, then I'd rather we still called this `to_netcdf()` rather that defining our own custom HDF5 variant -- even if you can only read the files with netCDF-C or h5netcdf. 
We should just be careful to document the portability concerns (which are going to be a concern with custom filters, regardless).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-373566794,https://api.github.com/repos/pydata/xarray/issues/1536,373566794,MDEyOklzc3VlQ29tbWVudDM3MzU2Njc5NA==,6213168,2018-03-16T00:38:50Z,2018-03-16T00:38:50Z,MEMBER,@shoyer ping - could you give feedback on the API prototype?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-366096878,https://api.github.com/repos/pydata/xarray/issues/1536,366096878,MDEyOklzc3VlQ29tbWVudDM2NjA5Njg3OA==,6213168,2018-02-15T23:28:08Z,2018-02-15T23:28:08Z,MEMBER,"@shoyer, see if you like the public API prototype linked above","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-365841787,https://api.github.com/repos/pydata/xarray/issues/1536,365841787,MDEyOklzc3VlQ29tbWVudDM2NTg0MTc4Nw==,1217238,2018-02-15T07:03:04Z,2018-02-15T07:03:04Z,MEMBER,"@crusaderky In case adding this to the netCDF4 library doesn't work out: > I'm not sure I understood your latest comment - are you implying that to_hdf5 should internally use the h5netcdf module? I understand the rationale but it sounds a bit counter-intuitive to me? Yes, I would suggest that `to_hdf5()` use h5netcdf, but with `invalid_netcdf=True`. > Also, to allow for non-zlib compression we need to either tap into the new h5netcdf API, or into h5py directly - so I'm afraid to_hdf5 can't be a simple wrapper around to_netcdf. Yes, this is unfortunately true. 
> new method Dataset.to_hdf5 - starts as a copy-paste of to_netcdf, including the backend functions underneath Yes > new unit tests, starting as a copy-paste of all unit tests for to_netcdf Yes > change open_dataset and open_mfdataset: > add new possible value for the engine field, ""hdf5"" > if engine is None and file name terminates with .nc, use the current algorithm to choose default engine > if engine is None and file name terminates with .h5, use h5py > if engine is not None, ignore file extension I think this is a little easier than that. h5netcdf will always be able to *read* invalid netCDF files, so we can just continue to use `engine='h5netcdf'`. As for picking the default engine, see https://github.com/pydata/xarray/pull/1682, which is pretty close, though I need to think a little bit harder about the API to make sure it's right. > add to high level documentation and tutorials Yes","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-365479970,https://api.github.com/repos/pydata/xarray/issues/1536,365479970,MDEyOklzc3VlQ29tbWVudDM2NTQ3OTk3MA==,905179,2018-02-14T02:55:22Z,2018-02-14T02:55:22Z,NONE,"The methods that need to be implemented are (in the C API) as follows: > int nc_def_var_filter(int ncid, int varid, unsigned int id, size_t nparams, const unsigned int* parms); The only tricky part is passing a vector of unsigned integers (the parms argument). >int nc_inq_var_filter(int ncid, int varid, unsigned int* idp, size_t* nparams, unsigned int* params); This requires passing values out via pointers. Also, this uses the standard netcdf-c trick in which the function is called twice. First with nparams defined, but params has the value NULL. This gives the caller the number of parameters. The function is called a second time after allocating the params vector and with that vector as the final argument. 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-365476120,https://api.github.com/repos/pydata/xarray/issues/1536,365476120,MDEyOklzc3VlQ29tbWVudDM2NTQ3NjEyMA==,905179,2018-02-14T02:30:05Z,2018-02-14T02:30:05Z,NONE,"The API is not yet exposed thru anything but the C api. So the python, fortran, and c++ wrappers do not yet show it. Passing it thru netcdf-python is probably pretty trivian, though.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-365475898,https://api.github.com/repos/pydata/xarray/issues/1536,365475898,MDEyOklzc3VlQ29tbWVudDM2NTQ3NTg5OA==,905179,2018-02-14T02:28:42Z,2018-02-14T02:28:42Z,NONE,"A bit confusing, but I think the answer is yes. For example we provide a bzip2 compression plugin as an example (see examples/C/hdf5plugins in the netcdf-c distribution).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-365457702,https://api.github.com/repos/pydata/xarray/issues/1536,365457702,MDEyOklzc3VlQ29tbWVudDM2NTQ1NzcwMg==,6213168,2018-02-14T00:50:21Z,2018-02-14T00:50:21Z,MEMBER,"@DennisHeimbigner also, does this mean that h5netcdf should be changed to remove non-gzip compression algorithms from the list of features that requires invalid_netcdf=True?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 
https://github.com/pydata/xarray/issues/1536#issuecomment-365450097,https://api.github.com/repos/pydata/xarray/issues/1536,365450097,MDEyOklzc3VlQ29tbWVudDM2NTQ1MDA5Nw==,6213168,2018-02-14T00:11:42Z,2018-02-14T00:11:42Z,MEMBER,@DennisHeimbigner looks like it's not exposed through netcdf4-python though?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-365419155,https://api.github.com/repos/pydata/xarray/issues/1536,365419155,MDEyOklzc3VlQ29tbWVudDM2NTQxOTE1NQ==,905179,2018-02-13T21:59:35Z,2018-02-13T21:59:35Z,NONE,"You may already know, but should note that the filter stuff in netcdf-c is now available in netcdf-c library version 4.6.0. So any filter plugin usable with hdf5 can now be used both for reading and writing thru the netcdf-c api.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-365410944,https://api.github.com/repos/pydata/xarray/issues/1536,365410944,MDEyOklzc3VlQ29tbWVudDM2NTQxMDk0NA==,6213168,2018-02-13T21:31:15Z,2018-02-13T21:32:43Z,MEMBER,"@shoyer I'm starting to work on this. I'm not sure I understood your latest comment - are you implying that ``to_hdf5`` should internally use the h5netcdf module? I understand the rationale but it sounds a bit counter-intuitive to me? Also, to allow for non-zlib compression we need to either tap into the new h5netcdf API, or into h5py directly - so I'm afraid ``to_hdf5`` can't be a simple wrapper around ``to_netcdf``. Could you help me compile a shopping list? 
- new method ``Dataset.to_hdf5`` - starts as a copy-paste of ``to_netcdf``, including the backend functions underneath - new unit tests, starting as a copy-paste of all unit tests for ``to_netcdf`` - change ``open_dataset`` and ``open_mfdataset``: - add new possible value for the engine field, ""hdf5"" - if engine is None and file name terminates with ``.nc``, use the current algorithm to choose default engine - if engine is None and file name terminates with ``.h5``, use h5py - if engine is not None, ignore file extension - add to high level documentation and tutorials - Other?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-326037069,https://api.github.com/repos/pydata/xarray/issues/1536,326037069,MDEyOklzc3VlQ29tbWVudDMyNjAzNzA2OQ==,1217238,2017-08-30T15:58:35Z,2017-08-30T15:58:35Z,MEMBER,I just released a new version of h5netcdf (0.4.0). It adds an `invalid_netcdf` argument to the file constructors. So the right way to build this new backend (if we still want to go this way) would be to require `h5netcdf>=v0.4` and set `invalid_netcdf=True` when called from `to_hdf5()`.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-325775498,https://api.github.com/repos/pydata/xarray/issues/1536,325775498,MDEyOklzc3VlQ29tbWVudDMyNTc3NTQ5OA==,905179,2017-08-29T19:35:55Z,2017-08-29T19:35:55Z,NONE,"The github branch filters.dmh for the netcdf-c library now exposes the HDF5 dynamic filter capability. This is documented here: https://github.com/Unidata/netcdf-c/blob/filters.dmh/docs/filters.md I welcome suggestions for improvements. I also note that I am extending this branch to now handle szip compression. 
It turns out there is now a patent-free implementation called libaec (HT Rich Signell) so there is no reason not to make it available.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-325720084,https://api.github.com/repos/pydata/xarray/issues/1536,325720084,MDEyOklzc3VlQ29tbWVudDMyNTcyMDA4NA==,4324946,2017-08-29T16:31:17Z,2017-08-29T16:31:17Z,NONE,"For what it is worth, we have a branch at netCDF-C that allows for different compression plugins using the hdf5 plugin architecture. It will not be in the 4.5.0 release, but once 4.5.0 is finished we will be looking at it for either the subsequent release. There is still testing and documentation to be completed, but @DennisHeimbigner (who implemented it) can speak more about it.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-325713273,https://api.github.com/repos/pydata/xarray/issues/1536,325713273,MDEyOklzc3VlQ29tbWVudDMyNTcxMzI3Mw==,221526,2017-08-29T16:07:37Z,2017-08-29T16:07:37Z,CONTRIBUTOR,Cool. 
Thx @shoyer .,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-325712523,https://api.github.com/repos/pydata/xarray/issues/1536,325712523,MDEyOklzc3VlQ29tbWVudDMyNTcxMjUyMw==,1217238,2017-08-29T16:05:14Z,2017-08-29T16:05:14Z,MEMBER,I'm adding a loud warning about this (will eventually be an error) to h5netcdf.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-325555913,https://api.github.com/repos/pydata/xarray/issues/1536,325555913,MDEyOklzc3VlQ29tbWVudDMyNTU1NTkxMw==,1217238,2017-08-29T05:02:49Z,2017-08-29T05:02:49Z,MEMBER,"> Please, please, please don't write out ""netCDF"" files that don't conform to the spec. Of course not. I understand the issue here. I'll issue a fix for h5netcdf to disable this unless explicitly opted into, but we'll also need a fix for xarray to support the users who are currently using it to save data with complex values -- probably by adding a `to_hdf5()` method. Here is the NetCDF-C issue I opened on reading these sorts of HDF5 enums: https://github.com/Unidata/netcdf-c/issues/267. > You're not actually saying the netCDF-c library should check for this custom format, are you? 
No.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-325535079,https://api.github.com/repos/pydata/xarray/issues/1536,325535079,MDEyOklzc3VlQ29tbWVudDMyNTUzNTA3OQ==,221526,2017-08-29T02:17:02Z,2017-08-29T02:17:02Z,CONTRIBUTOR,"(cc @WardF) Please, please, please don't write out ""netCDF"" files that don't conform to the [spec](http://www.unidata.ucar.edu/software/netcdf/docs/file_format_specifications.html#netcdf_4_spec). Either work with us to try to add the needed features to the spec (and the C-library) or call them something else. The spec exists for a reason. When such non-conformant files are distributed (and they will be), this creates a needless support load for the netcdf-c and netcdf-java developers (not to mention netCDF4-python). >Yes, I suppose so (and this should be fixed). h5netcdf currently writes the _NCProperties attribute to all files, though it uses a custom format that could be detected. You're not actually saying the netCDF-c library should check for this custom format, are you?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-325516877,https://api.github.com/repos/pydata/xarray/issues/1536,325516877,MDEyOklzc3VlQ29tbWVudDMyNTUxNjg3Nw==,1217238,2017-08-29T00:08:38Z,2017-08-29T00:08:38Z,MEMBER,"> But these are still considered netCDF files, not HDF5 files? As in, they declare attributes that say ""this is a netCDF file""? Yes, I suppose so (and this should be fixed). h5netcdf currently writes the `_NCProperties` attribute to all files, though it uses a [custom format](https://github.com/shoyer/h5netcdf/blob/master/h5netcdf/core.py#L25) that could be detected. 
I hadn't really thought about this because the convention for marking HDF5 files as netCDF files is very recent and not actually enforced by any software (to my knowledge). ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-325515228,https://api.github.com/repos/pydata/xarray/issues/1536,325515228,MDEyOklzc3VlQ29tbWVudDMyNTUxNTIyOA==,221526,2017-08-28T23:57:02Z,2017-08-28T23:57:21Z,CONTRIBUTOR,"But these are still considered *netCDF* files, not HDF5 files? As in, they declare attributes that say ""this is a netCDF file""?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-325514854,https://api.github.com/repos/pydata/xarray/issues/1536,325514854,MDEyOklzc3VlQ29tbWVudDMyNTUxNDg1NA==,1217238,2017-08-28T23:54:31Z,2017-08-28T23:54:31Z,MEMBER,"@dopplershift No, I don't think so. 
NetCDF-C only supports zlib compression (and doesn't support h5py's handling of complex variables, either, which use an HDF5 enumerated type).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-325513941,https://api.github.com/repos/pydata/xarray/issues/1536,325513941,MDEyOklzc3VlQ29tbWVudDMyNTUxMzk0MQ==,221526,2017-08-28T23:48:10Z,2017-08-28T23:48:10Z,CONTRIBUTOR,@shoyer I just want to clarify: is the netCDF C library able to read files using these features?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-325512111,https://api.github.com/repos/pydata/xarray/issues/1536,325512111,MDEyOklzc3VlQ29tbWVudDMyNTUxMjExMQ==,1217238,2017-08-28T23:35:42Z,2017-08-28T23:35:42Z,MEMBER,"h5netcdf already produces (slightly) incompatible netCDF files for some edge cases (e.g., complex numbers). This should probably be fixed, either by disabling these features or requiring an explicit opt-in, but nobody has gotten around to writing a fix yet (see https://github.com/shoyer/h5netcdf/issues/28). In practice, many of our users seem to be pretty happy making use of these new features. LZF compression would just be another one. I like @jhamman's idea of adding a dedicated `to_hdf5()` method that handles encoding with h5netcdf's new API. This would basically be a clone of `to_netcdf()`. In practice, I guess we would implement this with another engine for h5netcdf. @petacube zstandard is great, but it's not in h5py yet! 
I think we'll need `zarr` for that (see https://github.com/pydata/xarray/pull/1528)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-325511629,https://api.github.com/repos/pydata/xarray/issues/1536,325511629,MDEyOklzc3VlQ29tbWVudDMyNTUxMTYyOQ==,30301994,2017-08-28T23:32:40Z,2017-08-28T23:32:40Z,NONE,"how about zstandard? > On Aug 28, 2017, at 7:22 PM, Joe Hamman wrote: > > This is an interesting idea. I think something similar was discussed in #66 . > > The main problem I see is that current netCDF libraries don't support LZF so I think we're really talking about adding an HDF5 backend. In practice, this could use the new API of h5netcdf or h5py directly. Either way, my initial thought is that we want a to_hdf5() method on the dataset.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466 https://github.com/pydata/xarray/issues/1536#issuecomment-325510075,https://api.github.com/repos/pydata/xarray/issues/1536,325510075,MDEyOklzc3VlQ29tbWVudDMyNTUxMDA3NQ==,2443309,2017-08-28T23:22:13Z,2017-08-28T23:22:13Z,MEMBER,"This is an interesting idea. I think something similar was discussed in #66. The main problem I see is that current netCDF libraries don't support `LZF` so I think we're really talking about adding an HDF5 backend. In practice, this could use the new API of `h5netcdf` or `h5py` directly. Either way, my initial thought is that we want a `to_hdf5()` method on the dataset.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253476466