
issue_comments


9 rows where author_association = "MEMBER", issue = 253476466 and user = 1217238 sorted by updated_at descending



Facets: user = shoyer (9) · issue = Better compression algorithms for NetCDF (9) · author_association = MEMBER (9)
shoyer (MEMBER) · 2018-04-16T17:09:06Z · https://github.com/pydata/xarray/issues/1536#issuecomment-381679096

@crusaderky That would work for me, too. No strong preference from my side. In the worst case, we would be stuck maintaining the extra encoding compression='zlib' indefinitely, but that's not a big deal.

Take a look at h5netcdf for a reference on what that translation layer should do.
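That translation layer would need to map netCDF4-style encoding keys (zlib, complevel) onto the h5py-style keys (compression, compression_opts) that h5netcdf speaks. A minimal stdlib-only sketch of the mapping — the function name and the default level of 4 are assumptions for illustration, not h5netcdf's actual code:

```python
def translate_encoding(encoding):
    """Map netCDF4-style encoding keys to h5py-style keyword
    arguments. Hypothetical sketch, not h5netcdf's real code."""
    out = dict(encoding)
    # netCDF4 spells the deflate filter zlib=True/complevel=N;
    # h5py spells the same filter compression="gzip"/compression_opts=N
    if out.pop("zlib", False):
        out["compression"] = "gzip"
        out["compression_opts"] = out.pop("complevel", 4)
    return out

print(translate_encoding({"zlib": True, "complevel": 6}))
```

A real translation layer also has to handle chunking, fill values, and so on; the point here is only the shape of the mapping.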

shoyer (MEMBER) · 2018-03-16T16:31:07Z · https://github.com/pydata/xarray/issues/1536#issuecomment-373769617

If using custom compression filters now results in valid netCDF4 files, then I'd rather we still called this to_netcdf() rather than defining our own custom HDF5 variant -- even if you can only read the files with netCDF-C or h5netcdf. We should just be careful to document the portability concerns (which are going to be a concern with custom filters, regardless).

shoyer (MEMBER) · 2018-02-15T07:03:04Z · https://github.com/pydata/xarray/issues/1536#issuecomment-365841787

@crusaderky In case adding this to the netCDF4 library doesn't work out:

> I'm not sure I understood your latest comment - are you implying that to_hdf5 should internally use the h5netcdf module? I understand the rationale but it sounds a bit counter-intuitive to me?

Yes, I would suggest that to_hdf5() use h5netcdf, but with invalid_netcdf=True.

> Also, to allow for non-zlib compression we need to either tap into the new h5netcdf API, or into h5py directly - so I'm afraid to_hdf5 can't be a simple wrapper around to_netcdf.

Yes, this is unfortunately true.

> new method Dataset.to_hdf5 - starts as a copy-paste of to_netcdf, including the backend functions underneath

Yes

> new unit tests, starting as a copy-paste of all unit tests for to_netcdf

Yes

> change open_dataset and open_mfdataset: add new possible value for the engine field, "hdf5"
>   • if engine is None and file name terminates with .nc, use the current algorithm to choose default engine
>   • if engine is None and file name terminates with .h5, use h5py
>   • if engine is not None, ignore file extension

I think this is a little easier than that. h5netcdf will always be able to read invalid netCDF files, so we can just continue to use engine='h5netcdf'.

As for picking the default engine, see https://github.com/pydata/xarray/pull/1682, which is pretty close, though I need to think a little bit harder about the API to make sure it's right.
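For illustration only, extension-based default-engine picking of the kind discussed here could be sketched as follows. The function name and the preference order are assumptions, not the actual logic in xarray or that PR:

```python
import os

def guess_engine(path, engine=None):
    """Pick a backend engine from the filename when none is given.
    Hypothetical sketch; not xarray's actual selection logic."""
    if engine is not None:
        # an explicit engine always wins; the extension is ignored
        return engine
    ext = os.path.splitext(path)[1]
    if ext in (".h5", ".hdf5"):
        # h5netcdf can read invalid-netCDF HDF5 files, so it can
        # serve both the .nc and .h5 cases
        return "h5netcdf"
    return "netcdf4"  # default for .nc and anything else

print(guess_engine("data.h5"))
```

The key simplification, per the comment above, is that no separate "hdf5" engine value is needed: engine='h5netcdf' already covers invalid-netCDF files.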

add to high level documentation and tutorials

Yes

shoyer (MEMBER) · 2017-08-30T15:58:35Z · https://github.com/pydata/xarray/issues/1536#issuecomment-326037069

I just released a new version of h5netcdf (0.4.0). It adds an invalid_netcdf argument to the file constructors. So the right way to build this new backend (if we still want to go this way) would be to require h5netcdf>=0.4 and set invalid_netcdf=True when called from to_hdf5().

shoyer (MEMBER) · 2017-08-29T16:05:14Z · https://github.com/pydata/xarray/issues/1536#issuecomment-325712523

I'm adding a loud warning about this (will eventually be an error) to h5netcdf.

shoyer (MEMBER) · 2017-08-29T05:02:49Z · https://github.com/pydata/xarray/issues/1536#issuecomment-325555913

> Please, please, please don't write out "netCDF" files that don't conform to the spec.

Of course not. I understand the issue here.

I'll issue a fix for h5netcdf to disable this unless explicitly opted into, but we'll also need a fix for xarray to support the users who are currently using it to save data with complex values -- probably by adding a to_hdf5() method.

Here is the NetCDF-C issue I opened on reading these sorts of HDF5 enums: https://github.com/Unidata/netcdf-c/issues/267.

> You're not actually saying the netCDF-c library should check for this custom format, are you?

No.

shoyer (MEMBER) · 2017-08-29T00:08:38Z · https://github.com/pydata/xarray/issues/1536#issuecomment-325516877

> But these are still considered netCDF files, not HDF5 files? As in, they declare attributes that say "this is a netCDF file"?

Yes, I suppose so (and this should be fixed). h5netcdf currently writes the _NCProperties attribute to all files, though it uses a custom format that could be detected.

I hadn't really thought about this because the convention for marking HDF5 files as netCDF files is very recent and not actually enforced by any software (to my knowledge).
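Any software that wanted to enforce that convention would first have to recognize HDF5 files at all. The HDF5 superblock begins with a fixed 8-byte signature, so the first step can be sketched with the stdlib alone (the function name is hypothetical; inspecting the _NCProperties attribute itself would additionally require h5py):

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # fixed 8-byte HDF5 superblock magic

def looks_like_hdf5(path):
    """Return True if the file starts with the HDF5 signature.
    Note: a complete check would also probe the later superblock
    offsets (512, 1024, ...) allowed by the HDF5 format; this
    sketch only checks offset 0."""
    with open(path, "rb") as f:
        return f.read(8) == HDF5_SIGNATURE
```

A netCDF-4 detector would then open the file with h5py and look for the _NCProperties root attribute mentioned above.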

shoyer (MEMBER) · 2017-08-28T23:54:31Z · https://github.com/pydata/xarray/issues/1536#issuecomment-325514854

@dopplershift No, I don't think so. NetCDF-C only supports zlib compression (and doesn't support h5py's handling of complex variables, either, which use an HDF5 enumerated type).

shoyer (MEMBER) · 2017-08-28T23:35:42Z · https://github.com/pydata/xarray/issues/1536#issuecomment-325512111

h5netcdf already produces (slightly) incompatible netCDF files for some edge cases (e.g., complex numbers). This should probably be fixed, either by disabling these features or requiring an explicit opt-in, but nobody has gotten around to writing a fix yet (see https://github.com/shoyer/h5netcdf/issues/28).

In practice, many of our users seem to be pretty happy making use of these new features. LZF compression would just be another one.

I like @jhamman's idea of adding a dedicated to_hdf5() method that handles encoding with h5netcdf's new API. This would basically be a clone of to_netcdf(). In practice, I guess we would implement this with another engine for h5netcdf.

@petacube zstandard is great, but it's not in h5py yet! I think we'll need zarr for that (see https://github.com/pydata/xarray/pull/1528)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette