issue_comments


4 rows where issue = 102703065 sorted by updated_at descending


user 4

  • shoyer 1
  • jhamman 1
  • clarkfitzg 1
  • aidanheerdegen 1

author_association 2

  • MEMBER 3
  • CONTRIBUTOR 1

issue 1

  • Support for netcdf4/hdf5 compression · 4
aidanheerdegen (CONTRIBUTOR) · created 2015-08-25T00:05:40Z · updated 2015-08-25T04:09:18Z
https://github.com/pydata/xarray/issues/548#issuecomment-134420461

Brilliant. Thanks. I looked into the code but thought the encoding information was being stripped out.

So I've confirmed that xray will round-trip fine. Shallow copies also round-trip. Similarly, making a new dataset from a variable that carries encoding information preserves that information and writes out properly.

``` python
% tmp = xray.open_dataset('saved_on_disk_compressed.nc')
% tmp
<xray.Dataset>
Dimensions:         (time: 3, x: 2, y: 2)
Coordinates:
    reference_time  datetime64[ns] 2014-09-05
    lon             (x, y) float64 -99.83 -99.32 -99.79 -99.23
  * time            (time) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08
    lat             (x, y) float64 42.25 42.21 42.63 42.59
  * x               (x) int64 0 1
  * y               (y) int64 0 1
Data variables:
    temperature     (x, y, time) float64 10.66 8.539 6.713 8.519 29.07 27.86 ...
    precipitation   (x, y, time) float64 0.3385 6.773 8.985 0.9651 0.1359 ...

% tmp.temperature.encoding
{'chunksizes': (2, 2, 3), 'complevel': 5, 'contiguous': False, 'dtype': dtype('float64'), 'fletcher32': False, 'shuffle': True, 'source': 'saved_on_disk_compressed.nc', 'zlib': True}

% tmp2 = tmp
% tmp2.to_netcdf('saved_on_disk_comp_tmp2.nc')
% tmp3 = xray.open_dataset('saved_on_disk_comp_tmp2.nc')
% tmp3.temperature.encoding
{'chunksizes': (2, 2, 3), 'complevel': 5, 'contiguous': False, 'dtype': dtype('float64'), 'fletcher32': False, 'shuffle': True, 'source': 'saved_on_disk_comp_xray.nc', 'zlib': True}
```

Setting the encoding dictionary works fine too (in this case, copying it from an existing variable):

``` python
% tmp4 = xray.DataArray(tmp.temperature.values).to_dataset(name='temperature')
% tmp4.temperature.encoding
{}
% tmp4.temperature.encoding = tmp.temperature.encoding
% tmp4.temperature.encoding
{'chunksizes': (2, 2, 3), 'complevel': 5, 'contiguous': False, 'dtype': dtype('float64'), 'fletcher32': False, 'shuffle': True, 'source': 'saved_on_disk_compressed.nc', 'zlib': True}
% tmp4.to_netcdf('saved_on_disk_comp_tmp4.nc')
% tmp5 = xray.open_dataset('saved_on_disk_comp_tmp4.nc')
% tmp.temperature.encoding
{'chunksizes': (2, 2, 3), 'complevel': 5, 'contiguous': False, 'dtype': dtype('float64'), 'fletcher32': False, 'shuffle': True, 'source': 'saved_on_disk_compressed.nc', 'zlib': True}
```

That will do nicely. Thanks.

shoyer (MEMBER) · 2015-08-24T16:18:00Z
https://github.com/pydata/xarray/issues/548#issuecomment-134279075

This is actually already supported, though poorly documented (so it's basically unknown).

We seem to have some sort of bug in our documentation generation for recent versions, but in the v0.5.1 IO docs, you can see the encoding attribute at the end of the section on writing netCDFs: http://xray.readthedocs.org/en/v0.5.1/io.html#netcdf

The way this works is that encoding on each data array stores a dictionary of options that is used when serializing that array to disk. It supports most of the options in netCDF4-python's createVariable method, including chunksizes, zlib, scale_factor, add_offset, _FillValue and dtype. This metadata is automatically filled in when reading a file from disk, which means that in principle xray should round-trip the encoding faithfully.

Because encoding is read in when files are opened, invalid encoding options are currently ignored when saving a file to disk. This means that the current API is not very user friendly.

So I'd like to extend this into an encoding keyword argument for the to_netcdf method. The keyword argument would expect a dictionary where the keys are variable names and the values are encoding parameters, and errors would be raised for invalid encoding options. Here's my branch for that feature: https://github.com/shoyer/xray/tree/encoding-error-handling

jhamman (MEMBER) · 2015-08-24T15:47:15Z
https://github.com/pydata/xarray/issues/548#issuecomment-134256961

I don't see any reason why we couldn't support this. The difficulty is that the implementation will be different (or not possible) for different backends.

netCDF4 adds compression at the Variable level, so we would have to think about how to implement this in our Dataset.to_netcdf method. Would we end up setting the compression level / type on each DataArray, or would we add an argument to the to_netcdf method?

clarkfitzg (MEMBER) · 2015-08-24T14:16:32Z
https://github.com/pydata/xarray/issues/548#issuecomment-134220841

This seems useful. xray uses the netCDF4 library here, and netCDF4 supports compression. In the meantime, you could always add a post-processing step from the command line: http://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf_compression.


``` sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
```