
issues: 1503046820


id: 1503046820
node_id: I_kwDOAMm_X85Zlqyk
number: 7388
title: Xarray does not support full range of netcdf-python compression options
user: 1197350
state: closed
locked: 0
comments: 22
created_at: 2022-12-19T14:21:17Z
updated_at: 2023-12-21T15:43:06Z
closed_at: 2023-12-21T15:24:17Z
author_association: MEMBER

What is your issue?

Summary

The netcdf4-python API docs say the following:

> If the optional keyword argument `compression` is set, the data will be compressed in the netCDF file using the specified compression algorithm. Currently `zlib`, `szip`, `zstd`, `bzip2`, `blosc_lz`, `blosc_lz4`, `blosc_lz4hc`, `blosc_zlib` and `blosc_zstd` are supported. Default is `None` (no compression). All of the compressors except `zlib` and `szip` use the HDF5 plugin architecture.
>
> If the optional keyword `zlib` is `True`, the data will be compressed in the netCDF file using zlib compression (default `False`). The use of this option is deprecated in favor of `compression='zlib'`.

Although compression is considered a valid encoding option by Xarray

https://github.com/pydata/xarray/blob/bbe63ab657e9cb16a7cbbf6338a8606676ddd7b0/xarray/backends/netCDF4_.py#L232-L242

...it appears that we silently ignore the `compression` option when creating new netCDF4 variables:

https://github.com/pydata/xarray/blob/bbe63ab657e9cb16a7cbbf6338a8606676ddd7b0/xarray/backends/netCDF4_.py#L488-L501
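
The pattern described above can be illustrated with a small, self-contained sketch. This is not xarray's actual code: `VALID_ENCODINGS`, `validate_encoding`, `create_variable`, and `recording_backend` are hypothetical stand-ins, and the sketch assumes that validation accepts `compression` while the variable-creation call forwards only a fixed set of keyword arguments.

```python
# Hypothetical stand-ins; a simplified illustration, not xarray's implementation.
VALID_ENCODINGS = {
    "zlib", "complevel", "shuffle", "fletcher32",
    "contiguous", "chunksizes", "compression",
}


def validate_encoding(encoding):
    """Reject unknown keys; 'compression' passes because it is listed as valid."""
    invalid = set(encoding) - VALID_ENCODINGS
    if invalid:
        raise ValueError(f"unexpected encoding parameters: {sorted(invalid)}")
    return dict(encoding)


def create_variable(encoding, backend_create):
    """Forward a fixed set of keywords; 'compression' is never passed on."""
    encoding = validate_encoding(encoding)
    return backend_create(
        zlib=encoding.get("zlib", False),
        complevel=encoding.get("complevel", 4),
        shuffle=encoding.get("shuffle", True),
        fletcher32=encoding.get("fletcher32", False),
        contiguous=encoding.get("contiguous", False),
        chunksizes=encoding.get("chunksizes"),
    )


def recording_backend(**kwargs):
    """Stand-in for the backend call: return the keywords it received."""
    return kwargs


received = create_variable(
    {"compression": "zlib", "complevel": 8, "chunksizes": (1, 10)},
    recording_backend,
)
print(received)
```

In this sketch the encoding passes validation, yet the backend receives `zlib=False` and no `compression` keyword at all, so it is asked for an uncompressed variable without any error or warning.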

Code example

```python
import numpy as np
import xarray as xr

shape = (10, 20)
chunksizes = (1, 10)

# Request zlib compression using the `compression=` style encoding
encoding = {
    'compression': 'zlib',
    'shuffle': True,
    'complevel': 8,
    'fletcher32': False,
    'contiguous': False,
    'chunksizes': chunksizes,
}

da = xr.DataArray(
    data=np.random.rand(*shape),
    dims=['y', 'x'],
    name="foo",
    attrs={"bar": "baz"},
)
da.encoding = encoding
ds = da.to_dataset()

fname = "test.nc"
ds.to_netcdf(fname, engine="netcdf4", mode="w")

# Read the file back and inspect the encoding that was actually applied
with xr.open_dataset(fname, engine="netcdf4") as ds1:
    print(ds1.foo.encoding)
```

{'zlib': False, 'szip': False, 'zstd': False, 'bzip2': False, 'blosc': False, 'shuffle': False, 'complevel': 0, 'fletcher32': False, 'contiguous': False, 'chunksizes': (1, 10), 'source': 'test.nc', 'original_shape': (10, 20), 'dtype': dtype('float64'), '_FillValue': nan}

In addition to showing that compression is ignored, this also reveals several other encoding options that are not available when writing data from xarray (szip, zstd, bzip2, blosc).

Proposal

We should align with the recommendation from the netcdf4-python docs and support `compression=`-style encoding when writing netCDF files. We should deprecate the `zlib=True` syntax.
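
One way such a transition could look is sketched below. `normalize_compression` is a hypothetical helper, not an existing xarray or netCDF4 function. It assumes the legacy `zlib` flag is translated to `compression='zlib'` with a `DeprecationWarning`, and that conflicting settings raise an error.

```python
import warnings


def normalize_compression(encoding):
    """Return a copy of `encoding` that uses only the `compression=` style.

    Hypothetical helper for illustration; not part of xarray or netCDF4.
    """
    result = dict(encoding)
    legacy_zlib = result.pop("zlib", None)
    compression = result.get("compression")

    # Nothing to translate when the legacy flag is absent.
    if legacy_zlib is None:
        return result

    warnings.warn(
        "the 'zlib' encoding option is deprecated; use compression='zlib'",
        DeprecationWarning,
        stacklevel=2,
    )
    if legacy_zlib:
        # Refuse ambiguous input such as zlib=True with compression='zstd'.
        if compression not in (None, "zlib"):
            raise ValueError(
                f"zlib=True conflicts with compression={compression!r}"
            )
        result["compression"] = "zlib"
    return result


# New-style encodings pass through unchanged.
print(normalize_compression({"compression": "zstd", "complevel": 3}))

# Legacy encodings are translated (warning suppressed here for brevity).
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    print(normalize_compression({"zlib": True, "complevel": 8}))
```

Translating at a single point like this would keep existing `zlib=True` encodings working during a deprecation period while the write path only has to handle `compression`.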

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7388/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
