
issue_comments: 325742232


  • html_url: https://github.com/pydata/xarray/pull/1528#issuecomment-325742232
  • issue_url: https://api.github.com/repos/pydata/xarray/issues/1528
  • id: 325742232
  • node_id: MDEyOklzc3VlQ29tbWVudDMyNTc0MjIzMg==
  • user: 1217238
  • created_at: 2017-08-29T17:50:04Z
  • updated_at: 2017-08-29T17:50:04Z
  • author_association: MEMBER

> If we think there is an advantage to using the zarr native filters, that could be added via a future PR once we have the basic backend working.

The only advantage here would be for non-xarray users, who could use zarr to do this decoding/encoding automatically.

For what it's worth, the implementation of scale/offset encoding in xarray looks basically equivalent to what's done in zarr. I don't think there's a performance difference either way.
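To make the comparison concrete, here is a minimal sketch of the affine transform that scale/offset packing amounts to in both libraries. The attribute names `scale_factor` and `add_offset` follow the CF convention used by xarray; this is an illustration of the technique, not either library's actual code.

```python
# CF-style scale/offset packing: floats are stored as small integers
# and reconstructed with an affine transform on read.

def encode(values, scale_factor, add_offset):
    """Pack floats into ints: raw = round((x - add_offset) / scale_factor)."""
    return [round((x - add_offset) / scale_factor) for x in values]

def decode(raw, scale_factor, add_offset):
    """Unpack ints back into floats: x = raw * scale_factor + add_offset."""
    return [r * scale_factor + add_offset for r in raw]

temps = [21.5, 22.0, 23.5]
packed = encode(temps, scale_factor=0.5, add_offset=20.0)
print(packed)                     # [3, 4, 7]
print(decode(packed, 0.5, 20.0))  # [21.5, 22.0, 23.5]
```

Whether the round trip happens in xarray's CF decoding or in a zarr filter, the arithmetic per element is the same, which is why neither side has a performance edge.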

> A further rather big advantage in zarr that I'm not aware of in cdf/hdf (I may be wrong) is not just null values, but not having a given block be written to disk at all if it only contains null data.

If you use chunks, I believe HDF5/NetCDF4 do the same thing, e.g.:

```
In [10]: with h5py.File('one-chunk.h5', 'w') as f:
    ...:     f.create_dataset('foo', (100, 100), chunks=(100, 100))

In [11]: with h5py.File('many-chunk.h5', 'w') as f:
    ...:     f.create_dataset('foo', (100000, 100000), chunks=(100, 100))

In [12]: ls -l | grep chunk.h5
-rw-r--r-- 1 shoyer eng 1400 Aug 29 10:48 many-chunk.h5
-rw-r--r-- 1 shoyer eng 1400 Aug 29 10:48 one-chunk.h5
```

(Note the same file size: the all-unwritten 100000×100000 array takes no more space on disk than the 100×100 one.)
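The behavior being discussed (chunks containing only the fill value are never materialized on disk) can be sketched with a toy dict-backed store. This is a hypothetical illustration of the idea, not zarr's or HDF5's actual implementation; the class and method names are invented for the example.

```python
# Toy sketch of fill-value chunk elision: a chunked store that refuses to
# persist any chunk whose values all equal the fill value, so an array
# that was never written occupies no chunk storage at all.

class SparseChunkStore:
    def __init__(self, chunk_size, fill_value=0):
        self.chunk_size = chunk_size
        self.fill_value = fill_value
        self.chunks = {}  # chunk index -> stored values; absent = all fill

    def write_chunk(self, index, values):
        if all(v == self.fill_value for v in values):
            self.chunks.pop(index, None)  # all-fill chunk: store nothing
        else:
            self.chunks[index] = list(values)

    def read_chunk(self, index):
        # Missing chunks read back as pure fill values.
        return self.chunks.get(index, [self.fill_value] * self.chunk_size)

store = SparseChunkStore(chunk_size=100, fill_value=0)
store.write_chunk(0, [0] * 100)        # all fill: nothing persisted
store.write_chunk(1, [0] * 99 + [5])   # one real value: chunk persisted
print(len(store.chunks))               # 1
print(store.read_chunk(0) == [0] * 100)  # True
```

The `ls` output above shows the same effect in HDF5: until a chunk is actually written, only file metadata exists, so both files are 1400 bytes regardless of the dataset's logical shape.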
