
issue_comments: 325742232


  • html_url: https://github.com/pydata/xarray/pull/1528#issuecomment-325742232
  • issue_url: https://api.github.com/repos/pydata/xarray/issues/1528
  • id: 325742232
  • node_id: MDEyOklzc3VlQ29tbWVudDMyNTc0MjIzMg==
  • user: 1217238
  • created_at: 2017-08-29T17:50:04Z
  • updated_at: 2017-08-29T17:50:04Z
  • author_association: MEMBER

> If we think there is an advantage to using the zarr native filters, that could be added via a future PR once we have the basic backend working.

The only advantage here would be for non-xarray users, who could use zarr to do this decoding/encoding automatically.

For what it's worth, the implementation of scale/offset encoding in xarray looks basically equivalent to what's done in zarr. I don't think there's a performance difference either way.
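To make the comparison concrete, here is a minimal sketch of the affine transform that scale/offset packing amounts to in both libraries. The attribute names `scale_factor` and `add_offset` follow the CF convention used by xarray; this is an illustration of the technique, not either library's actual code.

```python
# CF-style scale/offset packing: floats are stored as small integers
# and reconstructed with an affine transform on read.

def encode(values, scale_factor, add_offset):
    """Pack floats into ints: raw = round((x - add_offset) / scale_factor)."""
    return [round((x - add_offset) / scale_factor) for x in values]

def decode(raw, scale_factor, add_offset):
    """Unpack ints back into floats: x = raw * scale_factor + add_offset."""
    return [r * scale_factor + add_offset for r in raw]

temps = [21.5, 22.0, 23.5]
packed = encode(temps, scale_factor=0.5, add_offset=20.0)
print(packed)                     # [3, 4, 7]
print(decode(packed, 0.5, 20.0))  # [21.5, 22.0, 23.5]
```

Whether the round trip happens in xarray's CF decoding or in a zarr filter, the arithmetic per element is the same, which is why neither side has a performance edge.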

> A further rather big advantage in zarr that I'm not aware of in cdf/hdf (I may be wrong) is not just null values, but not having a given block be written to disk at all if it only contains null data.

If you use chunks, I believe HDF5/NetCDF4 do the same thing, e.g.:

```
In [10]: with h5py.File('one-chunk.h5', 'w') as f:
    ...:     f.create_dataset('foo', (100, 100), chunks=(100, 100))

In [11]: with h5py.File('many-chunk.h5', 'w') as f:
    ...:     f.create_dataset('foo', (100000, 100000), chunks=(100, 100))

In [12]: ls -l | grep chunk.h5
-rw-r--r-- 1 shoyer eng 1400 Aug 29 10:48 many-chunk.h5
-rw-r--r-- 1 shoyer eng 1400 Aug 29 10:48 one-chunk.h5
```

(Note the same file size: the all-unwritten 100000×100000 array takes no more space on disk than the 100×100 one.)
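The behavior being discussed (chunks containing only the fill value are never materialized on disk) can be sketched with a toy dict-backed store. This is a hypothetical illustration of the idea, not zarr's or HDF5's actual implementation; the class and method names are invented for the example.

```python
# Toy sketch of fill-value chunk elision: a chunked store that refuses to
# persist any chunk whose values all equal the fill value, so an array
# that was never written occupies no chunk storage at all.

class SparseChunkStore:
    def __init__(self, chunk_size, fill_value=0):
        self.chunk_size = chunk_size
        self.fill_value = fill_value
        self.chunks = {}  # chunk index -> stored values; absent = all fill

    def write_chunk(self, index, values):
        if all(v == self.fill_value for v in values):
            self.chunks.pop(index, None)  # all-fill chunk: store nothing
        else:
            self.chunks[index] = list(values)

    def read_chunk(self, index):
        # Missing chunks read back as pure fill values.
        return self.chunks.get(index, [self.fill_value] * self.chunk_size)

store = SparseChunkStore(chunk_size=100, fill_value=0)
store.write_chunk(0, [0] * 100)        # all fill: nothing persisted
store.write_chunk(1, [0] * 99 + [5])   # one real value: chunk persisted
print(len(store.chunks))               # 1
print(store.read_chunk(0) == [0] * 100)  # True
```

The `ls` output above shows the same effect in HDF5: until a chunk is actually written, only file metadata exists, so both files are 1400 bytes regardless of the dataset's logical shape.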
