issues: 874292512

id: 874292512
node_id: MDU6SXNzdWU4NzQyOTI1MTI=
number: 5251
title: Switch default for Zarr reading/writing to consolidated=True?
user: 1217238
state: closed
locked: 0
comments: 4
created_at: 2021-05-03T06:59:42Z
updated_at: 2021-08-30T15:21:11Z
closed_at: 2021-08-30T15:21:11Z
author_association: MEMBER

Consolidated metadata was a new feature in Zarr v2.3, which was released over two years ago (March 22, 2019).

Since then, I have used consolidated=True every time I've written or opened a Zarr store. As far as I can tell, this is almost always a good idea:

- With local storage, it usually doesn't really matter. You spend a bit of time writing the consolidated metadata and have one extra file on disk, but the overhead is typically negligible.
- With Cloud object stores or network filesystems, it can matter a great deal. Without consolidated metadata, these systems can be unusably slow for opening datasets.

Cloud storage is of course the main use-case for Zarr. If you're using a local disk, you might as well stick with single files such as netCDF.

I wonder if consolidated metadata is mature enough now that we could consider switching the default behavior in Xarray. From my perspective, this is a big "gotcha" for getting good performance with Zarr. More than one of my colleagues has been unimpressed with the performance of Zarr until they learned to set consolidated=True.

I would suggest doing this in a way that is almost entirely backwards compatible, with only a minor performance cost for reading non-consolidated datasets:

- to_zarr() switches the default to consolidated=True. The consolidate_metadata() call will thus happen by default.
- open_zarr() switches the default to consolidated=None, which means "Try reading consolidated metadata, and fall back to non-consolidated if that fails." This will be slightly slower for non-consolidated metadata due to the extra file lookup, but given that opening with non-consolidated metadata already requires a moderately large number of file lookups, I doubt anyone will notice the difference.
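To make the proposed consolidated=None semantics concrete, here is a minimal pure-Python sketch of the lookup logic. It does not use Zarr or Xarray: `CountingStore` and `read_metadata` are hypothetical names invented for illustration, and a dict stands in for an object store. The `.zmetadata` key and its `"metadata"` field follow the Zarr v2 consolidated metadata layout. Passing the array names in explicitly is a simplification; a real reader would have to discover them.

```python
import json

CONSOLIDATED_KEY = ".zmetadata"  # key used by Zarr v2 consolidated metadata


class CountingStore(dict):
    """Hypothetical dict-backed stand-in for an object store.

    Counts every key read, including reads of missing keys, so the
    cost of each strategy can be compared.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1
        return super().__getitem__(key)


def read_metadata(store, array_names, consolidated=None):
    """Hypothetical helper mirroring the proposed `consolidated` values.

    True  -> require consolidated metadata (KeyError if it is missing)
    False -> always read one metadata key per array
    None  -> try consolidated first, fall back to per-array reads
    """
    if consolidated is not False:
        try:
            return json.loads(store[CONSOLIDATED_KEY])["metadata"]
        except KeyError:
            if consolidated:
                raise  # the caller insisted on consolidated metadata
    # Fallback path: one lookup per array.
    return {
        f"{name}/.zarray": json.loads(store[f"{name}/.zarray"])
        for name in array_names
    }


names = ["temperature", "pressure", "humidity"]
raw = {f"{n}/.zarray": json.dumps({"shape": [10]}) for n in names}

# Store without consolidated metadata.
plain = CountingStore(raw)

# Same store plus a single consolidated metadata entry.
consolidated_store = CountingStore(raw)
consolidated_store[CONSOLIDATED_KEY] = json.dumps(
    {
        "zarr_consolidated_format": 1,
        "metadata": {k: json.loads(v) for k, v in raw.items()},
    }
)

read_metadata(consolidated_store, names)  # one read in total
read_metadata(plain, names)  # one failed read, then one read per array
print(consolidated_store.reads, plain.reads)
```

In this sketch the fallback costs exactly one extra lookup on a non-consolidated store, which is the overhead the proposal accepts in exchange for a single read on consolidated stores.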

CC @rabernat

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5251/reactions",
    "total_count": 11,
    "+1": 11,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
