
issues: 1200581329


id: 1200581329
node_id: PR_kwDOAMm_X842CG0x
number: 6475
title: implement Zarr v3 spec support
user: 6528957
state: closed
locked: 0
comments: 13
created_at: 2022-04-11T21:52:37Z
updated_at: 2022-11-27T02:22:43Z
closed_at: 2022-11-27T02:22:43Z
author_association: CONTRIBUTOR
draft: 0
pull_request: pydata/xarray/pulls/6475

This is a WIP PR that is intended for use only with a development branch of Zarr (specifically https://github.com/zarr-developers/zarr-python/pull/1006). I am using it to test the Zarr v3 spec support that is currently being added to zarr-python.

The primary changes needed were:
  • The v3 spec requires that a path be specified when calling open_group or open_consolidated. This PR currently just sets a default group name of 'xarray' if one is not specified via the group kwarg to ZarrStore.open_group. I think that is convenient, but one could instead be stricter and raise an error in this case.
  • If a string corresponding to a filesystem path or URL is used for store, it is not possible to infer which version of the zarr spec is desired. In this case, the user must specify zarr_version to choose the zarr protocol version. The default of zarr_version=None infers the version from a zarr BaseStore subclass when possible, and otherwise falls back to zarr_version=2 for backwards compatibility.
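A minimal sketch of the two rules above, in plain Python so it runs without zarr installed. The function resolve_open_args, the constant DEFAULT_V3_GROUP and the FakeStoreV2/FakeStoreV3 classes are hypothetical illustrations, not xarray's or zarr-python's API. Detecting a store's spec version through a `_store_version` attribute is an assumption modelled on zarr-python's store classes of that period.

```python
from typing import Any, Optional, Tuple


# Hypothetical stand-ins for zarr store classes. The `_store_version`
# attribute name is an assumption, not a confirmed zarr-python API.
class FakeStoreV2:
    _store_version = 2


class FakeStoreV3:
    _store_version = 3


DEFAULT_V3_GROUP = "xarray"  # default group name described in the PR


def resolve_open_args(
    store: Any, group: Optional[str] = None, zarr_version: Optional[int] = None
) -> Tuple[Optional[str], int]:
    """Return the (group, zarr_version) pair used to open `store`."""
    if zarr_version is None:
        # A store object can report its spec version; a plain string path
        # or URL cannot, so fall back to v2 for backwards compatibility.
        zarr_version = getattr(store, "_store_version", 2)
    if zarr_version not in (2, 3):
        raise ValueError(f"unsupported zarr_version: {zarr_version!r}")
    if zarr_version == 3 and group is None:
        # v3 requires a path for open_group/open_consolidated, so fill in a
        # default group (raising an error here would be the stricter option).
        group = DEFAULT_V3_GROUP
    return group, zarr_version


print(resolve_open_args("data/example.zarr"))                  # (None, 2)
print(resolve_open_args("data/example.zarr", zarr_version=3))  # ('xarray', 3)
print(resolve_open_args(FakeStoreV3()))                        # ('xarray', 3)
print(resolve_open_args(FakeStoreV3(), group="grp"))           # ('grp', 3)
```

Switching to the stricter behaviour would only change the `zarr_version == 3 and group is None` branch, replacing the default with a raised error.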

The good news is that these changes are quite small overall. Most changed lines in the tests involve optionally passing zarr_version around so that v3 support can be tested both with an explicit DirectoryStoreV3 store and with string-based paths.

Other points that need consideration regarding the spec:
  • A number of the tested data types, including unicode strings, byte strings, complex floats, datetime arrays and structured arrays, are not part of the core v3 spec. We do currently implement these for the v3 spec in zarr-python in the same way they worked for v2, but the implementation is subject to change based on decisions around v3 protocol extensions for these dtypes. A very rough initial draft of such extensions is at https://github.com/zarr-developers/zarr-specs/pull/135.
  • dtype=str is used in some tests. Currently zarr-python uses the numcodecs filter VLenUTF8 in this case. The core zarr v3 spec no longer has a 'filters' entry in the array metadata, so a zarr v3 protocol extension needs to be defined to specify how this should be implemented. We do currently support this filter even for zarr v3 arrays, but in a hacky way that needs to be standardized. This is the cause of the TODO comment around the call to attributes.pop('filters', None).
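The split between core and extension data types can be illustrated with a small classifier over NumPy array-protocol typestrs (the value of `arr.dtype.str`, e.g. '<f8' or '|S3'). The helpers below are hypothetical and not part of zarr-python or xarray. The set of core kinds (bool, signed and unsigned integers, floats) is inferred from the list above as it stood for the draft v3 spec at the time, so treat it as an assumption. Object arrays (kind 'O') are also flagged, on the assumption that this is how zarr-python stores dtype=str data with the VLenUTF8 filter.

```python
# Hypothetical helpers (not zarr-python or xarray API) that flag which
# NumPy dtypes fall outside the core zarr v3 data types.

# Kinds assumed to be core v3 at the time of the PR; the final spec may differ.
CORE_V3_KINDS = {"b": "bool", "i": "signed integer", "u": "unsigned integer", "f": "float"}

# Kinds the PR lists as needing a v3 protocol extension, plus object
# arrays (assumed to be how dtype=str data is stored with VLenUTF8).
EXTENSION_KINDS = {
    "U": "unicode string",
    "S": "byte string",
    "c": "complex float",
    "M": "datetime",
    "V": "structured (void)",
    "O": "object, e.g. dtype=str via the VLenUTF8 filter",
}


def dtype_kind(typestr: str) -> str:
    """Return the kind character of a NumPy typestr such as '<f8' or '|S3'."""
    if len(typestr) < 2 or typestr[0] not in "<>|=":
        raise ValueError(f"not a NumPy array-protocol typestr: {typestr!r}")
    return typestr[1]


def needs_v3_extension(typestr: str) -> bool:
    """True if the dtype is not one of the (assumed) core v3 data types."""
    return dtype_kind(typestr) not in CORE_V3_KINDS


def describe(typestr: str) -> str:
    """Human-readable classification of a dtype against the core v3 types."""
    kind = dtype_kind(typestr)
    if kind in CORE_V3_KINDS:
        return f"{typestr}: core v3 ({CORE_V3_KINDS[kind]})"
    label = EXTENSION_KINDS.get(kind, f"unlisted kind {kind!r}")
    return f"{typestr}: needs extension ({label})"


for ts in ["<f8", "|b1", "<U5", "|S3", "<c16", "<M8[ns]", "|V16", "|O"]:
    print(describe(ts))
```

Kinds not named in the PR (for example timedelta, 'm') are reported as "unlisted" rather than silently treated as core.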

cc @joshmoore, @rabernat, @MSanKeys963

  • [ ] Closes #xxxx
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6475/reactions",
    "total_count": 5,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 4,
    "confused": 0,
    "heart": 0,
    "rocket": 1,
    "eyes": 0
}
repo: 13221727
type: pull
