
issues: 1935984485


id: 1935984485
node_id: I_kwDOAMm_X85zZMdl
number: 8290
title: Potential performance optimization for Zarr backend
user: 1197350
state: closed
locked: 0
assignee: (empty)
milestone: (empty)
comments: 0
created_at: 2023-10-10T18:41:19Z
updated_at: 2023-10-13T16:38:58Z
closed_at: 2023-10-13T16:38:58Z
author_association: MEMBER
active_lock_reason: (empty)
draft: (empty)
pull_request: (empty)
body, reactions: see below

What is your issue?

We have identified an inefficiency in the way the ZarrArrayWrapper works. This class currently stores a reference to a ZarrStore and a variable name:

https://github.com/pydata/xarray/blob/75af56c33a29529269a73bdd00df2d3af17ee0f5/xarray/backends/zarr.py#L63-L68

Each time the array is accessed, its parent group is read and used to open a new Zarr array:

https://github.com/pydata/xarray/blob/75af56c33a29529269a73bdd00df2d3af17ee0f5/xarray/backends/zarr.py#L83-L84

This is a relatively metadata-intensive operation for Zarr: it requires reading both the group metadata and the array metadata. Because of how this wrapper works, these reads currently happen every time data is read from the array. If a dask array with thousands of chunks wraps the Zarr array, the metadata reads are repeated within every single task. For high-latency stores, this adds significant overhead.
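To give a rough sense of scale, here is a back-of-the-envelope sketch. The function names, the task count, and the latency figure are all illustrative assumptions, not measurements; the only input taken from the description above is that each re-open reads at least two pieces of metadata (group and array).

```python
def extra_metadata_requests(n_tasks: int, reads_per_open: int = 2) -> int:
    """Metadata reads incurred when every task re-opens the array.

    reads_per_open=2 assumes one group-metadata read plus one
    array-metadata read per open (an assumption; real stores may differ).
    """
    return n_tasks * reads_per_open


def aggregate_overhead_seconds(
    n_tasks: int, latency_s: float, reads_per_open: int = 2
) -> float:
    """Total time spent on those reads, summed over all tasks."""
    return extra_metadata_requests(n_tasks, reads_per_open) * latency_s


n_tasks = 5_000    # assumed: one dask task per chunk
latency_s = 0.05   # assumed: 50 ms round trip to a remote store

print(extra_metadata_requests(n_tasks))                # 10000
print(aggregate_overhead_seconds(n_tasks, latency_s))  # roughly 500 seconds
```

The second figure is summed across tasks, so it is not wall-clock time when tasks run in parallel, but it is still work that opening the array once would avoid.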

Instead, we should just reference the zarr.Array object directly within the ZarrArrayWrapper. It's lightweight and easily serializable, so there is no need to re-open the array each time we want to read data from it. This change should improve performance for Zarr reads, most noticeably on high-latency stores.
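The difference between the two designs can be illustrated with a toy model. This is not xarray's or Zarr's actual code: `FakeGroup`, `FakeArray`, `ReopeningWrapper`, and `DirectWrapper` are hypothetical stand-ins, and the count of two metadata reads per open is an assumption based on the description above.

```python
class FakeArray:
    """Hypothetical stand-in for a zarr.Array: just holds data."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        return self._data[key]


class FakeGroup:
    """Hypothetical stand-in for a Zarr group that counts metadata reads."""

    def __init__(self, arrays):
        self._arrays = arrays
        self.metadata_reads = 0

    def open_array(self, name):
        # Assumed cost: one group-metadata read plus one array-metadata read.
        self.metadata_reads += 2
        return FakeArray(self._arrays[name])


class ReopeningWrapper:
    """Current pattern: keep (group, name) and re-open on every read."""

    def __init__(self, group, name):
        self.group = group
        self.name = name

    def __getitem__(self, key):
        return self.group.open_array(self.name)[key]


class DirectWrapper:
    """Proposed pattern: open once and keep the array object."""

    def __init__(self, group, name):
        self._array = group.open_array(name)

    def __getitem__(self, key):
        return self._array[key]


def count_reads(wrapper_cls, n_reads=1000):
    """Perform n_reads element reads and return the metadata reads incurred."""
    group = FakeGroup({"temperature": list(range(10))})
    wrapper = wrapper_cls(group, "temperature")
    for i in range(n_reads):
        wrapper[i % 10]
    return group.metadata_reads


print(count_reads(ReopeningWrapper))  # 2000
print(count_reads(DirectWrapper))     # 2
```

In this model the metadata cost of the re-opening wrapper grows with the number of reads, while the direct wrapper pays it once at construction.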

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8290/reactions",
    "total_count": 6,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 2,
    "eyes": 0
}
performed_via_github_app: (empty)
state_reason: completed
repo: 13221727
type: issue
