
Comment 980840643 on pydata/xarray issue #6033
https://github.com/pydata/xarray/issues/6033#issuecomment-980840643
Created: 2021-11-28T04:57:48Z · Author association: NONE

@max-sixty Okay, yeah, that's the problem: it's re-downloading the data every time the values are accessed. Apparently this is the default behavior because zarr is a chunked format.

Adding cache=True:

- Fixes the problem in open_dataset
- Throws an error in open_zarr
- Doesn't have any noticeable effect in open_mfdataset
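For reference, a minimal sketch of the first and third cases above (the store URL is a placeholder, and the open_mfdataset workaround of calling .load() to pull everything into memory once is my addition, not something from the comment):

```python
import xarray as xr

# Hypothetical S3 zarr archive; replace with a real store URL.
STORE = "s3://example-bucket/archive.zarr"


def open_cached(store: str) -> xr.Dataset:
    """Open a single zarr store with in-memory caching.

    cache=True asks xarray to keep variable values in memory after the
    first read, so repeated access does not re-download from S3.
    """
    return xr.open_dataset(store, engine="zarr", cache=True)


def open_all_in_memory(paths) -> xr.Dataset:
    """Workaround for open_mfdataset, where cache=True reportedly has no
    visible effect: eagerly load the combined dataset into memory once."""
    return xr.open_mfdataset(paths).load()
```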

My data archive can't normally be read usefully without open_mfdataset, and it's small enough to fit comfortably in memory, so this behavior isn't ideal.

I guess I had assumed that the data would be stored on disk temporarily even if it wasn't kept in memory, so it's an unexpected limitation that the only choices are to cache it in memory or to re-read it from S3 on every access. It also seems odd that the default caching logic only considers whether the data is chunked, not how small it is, how slow the store is to access, or whether the data is being accessed repeatedly.
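The kind of policy being asked for could, in principle, weigh size and access cost rather than chunkedness alone. A toy sketch of such a heuristic (not xarray code; all thresholds are made up for illustration):

```python
def should_cache(nbytes: int, read_latency_s: float, access_count: int,
                 memory_budget: int = 1 << 30) -> bool:
    """Toy heuristic: cache when the data fits a memory budget and the
    cost of re-reading (latency times expected accesses) is non-trivial.

    A small dataset on a slow remote store that is accessed repeatedly
    favours caching; a dataset larger than the budget never does.
    """
    if nbytes > memory_budget:
        return False
    return read_latency_s * access_count > 0.1


# 50 MB on slow S3, accessed ten times: worth caching.
print(should_cache(50_000_000, read_latency_s=0.5, access_count=10))
# 8 GiB exceeds the 1 GiB default budget: stream it instead.
print(should_cache(8 << 30, read_latency_s=0.5, access_count=10))
```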
