home / github

Menu
  • GraphQL API
  • Search all tables

issues

Table actions
  • GraphQL API for issues

2 rows where "created_at" is on date 2024-02-15 and user = 2448579 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: created_at (date), updated_at (date), closed_at (date)

type 2

  • issue 1
  • pull 1

state 1

  • closed 2

repo 1

  • xarray 2
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2136709010 I_kwDOAMm_X85_W5eS 8753 Lazy Loading with `DataArray` vs. `Variable` dcherian 2448579 closed 0     0 2024-02-15T14:42:24Z 2024-04-04T16:46:54Z 2024-04-04T16:46:54Z MEMBER      

Discussed in https://github.com/pydata/xarray/discussions/8751

<sup>Originally posted by **ilan-gold** February 15, 2024</sup> My goal is to get a dataset from [custom io-zarr backend lazy-loaded](https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html#how-to-support-lazy-loading). But when I declare a `DataArray` based on the `Variable` which uses `LazilyIndexedArray`, everything is read in. Is this expected? I specifically don't want to have to use dask if possible. I have seen https://github.com/aurghs/xarray-backend-tutorial/blob/main/2.Backend_with_Lazy_Loading.ipynb but it's a little bit different. While I have a custom backend array inheriting from `ZarrArrayWrapper`, this example using `ZarrArrayWrapper` directly still highlights the same unexpected behavior of everything being read in. ```python import zarr import xarray as xr from tempfile import mkdtemp import numpy as np from pathlib import Path from collections import defaultdict class AccessTrackingStore(zarr.DirectoryStore): def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) self._access_count = {} self._accessed = defaultdict(set) def __getitem__(self, key): for tracked in self._access_count: if tracked in key: self._access_count[tracked] += 1 self._accessed[tracked].add(key) return super().__getitem__(key) def get_access_count(self, key): return self._access_count[key] def set_key_trackers(self, keys_to_track): if isinstance(keys_to_track, str): keys_to_track = [keys_to_track] for k in keys_to_track: self._access_count[k] = 0 def get_subkeys_accessed(self, key): return self._accessed[key] orig_path = Path(mkdtemp()) z = zarr.group(orig_path / "foo.zarr") z['array'] = np.random.randn(1000, 1000) store = AccessTrackingStore(orig_path / "foo.zarr") store.set_key_trackers(['array']) z = zarr.group(store) arr = xr.backends.zarr.ZarrArrayWrapper(z['array']) lazy_arr = xr.core.indexing.LazilyIndexedArray(arr) # just `.zarray` var = xr.Variable(('x', 'y'), lazy_arr) print('Variable read in ', store.get_subkeys_accessed('array')) # now everything is read in da = xr.DataArray(var) print('DataArray read in ', store.get_subkeys_accessed('array')) ```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8753/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2136724736 PR_kwDOAMm_X85m_MtN 8754 Don't access data when creating DataArray from Variable. dcherian 2448579 closed 0     2 2024-02-15T14:48:32Z 2024-04-04T16:46:54Z 2024-04-04T16:46:53Z MEMBER   0 pydata/xarray/pulls/8754
  • [x] Closes #8753

This seems to have been around since 2016-ish, so presumably our backend code path is passing arrays around, not Variables.

cc @ilan-gold

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8754/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · Queries took 43.317ms · About: xarray-datasette