issues
2 rows where "created_at" is on date 2024-02-15, repo = 13221727 and user = 2448579 sorted by updated_at descending
This data as json, CSV (advanced)
Suggested facets: created_at (date), updated_at (date), closed_at (date)
| id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at ▲ | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2136709010 | I_kwDOAMm_X85_W5eS | 8753 | Lazy Loading with `DataArray` vs. `Variable` | dcherian 2448579 | closed | 0 | 0 | 2024-02-15T14:42:24Z | 2024-04-04T16:46:54Z | 2024-04-04T16:46:54Z | MEMBER | Discussed in https://github.com/pydata/xarray/discussions/8751
<sup>Originally posted by **ilan-gold** February 15, 2024</sup>
My goal is to get a dataset from [custom io-zarr backend lazy-loaded](https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html#how-to-support-lazy-loading). But when I declare a `DataArray` based on the `Variable` which uses `LazilyIndexedArray`, everything is read in. Is this expected? I specifically don't want to have to use dask if possible. I have seen https://github.com/aurghs/xarray-backend-tutorial/blob/main/2.Backend_with_Lazy_Loading.ipynb but it's a little bit different.
While I have a custom backend array inheriting from `ZarrArrayWrapper`, this example using `ZarrArrayWrapper` directly still highlights the same unexpected behavior of everything being read in.
```python
import zarr
import xarray as xr
from tempfile import mkdtemp
import numpy as np
from pathlib import Path
from collections import defaultdict
class AccessTrackingStore(zarr.DirectoryStore):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self._access_count = {}
self._accessed = defaultdict(set)
def __getitem__(self, key):
for tracked in self._access_count:
if tracked in key:
self._access_count[tracked] += 1
self._accessed[tracked].add(key)
return super().__getitem__(key)
def get_access_count(self, key):
return self._access_count[key]
def set_key_trackers(self, keys_to_track):
if isinstance(keys_to_track, str):
keys_to_track = [keys_to_track]
for k in keys_to_track:
self._access_count[k] = 0
def get_subkeys_accessed(self, key):
return self._accessed[key]
orig_path = Path(mkdtemp())
z = zarr.group(orig_path / "foo.zarr")
z['array'] = np.random.randn(1000, 1000)
store = AccessTrackingStore(orig_path / "foo.zarr")
store.set_key_trackers(['array'])
z = zarr.group(store)
arr = xr.backends.zarr.ZarrArrayWrapper(z['array'])
lazy_arr = xr.core.indexing.LazilyIndexedArray(arr)
# just `.zarray`
var = xr.Variable(('x', 'y'), lazy_arr)
print('Variable read in ', store.get_subkeys_accessed('array'))
# now everything is read in
da = xr.DataArray(var)
print('DataArray read in ', store.get_subkeys_accessed('array'))
``` |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8753/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 2136724736 | PR_kwDOAMm_X85m_MtN | 8754 | Don't access data when creating DataArray from Variable. | dcherian 2448579 | closed | 0 | 2 | 2024-02-15T14:48:32Z | 2024-04-04T16:46:54Z | 2024-04-04T16:46:53Z | MEMBER | 0 | pydata/xarray/pulls/8754 |
This seems to have been around since 2016-ish, so presumably our backend code path is passing arrays around, not Variables. cc @ilan-gold |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8754/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | pull |
Advanced export
JSON shape: default, array, newline-delimited, object
CREATE TABLE [issues] (
[id] INTEGER PRIMARY KEY,
[node_id] TEXT,
[number] INTEGER,
[title] TEXT,
[user] INTEGER REFERENCES [users]([id]),
[state] TEXT,
[locked] INTEGER,
[assignee] INTEGER REFERENCES [users]([id]),
[milestone] INTEGER REFERENCES [milestones]([id]),
[comments] INTEGER,
[created_at] TEXT,
[updated_at] TEXT,
[closed_at] TEXT,
[author_association] TEXT,
[active_lock_reason] TEXT,
[draft] INTEGER,
[pull_request] TEXT,
[body] TEXT,
[reactions] TEXT,
[performed_via_github_app] TEXT,
[state_reason] TEXT,
[repo] INTEGER REFERENCES [repos]([id]),
[type] TEXT
);
CREATE INDEX [idx_issues_repo]
ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
ON [issues] ([user]);