
issue_comments


5 rows where issue = 753852119 sorted by updated_at descending


user 5

  • rabernat 1
  • shoyer 1
  • nbren12 1
  • dcherian 1
  • LunarLanding 1

author_association 3

  • MEMBER 3
  • CONTRIBUTOR 1
  • NONE 1

issue 1

  • Lazy concatenation of arrays · 5
rabernat · MEMBER · created 2022-05-10T17:00:47Z · updated 2022-05-10T17:02:34Z
https://github.com/pydata/xarray/issues/4628#issuecomment-1122649316

> Any pointers regarding where to start / modules involved to implement this? I would like to have a try.

The starting point would be to look at the code in indexing.py and try to understand how lazy indexing works.

In particular, look at

https://github.com/pydata/xarray/blob/3920c48d61d1f213a849bae51faa473b9c471946/xarray/core/indexing.py#L465-L470

Then you may want to try writing a class that looks like

```python
class LazilyConcatenatedArray:  # have to decide what to inherit from

    def __init__(self, *arrays: LazilyIndexedArray, concat_axis=0):
        # figure out what you need to keep track of
        ...

    @property
    def shape(self):
        # figure out how to determine the total shape
        ...

    def __getitem__(self, indexer) -> LazilyIndexedArray:
        # figure out how to map an indexer to the right piece of data
        ...
```
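A minimal sketch of how that skeleton could be filled in, wrapping plain NumPy arrays in place of `LazilyIndexedArray` (the cumulative-offset bookkeeping and integer-only indexing here are illustrative assumptions, not xarray's actual implementation):

```python
import numpy as np

class LazilyConcatenatedArray:
    """Sketch: present several arrays as one along `concat_axis`,
    forwarding integer indexing to the right underlying piece."""

    def __init__(self, *arrays, concat_axis=0):
        self.arrays = arrays
        self.concat_axis = concat_axis
        # cumulative sizes along the concat axis: offsets[k] is where
        # arrays[k] starts in the virtual concatenated array
        self.offsets = np.cumsum([0] + [a.shape[concat_axis] for a in arrays])

    @property
    def shape(self):
        shape = list(self.arrays[0].shape)
        shape[self.concat_axis] = int(self.offsets[-1])
        return tuple(shape)

    def __getitem__(self, indexer):
        # handles only a full tuple of plain integers, to keep the sketch short
        idx = list(indexer)
        i = idx[self.concat_axis]
        # locate which underlying array the global index falls into
        piece = int(np.searchsorted(self.offsets, i, side="right")) - 1
        idx[self.concat_axis] = i - self.offsets[piece]
        return self.arrays[piece][tuple(idx)]
```

For example, `LazilyConcatenatedArray(np.zeros((2, 3)), np.ones((4, 3)))` reports shape `(6, 3)`, and indexing `[(3, 0)]` touches only the second underlying array.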

reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
nbren12 · CONTRIBUTOR · created 2022-05-10T16:11:14Z
https://github.com/pydata/xarray/issues/4628#issuecomment-1122601160

@rabernat It seems that great minds think alike ;)

reactions: {"total_count": 2, "+1": 0, "-1": 0, "laugh": 2, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
dcherian · MEMBER · created 2022-05-10T15:39:27Z
https://github.com/pydata/xarray/issues/4628#issuecomment-1122558718

From @rabernat in #6588:

Right now, if I want to concatenate multiple datasets (e.g. as in open_mfdataset), I have two options:

  • Eagerly load the data as numpy arrays ➡️ xarray will dispatch to np.concatenate
  • Chunk each dataset ➡️ xarray will dispatch to dask.array.concatenate

In pseudocode:

```python
ds1 = xr.open_dataset("some_big_lazy_source_1.nc")
ds2 = xr.open_dataset("some_big_lazy_source_2.nc")
item1 = ds1.foo[0, 0, 0]  # lazily access a single item
ds = xr.concat([ds1.chunk(), ds2.chunk()], "time")  # only way to lazily concat

# trying to access the same item will now trigger loading of all of ds1
item1 = ds.foo[0, 0, 0]

# yes I could use different chunks, but the point is that I should not have to
# arbitrarily choose chunks to make this work
```

However, I am increasingly encountering scenarios where I would like to lazily concatenate datasets (without loading into memory), but also without the requirement of using dask. This would be useful, for example, for creating composite datasets that point back to an OpenDAP server, preserving the possibility of granular lazy access to any array element without the requirement of arbitrary chunking at an intermediate stage.

Describe the solution you'd like

I propose to extend our LazilyIndexedArray classes to support simple concatenation and stacking. The result of applying concat to such arrays will be a new LazilyIndexedArray that wraps the underlying arrays into a single object.

The main difficulty in implementing this will probably be with indexing: the concatenated array will need to understand how to map global indexes to the underlying individual array indexes. That is a little tricky but eminently solvable.
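That global-to-local mapping can be expressed in a few lines; the helper name and the cumulative-offset layout below are illustrative assumptions, not xarray internals:

```python
import numpy as np

def map_global_index(i, sizes):
    """Map a global integer index along the concatenated axis to
    (which underlying array, local index within that array)."""
    offsets = np.cumsum([0] + list(sizes))  # start position of each piece
    if not 0 <= i < offsets[-1]:
        raise IndexError(f"index {i} out of range for total size {offsets[-1]}")
    piece = int(np.searchsorted(offsets, i, side="right")) - 1
    return piece, int(i - offsets[piece])
```

With two pieces of sizes `[5, 5]`, global index 7 lands at piece 1, local index 2.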

Describe alternatives you've considered

The alternative is to structure your code in a way that avoids needing to lazily concatenate arrays. That is what we do now. It is not optimal.

reactions: {"total_count": 2, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 2, "rocket": 0, "eyes": 0}
LunarLanding · NONE · created 2021-11-25T18:23:28Z
https://github.com/pydata/xarray/issues/4628#issuecomment-979412822

Any pointers regarding where to start / modules involved to implement this? I would like to have a try.

reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
shoyer · MEMBER · created 2021-05-24T16:44:34Z
https://github.com/pydata/xarray/issues/4628#issuecomment-847185858

If you write something like xarray.concat(..., data_vars='minimal', coords='minimal'), dask should be entirely lazy -- the non-laziness only happens with the default value of coords='different'.

But I agree, it would be nice if Xarray's internal lazy indexing machinery supported concatenation. It currently does not.
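A self-contained illustration of that call, using small synthetic in-memory datasets in place of real files (the variable and dimension names here are made up for the example):

```python
import numpy as np
import xarray as xr

ds1 = xr.Dataset({"foo": ("time", np.arange(3))})
ds2 = xr.Dataset({"foo": ("time", np.arange(3, 6))})

# data_vars/coords="minimal" skips the eager cross-dataset comparison that
# the default coords="different" performs; with dask-backed variables this
# keeps the whole concatenation lazy
ds = xr.concat([ds1, ds2], dim="time", data_vars="minimal", coords="minimal")
```

With plain in-memory data this simply concatenates (`ds.foo` has shape `(6,)`); the laziness benefit only shows up when the underlying variables are dask-backed or otherwise lazy.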

reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette