
pydata/xarray issue #6588: Support lazy concatenation *without dask*

State: closed (completed) · Author association: MEMBER · Comments: 2
Opened: 2022-05-10 · Closed: 2022-05-10 · Last updated: 2023-03-10

Is your feature request related to a problem?

Right now, if I want to concatenate multiple datasets (e.g. as in `open_mfdataset`), I have two options:

  • Eagerly load the data as numpy arrays ➡️ xarray will dispatch to `np.concatenate`
  • Chunk each dataset ➡️ xarray will dispatch to `dask.array.concatenate`

In pseudocode:

```python
ds1 = xr.open_dataset("some_big_lazy_source_1.nc")
ds2 = xr.open_dataset("some_big_lazy_source_2.nc")
item1 = ds1.foo[0, 0, 0]  # lazily access a single item
ds = xr.concat([ds1.chunk(), ds2.chunk()], "time")  # only way to lazily concat

# trying to access the same item will now trigger loading of all of ds1
item1 = ds.foo[0, 0, 0]

# yes I could use different chunks, but the point is that I should not have to
# arbitrarily choose chunks to make this work
```

However, I am increasingly encountering scenarios where I would like to lazily concatenate datasets (without loading into memory), but also without the requirement of using dask. This would be useful, for example, for creating composite datasets that point back to an OpenDAP server, preserving the possibility of granular lazy access to any array element without the requirement of arbitrary chunking at an intermediate stage.

Describe the solution you'd like

I propose to extend our `LazilyIndexedArray` classes to support simple concatenation and stacking. The result of applying `concat` to such arrays will be a new `LazilyIndexedArray` that wraps the underlying arrays into a single object.

The main difficulty in implementing this will probably be with indexing: the concatenated array will need to understand how to map global indexes to the indexes of the underlying individual arrays. That is a little tricky but eminently solvable.
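The global-to-local index mapping could be sketched roughly as follows. This is a hypothetical illustration, not xarray's actual `LazilyIndexedArray` internals: the class name `LazyConcatSketch` is made up, it assumes plain numpy-like wrapped arrays, concatenation along axis 0 only, and integer indexing on the concat axis.

```python
import numpy as np

class LazyConcatSketch:
    """Hypothetical sketch of lazily concatenating arrays along axis 0
    without dask. Only the source array that holds the requested index
    is touched, so granular lazy access is preserved."""

    def __init__(self, arrays):
        self.arrays = list(arrays)
        # cumulative end-offsets along the concat axis, e.g. sizes (3, 4) -> [3, 7]
        self._offsets = np.cumsum([a.shape[0] for a in self.arrays])

    @property
    def shape(self):
        return (int(self._offsets[-1]),) + self.arrays[0].shape[1:]

    def __getitem__(self, key):
        # for simplicity: an integer index on the concat axis, anything after it
        i, *rest = key if isinstance(key, tuple) else (key,)
        # map the global index to the wrapped array that contains it
        k = int(np.searchsorted(self._offsets, i, side="right"))
        start = 0 if k == 0 else int(self._offsets[k - 1])
        local = i - start
        # only arrays[k] is accessed; the others stay untouched (lazy)
        return self.arrays[k][(local, *rest)]
```

A full implementation would also have to handle slices and arrays of indices that span source-array boundaries, which is where most of the trickiness lives; the offset bookkeeping above is the core idea.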

Describe alternatives you've considered

The alternative is to structure your code in a way that avoids needing to lazily concatenate arrays. That is what we do now. It is not optimal.

Additional context

No response

