issues: 2128501296
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2128501296 | I_kwDOAMm_X85-3low | 8733 | A basic default ChunkManager for arrays that report their own chunks | 90008 | open | 0 | 21 | 2024-02-10T14:36:55Z | 2024-03-10T17:26:13Z | CONTRIBUTOR | Is your feature request related to a problem?

I'm creating duckarrays for various file-backed data structures of mine that are naturally "chunked", i.e. different parts of the array may appear in completely different files. Using these "chunks" and the "strides", algorithms can better decide how to iterate in a convenient manner. For example, an MP4 file's chunks may be defined as being delimited by I-frames, while images stored in a TIFF may be delimited by a page. So for me, chunks are not so useful for parallel computing, but more for computing locally and choosing the appropriate way to iterate through large arrays (TB of uncompressed data).

Describe the solution you'd like

I think a default Chunk manager could simply implement Advanced users could then go in and reimplement their own chunkmanager, but I was unable to use my duckarrays that included a Something as simple as:

```patch
diff --git a/xarray/core/parallelcompat.py b/xarray/core/parallelcompat.py
index c009ef48..bf500abb 100644
--- a/xarray/core/parallelcompat.py
+++ b/xarray/core/parallelcompat.py
@@ -681,3 +681,26 @@ class ChunkManagerEntrypoint(ABC, Generic[T_ChunkedArray]):
         cubed.store
         """
         raise NotImplementedError()
+
+
+class DefaultChunkManager(ChunkManagerEntrypoint):
+    def __init__(self) -> None:
+        self.array_cls = None
+
+    def is_chunked_array(self, data: Any) -> bool:
+        return is_duck_array(data) and hasattr(data, "chunks")
+
+    def chunks(self, data: T_ChunkedArray) -> T_NormalizedChunks:
+        return data.chunks
+
+    def compute(self, *data: T_ChunkedArray | Any, **kwargs) -> tuple[np.ndarray, ...]:
+        return tuple(np.asarray(d) for d in data)
+
+    def normalize_chunks(self, *args, **kwargs):
+        raise NotImplementedError()
+
+    def from_array(self, *args, **kwargs):
+        raise NotImplementedError()
+
+    def apply_gufunc(self, *args, **kwargs):
+        raise NotImplementedError()
```

Describe alternatives you've considered

I created my own chunk manager, with my own chunk manager entry point. Kinda tedious...

Additional context

It seems that this is related to: https://github.com/pydata/xarray/pull/7019 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8733/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
13221727 | issue |
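
To make the request in the issue body more concrete, here is a rough, NumPy-only sketch of a duck array that reports its own chunks and of iterating over it chunk by chunk. It does not use xarray at all. `PagedArray`, `iter_chunk_slices`, and the free-standing `compute` are hypothetical names invented for illustration, and the "one page per chunk along axis 0" layout is an assumption loosely modelled on the TIFF example in the issue.

```python
import itertools

import numpy as np


class PagedArray:
    """Hypothetical duck array: each 'page' is one chunk along axis 0."""

    def __init__(self, pages):
        # Assumption: every page shares the same trailing shape and dtype.
        self._pages = [np.asarray(p) for p in pages]

    @property
    def dtype(self):
        return self._pages[0].dtype

    @property
    def shape(self):
        rows = sum(p.shape[0] for p in self._pages)
        return (rows,) + self._pages[0].shape[1:]

    @property
    def ndim(self):
        return len(self.shape)

    @property
    def chunks(self):
        # Dask-style normalized chunks: one tuple of block sizes per axis.
        first_axis = tuple(p.shape[0] for p in self._pages)
        other_axes = tuple((s,) for s in self._pages[0].shape[1:])
        return (first_axis,) + other_axes

    def __array__(self, dtype=None, copy=None):
        # Materialize the whole array in memory by concatenating the pages.
        out = np.concatenate(self._pages, axis=0)
        return out if dtype is None else out.astype(dtype)


def iter_chunk_slices(chunks):
    """Yield one tuple of slices per chunk, in row-major chunk order."""
    per_axis = []
    for sizes in chunks:
        stops = np.cumsum(sizes)
        starts = stops - np.asarray(sizes)
        per_axis.append([slice(int(a), int(b)) for a, b in zip(starts, stops)])
    return itertools.product(*per_axis)


def compute(*data):
    """Mirror of the patch's compute: convert each input with np.asarray."""
    return tuple(np.asarray(d) for d in data)
```

With two pages of shapes `(2, 3)` and `(1, 3)`, `chunks` reports two blocks along the first axis and one along the second, and `iter_chunk_slices` yields one slice tuple per page, which is the kind of local, chunk-aware iteration the issue describes.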