issues: 2247043809
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2247043809 | I_kwDOAMm_X86F7yrh | 8949 | Mapping DataTree methods over nodes with variables for which the args are invalid | 35968931 | open | 0 | 0 | 2024-04-16T23:45:26Z | 2024-04-17T14:58:14Z | MEMBER | What is your issue?
In the datatree call today we narrowed down an issue with how datatree maps methods over many variables in many nodes. This issue is essentially https://github.com/xarray-contrib/datatree/issues/67, but I'll attempt to discuss the problem and solution in more general terms.
Context in xarray
There is therefore a difference between
For example:
```python
In [13]: ds = xr.Dataset({'a': ('x', [1, 2]), 'b': 0})

In [14]: ds.isel(x=0)
Out[14]:
<xarray.Dataset> Size: 16B
Dimensions:  ()
Data variables:
    a        int64 8B 1
    b        int64 8B 0

In [15]: ds.map(Variable.isel, x=0)
ValueError                                Traceback (most recent call last)
Cell In[15], line 1
----> 1 ds.map(Variable.isel, x=0)
...
ValueError: Dimensions {'x'} do not exist. Expected one or more of ()
```
(Aside: It would be nice for Clearly
Issue in DataTree
In datatree we have to map methods over different variables in the same node, but also over different variables in different nodes. Currently the implementation of a method naively maps the This causes problems for users, for example in https://github.com/xarray-contrib/datatree/issues/67. A minimal example of this problem would be
```python
In [18]: ds1 = xr.Dataset({'a': ('x', [1, 2])})

In [19]: ds2 = xr.Dataset({'b': 0})

In [20]: dt = DataTree.from_dict({'node1': ds1, 'node2': ds2})

In [21]: dt
Out[21]:
DataTree('None', parent=None)
├── DataTree('node1')
│       Dimensions:  (x: 2)
│       Dimensions without coordinates: x
│       Data variables:
│           a        (x) int64 16B 1 2
└── DataTree('node2')
        Dimensions:  ()
        Data variables:
            b        int64 8B 0

In [22]: dt.isel(x=0)
```
(The slightly weird error message here is related to the deprecation cycle in #8500) We would have preferred that variable
Desired behaviour
We can kind of think of the desired behaviour like a hypothesis property we want (xref https://github.com/pydata/xarray/issues/1846), but not quite. It would be something like
except that
Proposed Solution
There are two ways I can imagine implementing this.
1) Use I think @shoyer and I concluded that we should make (2), in the form of some kind of new primitive, i.e.
```python
class DataTree:
    def reduce(self, reduce_func: Callable, dim: Dims = None, **kwargs) -> DataTree:
        # Collect every dimension name that appears on any node in the tree
        all_dims_in_tree = {d for node in self.subtree for d in node.dims}
```
Then every method that has this pattern of acting over one or more dims should be mapped over the tree using
cc @shoyer, @flamingbear, @owenlittlejohns |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8949/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
13221727 | issue |
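
The per-node handling described in the issue body above can be illustrated with a small, library-free sketch. It is not the DataTree implementation: the tree is modelled as a plain dict from node name to the dimension names on that node, and the function name `dims_to_apply_per_node` is hypothetical. Only `all_dims_in_tree` is taken from the issue's own snippet. The assumption here is that a dimension should raise only if it is missing from the whole tree, and should otherwise be silently skipped on nodes that lack it.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for a tree: node name -> dimension names on that node.
# This is NOT the DataTree API, just enough structure to show the idea.
Tree = Dict[str, Tuple[str, ...]]


def dims_to_apply_per_node(tree: Tree, dims: Iterable[str]) -> Dict[str, List[str]]:
    """For each node, keep only the requested dims that exist on that node.

    Raises ValueError if a requested dim does not exist anywhere in the tree.
    """
    requested = list(dims)

    # Union of dimension names over every node in the tree
    all_dims_in_tree = {d for node_dims in tree.values() for d in node_dims}

    # A dim that is absent from the whole tree is treated as a user error
    missing = [d for d in requested if d not in all_dims_in_tree]
    if missing:
        raise ValueError(f"Dimensions {missing} do not exist anywhere in the tree")

    # Per node, drop the requested dims that this node does not have
    return {
        name: [d for d in requested if d in node_dims]
        for name, node_dims in tree.items()
    }


# Mirrors the minimal example: node1 has dimension 'x', node2 has none
tree = {"node1": ("x",), "node2": ()}
per_node = dims_to_apply_per_node(tree, ["x"])
print(per_node)  # {'node1': ['x'], 'node2': []}
```

With this filtering, an operation over `x` would be applied to `node1` and become a no-op on `node2`, while a dimension name that appears on no node at all would still raise.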