The datastructure that is more useful for this kind of analysis is the one that is an arbitrary graph of n-dimensional arrays; forcing the graph to have a hierarchical access allows optional organization; the graph itself can exist as python objects for nodes and references for edges. If the tree is not necessary/required everything can be placed on the first level, as it is done on a Dataset. # Example: ## Notation - `a:b` value `a` has type `b` - `t[...,n,...]` : type of data array of values of type `t`, with axis of length `n` - `D(n(,l))` dimension of size `n` with optional labels `l` - `A(t,*(dims:tuple[D])}` : type of data array of values of type `t`, with dimension `dims` - a tree node `T` is either: - a dict from hashables to tree nodes, `dict[Hashable,T]` - a dimension `D` - a data array `A` - `a[*tags]:=a[tag[0]][tag[1]]...[tag[len(tag)-1]]` - `map(f,*args:A,dims:tuple[D])` maps `f` over `args` broadcasting over `dims` Start with a 2d-dimensional DataArray: ``` d0 ( Graph : ( x->D(x_n,float[x_n]) y->D(y_n) v->A(float,x,y) ) Tree : ( { 'x':x, 'y':y, 'v':v, } ) ) ``` Map a function `f` that introduces a new dimension `w` with constant labels `f_w_l:int[f_w_n]` (through map_blocks or apply_ufunc) and add it to d0: ``` f : x:float->( Graph: f_w->D(f_w_n,f_w_l) a->A(float,f_w) b->A(float) Tree: { 'w':f_w, 'a':a, 'b':b, }) d1=d0.copy() d1['f']=map( f, d0['v'], (d0['x'],d0['y']) ) d1 ( Graph : x->D(x_n,float[x_n]) y->D(y_n) v->A(float,x,y) f_w->D(f_w_n,f_w_l) f_a->A(float,x,y,f_w) f_b->A(float,x,y) Tree : { 'x':x, 'y':y, 'v':v, 'f':{ 'w':f_w, 'a':f_a, 'b':f_b, } } ) ``` Map a function `g`, that has a dimension of the same name but different meaning and therefore possibly different length `g_w_n` and `g_w_l`: ``` g : x:float->( Graph: g_w->D(g_w_n,g_w_l) a->A(float,g_w) b->A(float) Tree: { 'w':g_w, 'a':a, 'b':b, }) d2=d1.copy() d2['g']=map( g, d1['v'], (d1['x'],d1['y']) ) d2 ( Graph : x->D(x_n,float[x_n]) y->D(y_n) v->A(float,x,y) f_w->D(f_w_n,f_w_l) f_a->A(float,x,y,f_w) f_b->A(float,x,y) g_w->D(g_w_n,g_w_l) g_a->A(float,x,y,g_w) g_b->A(float,x,y) Tree : { 'x':x, 'y':y, 'v':v, 'f':{ 'w':f_w, 'a':f_a, 'b':f_b, }, 'g':{ 'w':g_w, 'a':g_a, 'b':g_b, } } ) ``` Notice that both `f` and `g` output a dimension named 'w' but that they have different lengths and possibly different meanings. Suppose I now want to run analysis on f's and g's output, with a function that takes two a's and outputs a float Then d3 looks like: ``` h : a1:float,a2:float->( Graph: r->A(float) Tree: r d3=d2.copy() d3['f_g_aa']=map( h, d2['f','a'],d2['g','a'], (d2['x'],d2['y'],d2['f','w'],d2['g','w']) ) d3 { Graph : x->D(x_n,float[x_n]) y->D(y_n) v->A(float,x,y) f_w->D(f_w_n,f_w_l) f_a->A(float,x,y,f_w) f_b->A(float,x,y) g_w->D(g_w_n,g_w_l) g_a->A(float,x,y,g_w) g_b->A(float,x,y) f_g_aa->A(float,x,y,f_w,g_w) Tree : { 'x':x, 'y':y, 'v':v, 'f':{ 'w':f_w, 'a':f_a, 'b':f_b, }, 'g':{ 'w':g_w, 'a':g_a, 'b':g_b, } 'f_g_aa': f_g_aa } } ``` Compared to what I posted before, I dropped the resolving the dimension for a array by its position in the hierarchy since it would be innaplicable when a variable refers to dimensions in a different branch of the tree.