issue_comments: 965544051

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/pull/5961#issuecomment-965544051 https://api.github.com/repos/pydata/xarray/issues/5961 965544051 IC_kwDOAMm_X845jQhz 35968931 2021-11-10T16:57:08Z 2021-11-10T17:06:13Z MEMBER

> From an xarray.Dataset perspective, Dataset._variables just needs to be a MutableMapping of xarray.Variable objects. And in most cases (when not using DataTree), _variables would still be a plain dictionary, which means adding DataTree support would have no performance implications for normal Dataset objects.

That sounds nice, and might not require any changes to Dataset at all!

> My tentative suggestion would be to use a mixed dictionary with either xarray.Variable or nested dictionaries as entries for the data in DataTree.

I think it's a lot easier to have a dict of DataTree objects rather than a nested dict of data, as then each node just points to its child nodes instead of having a node which knows about all the data in the whole tree (if that's what you meant).
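To make that distinction concrete, here is a minimal sketch of the "each node points to its children" layout. `Node`, its fields, and the `get` path lookup are hypothetical names for illustration only, not the DataTree API, and plain Python values stand in for xarray.Variable objects.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: it owns its variables and references its children."""

    name: str
    variables: dict = field(default_factory=dict)  # name -> variable (plain values here)
    children: dict = field(default_factory=dict)   # name -> Node

    def add_child(self, node):
        self.children[node.name] = node
        return node

    def get(self, path):
        # Walk child links one group at a time; no node holds the whole tree's data.
        *groups, var = path.split("/")
        node = self
        for group in groups:
            node = node.children[group]
        return node.variables[var]


root = Node("root", {"a": 0})
root.add_child(Node("group1", {"t": 1}))
print(root.get("a"), root.get("group1/t"))
```

Each node only stores its own variables, so a lookup in a subgroup is just a walk along child references.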

> How about making a custom Mapping for use as Dataset._variables directly, which directly is a mapping of dataset variables?

So this is my understanding of what you're suggesting - I'm just not sure if it solves all the requirements:

```python
from collections.abc import Mapping, MutableMapping
from typing import Any


class DataManifest(MutableMapping):
    """
    Acts like a dict of keys to variables, but
    prevents setting variables to same key as any
    children
    """

    def __init__(self, variables=None, children=None):
        # check for collisions here
        self._variables = dict(variables or {})
        self._children = dict(children or {})

    def __getitem__(self, key):
        # only expose the variables so this acts like a normal dict of variables
        return self._variables[key]

    def __setitem__(self, key, var):
        if key in self._children:
            raise KeyError(
                "key already in use to denote a child "
                "node in wrapping DataTree node"
            )
        self._variables[key] = var


class Dataset:
    # in standard case just use dict of vars as before
    _variables: Mapping[Any, Variable]

    # Use ._construct_direct as the constructor
    # as it allows for setting ._variables directly

    # therefore no changes to Dataset required!


class DataTree:
    def __init__(self, name, data, parent, children):
        self._children
        self._variables
        self._coord_names
        self._dims
        ...

    @property
    def ds(self):
        manifest = DataManifest(self._variables, self._children)
        return Dataset._from_treenode(
            variables=manifest,
            coord_names=self._coord_names,
            dims=self._dims,
            ...
        )

    @ds.setter
    def ds(self, ds):
        # check for collisions between ds.data_vars and self.children
        ...


ds = Dataset({'a': 0})
subtree1 = DataTree('group1')
dt = DataTree('root', data=ds, children=[subtree1])

wrapped_ds = dt.ds
wrapped_ds['group1'] = 1  # raises KeyError - good!

subtree2 = DataTree('b')
dt.ds['b'] = 2  # this will happily add a variable to the dataset
# want to ensure this raises a KeyError as it conflicts with the new
# variable, but with this design I'm not sure if it will...
dt.add_child(subtree2)
```

EDIT: Actually, maybe this would work? So long as DataTree stores the manifest itself, i.e. self._variables = manifest and self._children = manifest.children, adding a new child node would also update the manifest, meaning that the linked dataset should know about it too...
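As a rough check of the idea in the EDIT, here is a self-contained toy version. It assumes the manifest exposes a public `children` dict and an `add_child` method, and it uses a hypothetical `ToyTree` in place of DataTree, with plain values in place of xarray.Variable and the manifest itself standing in for the wrapped Dataset. None of these names come from xarray.

```python
from collections.abc import MutableMapping


class DataManifest(MutableMapping):
    """Dict-like view of variables that refuses keys already used by child nodes."""

    def __init__(self, variables=None, children=None):
        self._variables = dict(variables or {})
        self.children = dict(children or {})
        clashes = self._variables.keys() & self.children.keys()
        if clashes:
            raise KeyError(f"names used for both a variable and a child: {list(clashes)}")

    def __getitem__(self, key):
        return self._variables[key]

    def __setitem__(self, key, value):
        if key in self.children:
            raise KeyError(f"{key!r} already names a child node")
        self._variables[key] = value

    def __delitem__(self, key):
        del self._variables[key]

    def __iter__(self):
        return iter(self._variables)

    def __len__(self):
        return len(self._variables)

    def add_child(self, name, node):
        if name in self._variables:
            raise KeyError(f"{name!r} already names a variable")
        self.children[name] = node


class ToyTree:
    """Hypothetical stand-in for DataTree that keeps only the shared manifest."""

    def __init__(self, name, variables=None):
        self.name = name
        self._variables = DataManifest(variables)
        self._children = self._variables.children  # the same dict object, not a copy

    @property
    def ds(self):
        # Stand-in for a Dataset built directly around the shared manifest.
        return self._variables

    def add_child(self, node):
        self._variables.add_child(node.name, node)


dt = ToyTree("root", {"a": 0})
dt.add_child(ToyTree("group1"))

try:
    dt.ds["group1"] = 1  # clashes with the existing child
except KeyError as err:
    print("rejected variable:", err)

dt.ds["b"] = 2  # fine, no child is called "b" yet
try:
    dt.add_child(ToyTree("b"))  # clashes with the variable just added
except KeyError as err:
    print("rejected child:", err)
```

Because the tree and the dataset view hold the same manifest object, both directions of the collision check see each other's updates, at least in this toy setting.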

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  1048697792