home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

6 rows where issue = 1048697792 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: created_at (date), updated_at (date)

user 2

  • TomNicholas 4
  • shoyer 2

issue 1

  • [Experimental] Refactor Dataset to store variables in a manifest · 6 ✖

author_association 1

  • MEMBER 6
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
965803186 https://github.com/pydata/xarray/pull/5961#issuecomment-965803186 https://api.github.com/repos/pydata/xarray/issues/5961 IC_kwDOAMm_X845kPyy TomNicholas 35968931 2021-11-10T22:26:12Z 2021-11-11T01:08:14Z MEMBER

Update: I tried making a custom mapping class (code in drop-down below), then swapping out ._variables = dict(variables) for ._variables = DataManifest(variables=variables) in the Dataset constructors to see if that would break anything, as a first step towards that kind of integration. (At some point it would be good to be able to automatically run the test_dataset.py tests again for the manifest case.)

It kind of works?

From a xarray.Dataset perspective, Dataset._variables just needs to be a MutableMapping of xarray.Variable objects.

It's not quite as simple as this - you need a .copy method (fine), a repr (okay), and there are several places inside Dataset and DataArray that explicitly check that the type of ._variables is a dict.

To get tests to pass I can either relax those type constraints (which leads to >2/3 of test_dataset.py passing immediately) or maybe try making DataManifest inherit from dict so that it passes isinstance(ds._variables, dict)? This probably deserves a new PR...

(EDIT: Though maybe inheriting from dict is more trouble than it's worth)

Code for custom mapping class ```python from collections.abc import MutableMapping from typing import Dict, Hashable, Mapping, Iterator, Sequence from xarray.core.variable import Variable #from xarray.tree.datatree import DataTree class DataTree: """Purely for type hinting purposes for now (and to avoid a circular import)""" ... class DataManifest(MutableMapping): """ Stores variables like a dict, but also stores children alongside in a hidden manner, to check against. Acts like a dict of keys to variables, but prevents setting variables to same key as any children. It prevents name collisions by acting as a common record of stored items for both the DataTree instance and its wrapped Dataset instance. """ def __init__( self, variables: Dict[Hashable, Variable] = {}, children: Dict[Hashable, DataTree] = {}, ): if variables and children: keys_in_both = set(variables.keys()) & set(children.keys()) if keys_in_both: raise KeyError( f"The keys {keys_in_both} exist in both the variables and child nodes" ) self._variables = variables self._children = children @property def children(self) -> Dict[Hashable, DataTree]: """Stores list of the node's children""" return self._children @children.setter def children(self, children: Dict[Hashable, DataTree]): for key, child in children.items(): if key in self.keys(): raise KeyError("Cannot add child under key {key} because a variable is already stored under that key") if not isinstance(child, DataTree): raise TypeError self._children = children def __getitem__(self, key: Hashable) -> Variable: """Forward to the variables here so the manifest acts like a normal dict of variables""" return self._variables[key] def __setitem__(self, key: Hashable, value: Variable): """Allow adding new variables, but first check if they conflict with children""" if key in self._children: raise KeyError( f"key {key} already in use to denote a child" "node in wrapping DataTree node" ) if isinstance(value, Variable): self._variables[key] = value else: raise TypeError(f"Cannot store object of type {type(value)}") def __delitem__(self, key: Hashable): """Forward to the variables here so the manifest acts like a normal dict of variables""" if key in self._variables: del self._variables[key] elif key in self.children: # TODO might be better not to del children here? del self._children[key] else: raise KeyError(f"Cannot remove item because nothing is stored under {key}") def __contains__(self, item: object) -> bool: """Forward to the variables here so the manifest acts like a normal dict of variables""" return item in self._variables def __iter__(self) -> Iterator: """Forward to the variables here so the manifest acts like a normal dict of variables""" return iter(self._variables) def __len__(self) -> int: """Forward to the variables here so the manifest acts like a normal dict of variables""" return len(self._variables) def copy(self) -> "DataManifest": """Required for consistency with dict""" return DataManifest(variables=self._variables.copy(), children=self._children.copy()) # TODO __repr__ ```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [Experimental] Refactor Dataset to store variables in a manifest 1048697792
965861064 https://github.com/pydata/xarray/pull/5961#issuecomment-965861064 https://api.github.com/repos/pydata/xarray/issues/5961 IC_kwDOAMm_X845kd7I TomNicholas 35968931 2021-11-11T00:08:44Z 2021-11-11T00:08:44Z MEMBER

Question: Does this change to ._variables need to propagate down to DatasetCoordinates and DataArrayCoordinates? I'm not sure what the intended behaviour is if the user alters .coords directly.

1) It seems it is possible to alter a ds via ds.coords[new_coord_name] = new_coord:

```python In [34]: ds = xr.Dataset({'a': 0})

In [35]: ds.coords['c'] = 2

In [36]: ds Out[36]: <xarray.Dataset> Dimensions: () Coordinates: c int64 2 Data variables: a int64 0 ```

(That's a bit weird given that the docstring of DatasetCoordinates describes it as an "immutable dictionary" :confused:)

2) It also seems it is possible to similarly alter a da via da.coords[new_coord_name] = new_coord:

```python In [30]: da = xr.DataArray(0)

In [31]: da.coords['c'] = 1

In [32]: da Out[32]: <xarray.DataArray ()> array(0) Coordinates: c int64 1 ```

3) However is it meant to be possible to alter a ds via ds[var].coords[new_coord_name] = new_coord? Because that currently silently fails to update:

```python In [37]: ds = xr.Dataset({'a': 0})

In [38]: ds['a'].coords['c'] = 2

In [39]: ds Out[39]: <xarray.Dataset> Dimensions: () Data variables: a int64 0

In [40]: ds['a'] Out[40]: <xarray.DataArray 'a' ()> array(0)

In [41]: ds['a'].coords Out[41]: Coordinates: empty Bizarrely this does change though:python In [42]: coords = ds['a'].coords

In [43]: coords['c'] = 2

In [44]: coords Out[44]: Coordinates: c int64 2 ```

If altering .coords is intended behaviour though that means that my DataManifest also has to be accessible from DataArrayCoordinates too, so that the wrapping DataTree node can know about any changes to dt.ds[var].coords.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [Experimental] Refactor Dataset to store variables in a manifest 1048697792
965544051 https://github.com/pydata/xarray/pull/5961#issuecomment-965544051 https://api.github.com/repos/pydata/xarray/issues/5961 IC_kwDOAMm_X845jQhz TomNicholas 35968931 2021-11-10T16:57:08Z 2021-11-10T17:06:13Z MEMBER

From a xarray.Dataset perspective, Dataset._variables just needs to be a MutableMapping of xarray.Variable objects. And in most cases (when not using DataTree), _variables would still be a plain dictionary, which means adding DataTree support would have no performance implications for normal Dataset objects.

That sounds nice, and might not require any changes to Dataset at all!

My tentative suggestion would be to use a mixed dictionary with either xarray.Variable or nested dictionaries as entries for the data in DataTree.

I think it's a lot easier to have a dict of DataTree objects rather than a nested dict of data, as then each node just points to its child nodes instead of having a node which knows about all the data in the whole tree (if that's what you meant).

How about making custom Mapping for use as Dataset._variables directly, which directly is a mapping of dataset variables?

So this is my understanding of what you're suggesting - I'm just not sure if it solves all the requirements:

```python class DataManifest(MutableMapping): """ Acts like a dict of keys to variables, but prevents setting variables to same key as any children """ def init(self, variables={}, children={}): # check for collisions here self._variables = {} self._children = {}

def __getitem__(self, key):
    # only expose the variables so this acts like a normal dict of variables
    return self._variables[key]

def __setitem__(self, key, var):
    if key in self._children:
        raise KeyError(
          "key already in use to denote a child" 
          "node in wrapping DataTree node"
        )
    self.__dict__[key] = var

class Dataset: self._variables = Mapping[Any, Variable] # in standard case just use dict of vars as before

# Use ._construct_direct as the constructor
# as it allows for setting ._variables directly

# therefore no changes to Dataset required!

class DataTree: def init(self, name, data, parent, children): self._children self._variables self._coord_names self._dims ...

@property
def ds(self):
    manifest = DataManifest(variables, children)
    return Dataset._from_treenode(
      variables=manifest,
      coord_names=self._coord_names,
      dims=self._dims,
      ...
    )

@ds.setter
def ds(self, ds):
    # check for collisions between ds.data_vars and self.children
    ...

ds = Dataset({'a': 0}) subtree1 = Datatree('group1') dt = Datatree('root', data=ds, children=[subtree])

wrapped_ds = dt.ds wrapped_ds['group1'] = 1 # raises KeyError - good!

subtree2 = Datatree('b') dt.ds['b'] = 2 # this will happily add a variable to the dataset dt.add_child(subtree2) # want to ensure this raises a KeyError as it conflicts with the new variable, but with this design I'm not sure if it will... ```

EDIT: Actually maybe this would work? So long as in DataTree we have python class DataTree: self._variables = manifest self._children = manifest.children Then adding a new child node would also update the manifest, meaning that the linked dataset should know about it too...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [Experimental] Refactor Dataset to store variables in a manifest 1048697792
964477183 https://github.com/pydata/xarray/pull/5961#issuecomment-964477183 https://api.github.com/repos/pydata/xarray/issues/5961 IC_kwDOAMm_X845fMD_ shoyer 1217238 2021-11-09T19:43:07Z 2021-11-09T19:43:07Z MEMBER

From a xarray.Dataset perspective, Dataset._variables just needs to be a MutableMapping of xarray.Variable objects. And in most cases (when not using DataTree), _variables would still be a plain dictionary, which means adding DataTree support would have no performance implications for normal Dataset objects.

My tentative suggestion would be to use a mixed dictionary with either xarray.Variable or nested dictionaries as entries for the data in DataTree. Then you could make a proxy mapping object that only exposes Variable objects for use in Dataset.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [Experimental] Refactor Dataset to store variables in a manifest 1048697792
964450835 https://github.com/pydata/xarray/pull/5961#issuecomment-964450835 https://api.github.com/repos/pydata/xarray/issues/5961 IC_kwDOAMm_X845fFoT TomNicholas 35968931 2021-11-09T19:08:09Z 2021-11-09T19:24:32Z MEMBER

It just seems confusing to have an object referred to as ._variables which actually is just as much a container of child DataTree nodes as it is of variables...

Also the DataTree class needs to wrap this "manifest" too (otherwise I'm storing children in multiple places at once), and so I want it to make sense in that context too.

On Tue, 9 Nov 2021, 13:46 Stephan Hoyer, @.***> wrote:

How about making custom Mapping for use as Dataset._variables directly, which directly is a mapping of dataset variables? You could still be storing the underlying variables in a different way.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/pydata/xarray/pull/5961#issuecomment-964433424, or unsubscribe https://github.com/notifications/unsubscribe-auth/AISNPI4LEDDTEGYBWJ7VK6LULFT75ANCNFSM5HVPT5AA .

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [Experimental] Refactor Dataset to store variables in a manifest 1048697792
964433424 https://github.com/pydata/xarray/pull/5961#issuecomment-964433424 https://api.github.com/repos/pydata/xarray/issues/5961 IC_kwDOAMm_X845fBYQ shoyer 1217238 2021-11-09T18:46:11Z 2021-11-09T18:46:11Z MEMBER

How about making custom Mapping for use as Dataset._variables directly, which directly is a mapping of dataset variables? You could still be storing the underlying variables in a different way.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [Experimental] Refactor Dataset to store variables in a manifest 1048697792

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 15.038ms · About: xarray-datasette