issue_comments: 305520522

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/pull/1426#issuecomment-305520522 https://api.github.com/repos/pydata/xarray/issues/1426 305520522 MDEyOklzc3VlQ29tbWVudDMwNTUyMDUyMg== 4160723 2017-06-01T15:00:06Z 2017-06-01T15:00:06Z MEMBER

@fujiisoup I agree that, given your example, proposal 2 might be more intuitive; however, IMHO implicit indexes do seem a bit too magical. Although I don't have a concrete example in mind, I guess it would sometimes be hard to really understand what's going on.

Exposing fewer concepts to users would indeed be an improvement, unless it makes things too implicit or magical.

Let me try to give a more detailed proposal than in my previous comment, which generalizes to potential features like multi-dimensional indexers (see @shoyer's comment, which I'd be happy to start working on soon).

It is actually very much like proposal 1, with only one additional concept (called "super index" below).

  • DataArray and Dataset objects may have coordinates, which are the variables listed in da.coords or ds.coords. These variables may be 1-dimensional or n-dimensional.

  • Among these coordinates, some are "indexed" coordinates. These are marked by * in the repr and can be used in .sel and .isel as keyword arguments.

  • Some coordinates may be grouped together and wrapped by some kind of "super index". These super indexes are also marked by * in the repr, and the coordinates that are part of them are shown just below with the - marker. Each coordinate wrapped by a super index is considered an indexed coordinate: it is still listed in da.coords or ds.coords and it can also be used in .sel and .isel as a keyword argument. The super index itself is different: it is not listed in .coords. If needed, we might make super indexes accessible as virtual coordinates: they would then return arrays of tuples with the values of the wrapped coordinates.
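
To make the three rules above more concrete, here is a rough pure-Python sketch of the bookkeeping they imply. It is not xarray code: the names Coords, SuperIndex, selectable, virtual and repr_lines are all made up for illustration, and the repr layout is only my guess at how the * and - markers could look.

```python
from dataclasses import dataclass, field


@dataclass
class SuperIndex:
    """Hypothetical wrapper that groups several coordinates under one index."""
    name: str
    coord_names: tuple


@dataclass
class Coords:
    """Toy stand-in for da.coords / ds.coords (assumption, not xarray's API)."""
    variables: dict                                   # coordinate name -> values
    indexed: set = field(default_factory=set)         # plain indexed coordinates
    super_indexes: list = field(default_factory=list)

    def _wrapped(self):
        # Names of all coordinates that belong to some super index.
        return {c for s in self.super_indexes for c in s.coord_names}

    def names(self):
        # Only real coordinates are listed; super indexes are not.
        return list(self.variables)

    def selectable(self):
        # Indexed coordinates plus every coordinate wrapped by a super index.
        return self.indexed | self._wrapped()

    def virtual(self, super_name):
        # Optional virtual coordinate: tuples of the wrapped coordinates' values.
        s = next(s for s in self.super_indexes if s.name == super_name)
        return list(zip(*(self.variables[c] for c in s.coord_names)))

    def repr_lines(self):
        wrapped = self._wrapped()
        lines = []
        for s in self.super_indexes:
            lines.append(f"* {s.name}")
            lines.extend(f"  - {c}" for c in s.coord_names)
        for name in self.variables:
            if name in wrapped:
                continue  # already shown under its super index
            marker = "*" if name in self.indexed else " "
            lines.append(f"{marker} {name}")
        return lines


coords = Coords(
    variables={
        "x": [10, 20, 30],
        "lat": [0.0, 1.0, 2.0],
        "lon": [5.0, 6.0, 7.0],
        "note": ["a", "b", "c"],
    },
    indexed={"x"},
    super_indexes=[SuperIndex("spatial", ("lat", "lon"))],
)
print("\n".join(coords.repr_lines()))
```

Here lat and lon stay listed as coordinates and remain selectable, while the hypothetical "spatial" super index only shows up in the repr and, optionally, as a virtual coordinate of tuples.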

Examples of super indexes:

  • KDTree. It allows multi-dimensional coordinates to be indexed using a KDTree.
  • Similarly, BallTree or RTree...
  • MultiIndex (or CoordinateGroup or any better name). It allows explicitly defining multiple indexes for a given dimension, as well as the behavior when, for example, we select data with conflicting labels in different coordinates. It also naturally converts to a pandas.MultiIndex when we want to convert to a DataFrame.
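
As an illustration of the last example, here is a minimal sketch of what "explicitly defining the behavior for conflicting labels" could mean for a MultiIndex-like super index. Everything here is an assumption: the CoordinateGroup class, its sel method and the on_conflict parameter are hypothetical, and a linear scan stands in for a real index structure.

```python
class CoordinateGroup:
    """Hypothetical super index over coordinates that share one dimension."""

    def __init__(self, **coords):
        lengths = {len(values) for values in coords.values()}
        if len(lengths) != 1:
            raise ValueError("all wrapped coordinates must share one dimension")
        self.coords = {name: list(values) for name, values in coords.items()}
        self.size = lengths.pop()

    def sel(self, on_conflict="raise", **labels):
        """Return the positions where every given label matches.

        on_conflict (assumed option) controls what happens when the labels
        never match together: "raise" raises KeyError, anything else
        returns an empty list.
        """
        positions = [
            i
            for i in range(self.size)
            if all(self.coords[name][i] == label for name, label in labels.items())
        ]
        if not positions and on_conflict == "raise":
            raise KeyError(f"no position matches {labels}")
        return positions

    def to_tuples(self):
        # One tuple per position, in the order the coordinates were given.
        return list(zip(*self.coords.values()))


group = CoordinateGroup(letter=["a", "a", "b"], number=[1, 2, 1])
print(group.sel(letter="a"))                                  # one coordinate
print(group.sel(letter="a", number=2))                        # both coordinates
print(group.sel(letter="b", number=2, on_conflict="empty"))   # conflicting labels
```

In the last call both labels exist, but never at the same position; the point of the super index is that this case is handled by one explicit, documented rule instead of implicit behavior. The tuples from to_tuples are the kind of input that pandas.MultiIndex.from_tuples accepts, which is roughly how the conversion to a DataFrame could go.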

"Super index" is an additional concept that users have to understand, which is in principle bad, but here I think it's worth it, as it potentially gives a good generic model for explicitly handling various advanced indexes that involve multiple coordinates.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  231308952