issue_comments


3 rows where author_association = "MEMBER", issue = 231308952 and user = 4160723 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
310032997 https://github.com/pydata/xarray/pull/1426#issuecomment-310032997 https://api.github.com/repos/pydata/xarray/issues/1426 MDEyOklzc3VlQ29tbWVudDMxMDAzMjk5Nw== benbovy 4160723 2017-06-21T10:08:07Z 2017-06-21T10:58:29Z MEMBER

Although I haven't thought about all the details regarding this, I think that in the case of multi-dimensional coordinates a "super index" would rather allow directly using these coordinates for indexing, which is currently not possible.

In your 'rasm' example, it would rather look like

    <xarray.Dataset>
    Dimensions:  (time: 36, x: 275, y: 205)
    Dimensions without coordinates: y, x
    Coordinates:
      * time           (time) float64 7.226e+05 7.226e+05 7.227e+05 7.227e+05 ...
      * spatial_index  (y, x) KDTree
        - xc           (y, x) float64 189.2 189.4 189.6 189.7 189.9 190.1 190.2 190.4 ...
        - yc           (y, x) float64 16.53 16.78 17.02 17.27 17.51 17.76 18.0 18.25 ...
    Dimensions without coordinates: x, y
    Data variables:
        Tair           (time, y, x) float64 nan nan nan nan nan nan nan nan nan nan ...
    Attributes:
        ...

and it would allow writing

    In [1]: ds.sel(xc=<...>, yc=<...>, method='nearest')

Note that x and y dimensions still don't have coordinates.

That's actually what @shoyer suggested here.

The proposal above is more about having the same API for groups of coordinates that can be indexed using a "wrapped" index object (maybe "wrapped index" is a better name than "super index"?), but the logic can be very different from one index object to another.
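To make the intended behavior of such a selection concrete, here is a minimal pure-Python sketch of a nearest-point lookup on two-dimensional coordinates. It is not xarray code: the function name `nearest_position` and the small grids are made up for illustration, and a brute-force scan stands in for the KDTree, which would be expected to return the same position more efficiently.

```python
import math

# Made-up 2-D coordinate grids with dims (y, x); not the real 'rasm' values.
xc = [[189.2, 189.4, 189.6],
      [189.3, 189.5, 189.7]]
yc = [[16.5, 16.8, 17.0],
      [17.5, 17.8, 18.0]]


def nearest_position(xc, yc, x_target, y_target):
    """Return the (y, x) integer position closest to the target point.

    Hypothetical helper: a brute-force scan over every grid cell,
    used here in place of a KDTree query.
    """
    best, best_dist = None, math.inf
    for j, (x_row, y_row) in enumerate(zip(xc, yc)):
        for i, (xv, yv) in enumerate(zip(x_row, y_row)):
            dist = math.hypot(xv - x_target, yv - y_target)
            if dist < best_dist:
                best, best_dist = (j, i), dist
    return best


# Roughly what ds.sel(xc=189.5, yc=17.7, method='nearest') would resolve to
# before the data variables are indexed positionally along y and x.
pos = nearest_position(xc, yc, x_target=189.5, y_target=17.7)
```

The result is a pair of integer positions along y and x, which is consistent with the note above that those dimensions themselves still carry no coordinates.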

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  scalar_level in MultiIndex 231308952
305520522 https://github.com/pydata/xarray/pull/1426#issuecomment-305520522 https://api.github.com/repos/pydata/xarray/issues/1426 MDEyOklzc3VlQ29tbWVudDMwNTUyMDUyMg== benbovy 4160723 2017-06-01T15:00:06Z 2017-06-01T15:00:06Z MEMBER

@fujiisoup I agree that, given your example, proposal 2 might be more intuitive; however, IMHO implicit indexes do seem a bit too magical. Although I don't have a concrete example in mind, I guess that it would sometimes be hard to really understand what's going on.

Exposing fewer concepts to users would indeed be an improvement, unless it makes things too implicit or magical.

Let me try to give a more detailed proposal than in my previous comment, which generalizes to potential features like multi-dimensional indexers (see @shoyer's comment, which I'd be happy to start working on soon).

It is actually very much like proposal 1, with only one additional concept (called "super index" below).

  • DataArray and Dataset objects may have coordinates, which are the variables listed in da.coords or ds.coords. These variables may be 1-dimensional or n-dimensional.

  • Among these coordinates, some are "indexed" coordinates. These are marked by * in the repr and can be used in .sel and .isel as keyword arguments.

  • Some coordinates may be grouped together and wrapped by some kind of "super index". Super indexes are also marked by * in the repr, and the coordinates that belong to one are shown just below it with the - marker. Each coordinate wrapped by a super index is considered an indexed coordinate: it is still listed in da.coords or ds.coords and it can also be used in .sel and .isel as a keyword argument. The super index itself, by contrast, is not listed in .coords. If needed, we might make super indexes accessible as virtual coordinates: they would then return arrays of tuples with the values of the wrapped coordinates.
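The three points above could be sketched roughly as follows. This is only an illustration in plain Python, not xarray's API: `WrappedIndex`, `ExactMatchIndex` and `sel_positions` are hypothetical names, and the only assumption is that every super index exposes the same `query` entry point while implementing its own lookup logic.

```python
from typing import Dict, List, Sequence


class WrappedIndex:
    """Hypothetical base class: wraps a group of same-length coordinates."""

    def __init__(self, name: str, coords: Dict[str, Sequence]):
        if len({len(values) for values in coords.values()}) != 1:
            raise ValueError("wrapped coordinates must all have the same length")
        self.name = name
        self.coords = dict(coords)
        self.size = len(next(iter(coords.values())))

    def query(self, **labels) -> List[int]:
        """Map coordinate labels to integer positions; logic is index-specific."""
        raise NotImplementedError


class ExactMatchIndex(WrappedIndex):
    """Toy MultiIndex-like logic: keep positions that match every given label."""

    def query(self, **labels) -> List[int]:
        return [
            p for p in range(self.size)
            if all(self.coords[name][p] == value for name, value in labels.items())
        ]


def sel_positions(indexes, **labels) -> List[int]:
    """Route keyword labels to the super index that wraps those coordinates."""
    for index in indexes:
        if set(labels) <= set(index.coords):
            return index.query(**labels)
    raise KeyError(f"no index wraps all of {sorted(labels)}")


yx = ExactMatchIndex("yx", {"y": list("aaabbb"), "x": [1, 2, 3, 1, 2, 3]})
only_b = sel_positions([yx], y="b")        # positions where y == 'b'
b_and_2 = sel_positions([yx], y="b", x=2)  # positions where y == 'b' and x == 2
```

A KDTree-backed index would subclass the same base and override `query` with a nearest-neighbor search, so the calling code would not need to change.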

Examples of super indexes:

  • KDTree. It allows multi-dimensional coordinates to be indexed using a KDTree.
  • Similarly, BallTree or RTree...
  • MultiIndex (or CoordinateGroup or any better name). It allows explicitly defining multiple indexes for a given dimension, and explicitly defining the behavior when, for example, we select data with conflicting labels in different coordinates. It also naturally converts to a pandas.MultiIndex when we want to convert to a DataFrame.

"Super index" is an additional concept that has to be understood by users, which is in principle bad, but here I think it's worth it, as it potentially gives a good generic model for explicit handling of various advanced indexes that involve multiple coordinates.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  scalar_level in MultiIndex 231308952
305039117 https://github.com/pydata/xarray/pull/1426#issuecomment-305039117 https://api.github.com/repos/pydata/xarray/issues/1426 MDEyOklzc3VlQ29tbWVudDMwNTAzOTExNw== benbovy 4160723 2017-05-30T23:38:05Z 2017-05-30T23:38:05Z MEMBER

I also fully agree that using multiple coordinate (index) variables instead of a MultiIndex would greatly simplify things both internally and for users!

A dimension with a single 'real' coordinate (i.e., an IndexVariable) that wraps a MultiIndex with multiple 'levels' that can be accessed (and indexed) as 'virtual' coordinates indeed represents a lot of unnecessary complexity! A dimension having multiple 'real' coordinates that can be used with .sel, or even .isel, is much simpler to understand and maybe to implement.

Using multiple 'real' coordinates, I don't see any reason why ds.sel(x='a'), ds.isel(x=[0, 1]) or ds.sel(x='a', y=[1, 2]) would not be supported. However, we need to choose what to do in case of conflicts, e.g., ds.isel(x=[0, 1], y=[1, 2]). Raise an error? Return a result equivalent to ds.isel(yx=1) (and), or equivalent to ds.isel(x=[0, 1, 2]) (or)?

The important practical difference is that here there are no labels along the yx dimension, so ds['yx'][0] would not return a tuple. Also, we would need to figure out some way to explicitly signal what should become part of a MultiIndex when we convert to a pandas DataFrame.
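The three options for handling a conflict can be written down as set operations on integer positions along the shared dimension. The sketch below is only an illustration: `combine_positions` and its `how` argument are hypothetical and not part of xarray.

```python
def combine_positions(selections, how="raise"):
    """Combine per-coordinate positional selections along one shared dimension.

    selections: dict mapping a coordinate name to a list of integer positions.
    how: 'and' (intersection), 'or' (union) or 'raise' (error on disagreement).
    Hypothetical helper, for illustration only.
    """
    sets = [set(positions) for positions in selections.values()]
    if not sets:
        return []
    if how == "and":
        result = set.intersection(*sets)
    elif how == "or":
        result = set.union(*sets)
    elif how == "raise":
        if any(s != sets[0] for s in sets[1:]):
            raise ValueError("conflicting selections along the same dimension")
        result = sets[0]
    else:
        raise ValueError(f"unknown how={how!r}")
    return sorted(result)


conflict = {"x": [0, 1], "y": [1, 2]}
as_and = combine_positions(conflict, how="and")  # only position 1 is in both
as_or = combine_positions(conflict, how="or")    # positions 0, 1 and 2
```

With the default `how="raise"`, the same `conflict` input raises a `ValueError`, which corresponds to the first option mentioned above.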

I'm thinking about something like this:

    <xarray.Dataset>
    Dimensions:  (yx: 6)
    Coordinates:
      * yx       (yx) CoordinateGroup
        - y      (yx) object 'a' 'a' 'a' 'b' 'b' 'b'
        - x      (yx) int64 1 2 3 1 2 3
    Data variables:
        foo      (yx) int64 1 2 3 4 5 6

It may present several advantages:

  • Instead of being listed as a dimension without coordinates (which is not true), yx would have a CoordinateGroup that would simply consist of a lightweight object that only contains references to the x and y coordinates.

  • CoordinateGroup may behave like a virtual coordinate so that ds['yx'][0] still returns a tuple (there may not be many use cases for this, though).

  • set_index, reset_index and reorder_levels can still be used to explicitly create, modify or remove a CoordinateGroup for a given dimension.

  • It is trivial to convert a CoordinateGroup to a MultiIndex when we convert to a pandas DataFrame. According to @fmaussion's comment above, I think that using a name like CoordinateGroup here is much easier to understand for xarray users than using the name MultiIndex.

  • In repr(), x and y are still shown next to each other.
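As a rough illustration of the "lightweight object" idea in the list above, the sketch below shows a group that only stores references to its coordinates, behaves like a virtual coordinate of tuples, and can hand over the names and tuples that `pandas.MultiIndex.from_tuples` accepts. The class and its method `to_multiindex_parts` are hypothetical, and plain Python lists stand in for coordinate variables.

```python
class CoordinateGroup:
    """Hypothetical lightweight group: holds references, copies no data."""

    def __init__(self, dim, **coords):
        self.dim = dim
        self.coords = coords  # keyword order is kept, e.g. y before x

    def __len__(self):
        return len(next(iter(self.coords.values())))

    def __getitem__(self, position):
        # Virtual-coordinate behavior: one tuple of wrapped values per position.
        return tuple(values[position] for values in self.coords.values())

    def to_multiindex_parts(self):
        # Level names and row tuples, ready for pandas.MultiIndex.from_tuples.
        names = list(self.coords)
        tuples = [self[p] for p in range(len(self))]
        return names, tuples


y = ["a", "a", "a", "b", "b", "b"]
x = [1, 2, 3, 1, 2, 3]
group = CoordinateGroup("yx", y=y, x=x)

first = group[0]                            # ('a', 1)
names, tuples = group.to_multiindex_parts()
```

Because the group keeps references rather than copies, `group.coords["x"] is x` holds, so the individual coordinates remain the single source of truth.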

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  scalar_level in MultiIndex 231308952


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);