issue_comments

2 rows where author_association = "MEMBER", issue = 231308952 and user = 1217238 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
305058643 https://github.com/pydata/xarray/pull/1426#issuecomment-305058643 https://api.github.com/repos/pydata/xarray/issues/1426 MDEyOklzc3VlQ29tbWVudDMwNTA1ODY0Mw== shoyer 1217238 2017-05-31T01:46:35Z 2017-05-31T01:46:35Z MEMBER

> If my understanding is correct, does it mean that we will support ds.sel(x='a'), ds.isel(x=[0, 1]) and ds.mean(dim='x') with your example data? Will it raise an Error if Coordinate is more than 1 dimensional? How about ds.sel(x='a', y=[1, 2])?

I was only thinking about .sel() (as works currently with MultiIndex). I'm not sure about the others yet.

@benbovy although a CoordinateGroup is definitely better than MultiIndex-scalar, it still feels like a very similar notion. It could make for a nice internal clean-up, but from a user's perspective I think it's about as confusing as a MultiIndex -- it's just as many terms to keep track of.

Right now, our user-facing API in xarray exposes three related concepts:
- Coordinate
- Index
- MultiIndex

Eliminating any of these concepts would be an improvement.

To this end, I have two (vague) proposals:
1. Eliminate MultiIndex. We only have an idea of "indexed" coordinates, marked by * in the repr, which don't necessarily correspond to dimensions. Indexed coordinates, which are immutable, can have any number of dimensions and you can have any number of "indexed" coordinates per dimension. Indexing, concatenating and expanding dimensions should not change their nature.
2. Eliminate both MultiIndex and explicit indexes. Indexes required for efficient operations are created on the fly when necessary. This might be too magical.
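To make the second proposal a little more concrete, here is a minimal pure-Python sketch of an index that is only built the first time a label lookup needs it and is reused afterwards. The `LazyIndexedCoordinate` class, its `positions` method and the `builds` counter are hypothetical names invented for illustration; none of this is xarray API, and a real implementation would presumably rely on pandas indexes rather than a dict.

```python
# Hypothetical sketch of proposal 2: build a lookup index only when a
# label-based selection first needs it, then reuse it. Not xarray API.
class LazyIndexedCoordinate:
    def __init__(self, values):
        self.values = list(values)
        self._index = None   # no index exists until one is required
        self.builds = 0      # counts index constructions, for illustration

    def _get_index(self):
        # Build the label -> positions mapping on first use only.
        if self._index is None:
            index = {}
            for position, label in enumerate(self.values):
                index.setdefault(label, []).append(position)
            self._index = index
            self.builds += 1
        return self._index

    def positions(self, label):
        """Positions holding `label`; raises KeyError if it is absent."""
        return self._get_index()[label]


coord = LazyIndexedCoordinate(['a', 'a', 'a', 'b', 'b', 'b'])
first = coord.positions('b')    # triggers the one-time index build
second = coord.positions('a')   # reuses the cached index
print(first, second, coord.builds)
```

The "too magical" concern shows up even here: the cost of the first lookup differs from later ones, and nothing in the calling code says so.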

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  scalar_level in MultiIndex 231308952
304778433 https://github.com/pydata/xarray/pull/1426#issuecomment-304778433 https://api.github.com/repos/pydata/xarray/issues/1426 MDEyOklzc3VlQ29tbWVudDMwNDc3ODQzMw== shoyer 1217238 2017-05-30T05:29:11Z 2017-05-30T05:29:11Z MEMBER

Sorry for the delay getting back to you here -- I'm still thinking through the implications of this change.

This does make the handling of MultiIndex type data much more consistent, but calling scalars MultiIndex-scalar seems quite confusing to me. I think of the data-type here as closer to NumPy's structured types, except without the implied storage format for the data.

However, taking a step back, I wonder if this is the right approach. In many ways, structured dtypes are similar to xarray's existing data structures, so supporting them fully means a lot of duplicated functionality. MultiIndexes (especially with scalars) should work similarly to separate variables, but they are implemented very differently under the hood (all the data lives in one variable).

(See https://github.com/pandas-dev/pandas/issues/3443 for related discussion about pandas and why it doesn't support structured dtypes.)

It occurs to me that if we had full support for indexing on coordinate levels, we might not need a notion of a "MultiIndex" in the public API at all. To make this more concrete, what if this was the repr() for the result of ds.stack(yx=['y', 'x']) in your first example?

    <xarray.Dataset>
    Dimensions:  (yx: 6)
    Coordinates:
        y        (yx) object 'a' 'a' 'a' 'b' 'b' 'b'
        x        (yx) int64 1 2 3 1 2 3
    Data variables:
        foo      (yx) int64 1 2 3 4 5 6

If we supported MultiIndex-like indexing for x and y, this could be nearly equivalent to a MultiIndex with much less code duplication. The important practical difference is that here there are no labels along the yx dimension, so ds['yx'][0] would not return a tuple. Also, we would need to figure out some way to explicitly signal what should become part of a MultiIndex when we convert to a pandas DataFrame.
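As a rough illustration of what "MultiIndex-like indexing for x and y" could look like, the sketch below selects positions along yx by matching the two level coordinates directly, with no MultiIndex object involved. The `sel_levels` helper is hypothetical and not part of xarray; the values mirror the example repr, and plain Python lists stand in for coordinate arrays.

```python
# Hypothetical sketch: level-wise selection on plain 1-D coordinates
# that share the yx dimension. Not xarray API.
y = ['a', 'a', 'a', 'b', 'b', 'b']   # coordinate y along yx
x = [1, 2, 3, 1, 2, 3]               # coordinate x along yx
foo = [1, 2, 3, 4, 5, 6]             # data variable along yx


def sel_levels(coords, **labels):
    """Return positions where every named coordinate matches its label.

    Each label may be a scalar or a list/tuple/set of accepted values.
    """
    n = len(next(iter(coords.values())))
    positions = []
    for i in range(n):
        matches = True
        for name, wanted in labels.items():
            allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            if coords[name][i] not in allowed:
                matches = False
                break
        if matches:
            positions.append(i)
    return positions


coords = {'y': y, 'x': x}
pos = sel_levels(coords, y='a', x=[1, 2])
selected = [foo[i] for i in pos]
print(pos, selected)
```

This linear scan is only meant to show the semantics; making it efficient is exactly where an index on each level would come in.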

Pandas has MultiIndex because it needed a way to group multiple arrays together into a single index array. In xarray, this is less necessary, because we have multiple coordinates to represent levels, and xarray itself no longer needs a MultiIndex notion because we no longer require coordinate labels for every dimension (as of v0.9).

CC @benbovy

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  scalar_level in MultiIndex 231308952

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);