Comment on pydata/xarray pull request #1426 by user 4160723 (MEMBER), posted 2017-05-30T23:38:05Z: https://github.com/pydata/xarray/pull/1426#issuecomment-305039117

I also fully agree that using multiple coordinate (index) variables instead of a MultiIndex would greatly simplify things both internally and for users!

A dimension with a single 'real' coordinate (i.e., an IndexVariable) that wraps a MultiIndex, whose 'levels' can be accessed (and indexed) as 'virtual' coordinates, indeed represents a lot of unnecessary complexity! A dimension having multiple 'real' coordinates that can be used with .sel, or even .isel, is much simpler to understand and maybe to implement.
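To illustrate why several 'real' coordinates on one dimension may be simpler, here is a minimal sketch. It assumes plain Python lists in place of xarray variables and uses hypothetical `sel`/`isel` helpers, which are not xarray's methods. A label lookup on any single coordinate reduces to a list of positions along the shared dimension. Every variable on that dimension is then subset with those same positions, with no MultiIndex involved. The values mirror the y/x example used in this comment.

```python
# Hypothetical sketch, not xarray API: one dimension "yx" carrying two "real"
# coordinates (y and x); plain lists stand in for xarray variables.
coords = {
    "y": ["a", "a", "a", "b", "b", "b"],
    "x": [1, 2, 3, 1, 2, 3],
}
data = {"foo": [1, 2, 3, 4, 5, 6]}


def isel(coords, data, positions):
    """Positional selection along the shared dimension: every coordinate and
    data variable on that dimension is subset with the same positions."""
    def pick(values):
        return [values[i] for i in positions]

    return ({name: pick(v) for name, v in coords.items()},
            {name: pick(v) for name, v in data.items()})


def sel(coords, data, name, label):
    """Label-based selection through one coordinate: look up the positions
    where that coordinate equals `label`, then reuse positional selection."""
    positions = [i for i, value in enumerate(coords[name]) if value == label]
    return isel(coords, data, positions)


sub_coords, sub_data = sel(coords, data, "y", "a")
print(sub_coords)  # {'y': ['a', 'a', 'a'], 'x': [1, 2, 3]}
print(sub_data)    # {'foo': [1, 2, 3]}
```

Because `sel` only translates labels into positions, `.isel` through a non-dimension coordinate comes almost for free in this model.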

With multiple 'real' coordinates, I don't see any reason why ds.sel(x='a'), ds.isel(x=[0, 1]) or ds.sel(x='a', y=[1, 2]) would not be supported. However, we need to decide what to do in case of conflicts, e.g., ds.isel(x=[0, 1], y=[1, 2]). Raise an error? Return a result equivalent to ds.isel(yx=1), i.e., the intersection of the selected positions ("and")? Or one equivalent to ds.isel(x=[0, 1, 2]), i.e., their union ("or")?
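The three options for a conflicting call such as ds.isel(x=[0, 1], y=[1, 2]) can be sketched with a hypothetical helper, `combine_positions`, which is not part of xarray. It combines the positions selected through each coordinate along the shared dimension. It can raise, take the intersection ("and"), or take the union ("or"). This is one possible policy under stated assumptions, not a settled design.

```python
# Hypothetical helper, not xarray API.
def combine_positions(selections, how="raise"):
    """Combine positional selections made through coordinates sharing one dim.

    selections: mapping of coordinate name -> list of integer positions.
    how: "raise" (reject more than one coordinate), "and" (intersection)
         or "or" (union).
    """
    if not selections:
        raise ValueError("at least one selection is required")
    if how == "raise":
        if len(selections) > 1:
            raise ValueError(
                f"ambiguous selection on coordinates {sorted(selections)}"
            )
        # A single coordinate: keep its positions (order and repeats) as given.
        return list(next(iter(selections.values())))
    sets = [set(positions) for positions in selections.values()]
    if how == "and":
        return sorted(set.intersection(*sets))
    if how == "or":
        return sorted(set.union(*sets))
    raise ValueError(f"unknown mode {how!r}")


conflict = {"x": [0, 1], "y": [1, 2]}
print(combine_positions(conflict, how="and"))  # [1]
print(combine_positions(conflict, how="or"))   # [0, 1, 2]
```

In this sketch, "and" yields the list [1] and therefore keeps the yx dimension. By contrast, ds.isel(yx=1) with a scalar would drop it. That is one more choice such an API would have to make.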

The important practical difference is that here there are no labels along the yx dimension, so ds['yx'][0] would not return a tuple. Also, we would need to figure out some way to explicitly signal what should become part of a MultiIndex when we convert to a pandas DataFrame.

I'm thinking about something like this:

    <xarray.Dataset>
    Dimensions:  (yx: 6)
    Coordinates:
      * yx       (yx) CoordinateGroup
        - y      (yx) object 'a' 'a' 'a' 'b' 'b' 'b'
        - x      (yx) int64 1 2 3 1 2 3
    Data variables:
        foo      (yx) int64 1 2 3 4 5 6

It may present several advantages:

  • Instead of being listed as a dimension without coordinates (which is not true), yx would have a CoordinateGroup: a lightweight object that only holds references to the x and y coordinates.

  • CoordinateGroup may behave like a virtual coordinate so that ds['yx'][0] still returns a tuple (there may not be many use cases for this, though).

  • set_index, reset_index and reorder_levels can still be used to explicitly create, modify or remove a CoordinateGroup for a given dimension.

  • It is trivial to convert a CoordinateGroup to a MultiIndex when converting to a pandas DataFrame. Following @fmaussion's comment above, I think a name like CoordinateGroup is much easier for xarray users to understand than MultiIndex.

  • In repr(), x and y are still shown next to each other.
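The advantages listed above can be made concrete with a minimal sketch of what a CoordinateGroup might look like. It is a hypothetical class, not part of xarray, with plain lists standing in for coordinate variables. The group stores only references to the existing coordinates and a level order. Indexing it returns a tuple per position, so it behaves like a virtual coordinate. reorder_levels just permutes the order. Export produces the tuples a MultiIndex would be built from.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence, Tuple


@dataclass
class CoordinateGroup:
    """Hypothetical sketch of the proposed CoordinateGroup (not an xarray class).

    Holds references to existing coordinate arrays along one dimension, plus
    the level order to use when exporting them as a MultiIndex.
    """

    dim: str
    coords: Mapping[str, Sequence]  # shared references, no data is copied
    order: Tuple[str, ...]

    def __len__(self):
        return len(self.coords[self.order[0]])

    def __getitem__(self, i):
        # Act as a virtual coordinate: one tuple of level values per position.
        return tuple(self.coords[name][i] for name in self.order)

    def reorder_levels(self, order):
        # New group over the same coordinate objects, with levels permuted.
        return CoordinateGroup(self.dim, self.coords, tuple(order))

    def to_index_tuples(self):
        # The labels a pandas MultiIndex would be built from on export.
        return [self[i] for i in range(len(self))]


y = ["a", "a", "a", "b", "b", "b"]
x = [1, 2, 3, 1, 2, 3]
group = CoordinateGroup("yx", {"y": y, "x": x}, ("y", "x"))
print(group[0])                             # ('a', 1)
print(group.reorder_levels(["x", "y"])[4])  # (2, 'b')
print(group.coords["y"] is y)               # True
```

With pandas available, the actual conversion could be done with pandas.MultiIndex.from_arrays([y, x], names=["y", "x"]). The sketch stops at tuples to stay dependency-free. In the same spirit, set_index and reset_index could simply create or drop the group object, leaving the x and y coordinates themselves untouched.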
