
issue_comments: 304778433


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/pull/1426#issuecomment-304778433 https://api.github.com/repos/pydata/xarray/issues/1426 304778433 MDEyOklzc3VlQ29tbWVudDMwNDc3ODQzMw== 1217238 2017-05-30T05:29:11Z 2017-05-30T05:29:11Z MEMBER

Sorry for the delay getting back to you here -- I'm still thinking through the implications of this change.

This does make the handling of MultiIndex type data much more consistent, but calling scalars MultiIndex-scalar seems quite confusing to me. I think of the data-type here as closer to NumPy's structured types, except without the implied storage format for the data.
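To illustrate the comparison with NumPy's structured types, here is a minimal sketch. The field names `y` and `x` and the sample values are assumptions chosen for illustration and are not taken from the pull request.

```python
import numpy as np

# A structured array: each element bundles several named fields,
# loosely analogous to one (y, x) entry of a two-level MultiIndex.
# Field names and values are illustrative assumptions.
pairs = np.array(
    [("a", 1), ("a", 2), ("b", 1)],
    dtype=[("y", "U1"), ("x", "i8")],
)

# Indexing a single element yields a structured scalar with named fields,
# the rough counterpart of a "MultiIndex-scalar".
first = pairs[0]

# Indexing by field name yields one plain array per "level".
y_level = pairs["y"]
x_level = pairs["x"]
```

Unlike the MultiIndex case, a structured dtype also fixes the in-memory layout of the fields, which is the "implied storage format" mentioned above.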

However, taking a step back, I wonder if this is the right approach. In many ways, structured dtypes are similar to xarray's existing data structures, so supporting them fully means a lot of duplicated functionality. MultiIndexes (especially with scalars) should work similarly to separate variables, but they are implemented very differently under the hood (all the data lives in one variable).

(See https://github.com/pandas-dev/pandas/issues/3443 for related discussion about pandas and why it doesn't support structured dtypes.)

It occurs to me that if we had full support for indexing on coordinate levels, we might not need a notion of a "MultiIndex" in the public API at all. To make this more concrete, what if this was the repr() for the result of ds.stack(yx=['y', 'x']) in your first example?

    <xarray.Dataset>
    Dimensions:  (yx: 6)
    Coordinates:
        y        (yx) object 'a' 'a' 'a' 'b' 'b' 'b'
        x        (yx) int64 1 2 3 1 2 3
    Data variables:
        foo      (yx) int64 1 2 3 4 5 6

If we supported MultiIndex-like indexing for x and y, this could be nearly equivalent to a MultiIndex with much less code duplication. The important practical difference is that here there are no labels along the yx dimension, so ds['yx'][0] would not return a tuple. Also, we would need to figure out some way to explicitly signal what should become part of a MultiIndex when we convert to a pandas DataFrame.
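As a rough sketch of what this could look like with pieces that already exist, the dataset below is built by hand rather than with ds.stack(), so it only approximates the proposed result. The boolean-mask selection stands in for the hypothetical MultiIndex-like indexing on x and y, and the final set_index call is just one assumed way to signal which columns should form the MultiIndex on the pandas side.

```python
import numpy as np
import xarray as xr

# Hand-built stand-in for the proposed result of ds.stack(yx=['y', 'x']):
# 'y' and 'x' are ordinary coordinates along 'yx', and 'yx' itself has no labels.
ds = xr.Dataset(
    {"foo": ("yx", np.arange(1, 7))},
    coords={
        "y": ("yx", ["a", "a", "a", "b", "b", "b"]),
        "x": ("yx", [1, 2, 3, 1, 2, 3]),
    },
)

# Level-style selection emulated with a boolean mask over the two coordinates
# (a workaround, not the proposed indexing API).
mask = (ds["y"] == "b").values & (ds["x"] >= 2).values
subset = ds.isel(yx=np.flatnonzero(mask))

# When converting to pandas, the levels are named explicitly.
df = ds.to_dataframe().set_index(["y", "x"])
```

Here subset keeps only the entries with y == 'b' and x >= 2, and df is indexed by a pandas MultiIndex built from the y and x columns.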

Pandas has MultiIndex because it needed a way to group multiple arrays together into a single index array. In xarray, this is less necessary, because we have multiple coordinates to represent levels, and xarray itself no longer needs a MultiIndex notion because we no longer require coordinate labels for every dimension (as of v0.9).

CC @benbovy

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  231308952