issue_comments: 442956167

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/1603#issuecomment-442956167 https://api.github.com/repos/pydata/xarray/issues/1603 442956167 MDEyOklzc3VlQ29tbWVudDQ0Mjk1NjE2Nw== 1217238 2018-11-29T19:10:14Z 2018-11-29T19:10:14Z MEMBER

Looking at the reported issues related to multi-indexes in xarray, I have the same feeling. Simply reusing pandas.MultiIndex in xarray, where slightly different semantics are generally expected, has proven painful. It seems easier to have our own solution and deal with differences during xarray <-> pandas conversion if needed.

I think pandas.MultiIndex is a pretty solid data structure on a fundamental level; it just has some weird semantics for some indexing edge cases. Whether or not we write an xarray.MultiIndex structure, we can achieve most of what we want with a thin layer over pandas.MultiIndex.
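To make "a thin layer" a bit more concrete, here is a rough sketch. `LevelIndexer` and its `sel` method are hypothetical names invented for illustration, not anything in xarray; only `pandas.MultiIndex.from_product` and `get_level_values` are real pandas calls. It does a plain linear scan, so it shows the semantics rather than an efficient lookup.

```python
import numpy as np
import pandas as pd


class LevelIndexer:
    """Hypothetical thin wrapper: select positions by level name and label."""

    def __init__(self, index: pd.MultiIndex):
        self.index = index

    def sel(self, **level_labels):
        # Start with every position selected, then narrow level by level.
        mask = np.ones(len(self.index), dtype=bool)
        for level, label in level_labels.items():
            mask &= self.index.get_level_values(level) == label
        if not mask.any():
            # Always raise on a miss instead of returning an empty result.
            raise KeyError(level_labels)
        return np.flatnonzero(mask)


index = pd.MultiIndex.from_product(
    [["a", "b"], [1, 2]], names=["letter", "number"]
)
indexer = LevelIndexer(index)

positions_a = indexer.sel(letter="a")             # both rows under "a"
positions_b2 = indexer.sel(letter="b", number=2)  # a single row
```

The wrapper decides for itself what a lookup by level name means, while pandas still owns the storage.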

If a variable for each multi-coordinate index is "just" for data schema consistency, then why not show all those indexes in a separate section of the repr?

Yes, I like this! Generally I like @benbovy's entire proposal :).

@fujiisoup can you clarify the use-cases you have for a MultiIndex as a variable?

Am I right in thinking that the multi-index is only a helpful note to users, rather than conveying anything about how data is accessed?

From a data perspective, the only thing that having an Index and/or MultiIndex should change is that the data is immutable.

But by necessity the nature of the index will determine which indexing operations are possible/efficient. For example, if you want to do nearest-neighbor indexing with multiple coordinates you'll need a KDTree. We should not be afraid to raise errors if an indexing operation can't be done efficiently.
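As a rough illustration of the KDTree point, here is a sketch using `scipy.spatial.cKDTree` directly on two coordinate arrays. The `lon`/`lat`/`values` arrays and the `sel_nearest` helper are made up for the example; this is not how xarray implements anything.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical data: four points along one dimension, two coordinates each.
lon = np.array([0.0, 1.0, 2.0, 3.0])
lat = np.array([0.0, 1.0, 0.0, 1.0])
values = np.array([10.0, 20.0, 30.0, 40.0])

# Build the tree once over the stacked (lon, lat) pairs.
tree = cKDTree(np.column_stack([lon, lat]))


def sel_nearest(query_lon, query_lat):
    """Return the value at the point closest to (query_lon, query_lat)."""
    _, position = tree.query([query_lon, query_lat])
    return values[position]


nearest = sel_nearest(1.9, 0.2)  # closest point is (2.0, 0.0)
```

Without the tree, every query would need a scan over all points, which is the kind of operation where raising an error may be better than silently being slow.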


With regard to reindexing: I don't think this needs any special handling versus normal indexing (sel()). The rules basically fall out of those for normal indexing, except we handle missing values differently (by filling with NaN).
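A small example of that difference with a plain single index (the array and labels are made up for illustration):

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, 2.0, 3.0], dims="x", coords={"x": [10, 20, 30]})

# Normal indexing: every requested label must exist.
picked = da.sel(x=[10, 30])

try:
    da.sel(x=[10, 25])  # 25 is not in the index
    missing_raises = False
except KeyError:
    missing_raises = True

# Reindexing: same lookup, but missing labels are filled with NaN.
filled = da.reindex(x=[10, 25])
```

Here `sel()` raises on the missing label while `reindex()` returns a NaN in that slot.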

Another issue: how do we do automatic alignment with multiple indexes? Let me suggest a straw-man proposal: We always align indexed coordinates. If a coordinate is used in different types of indexes (e.g., a base Index in one argument and a MultiIndex level in another), we can either:
1. create a MultiIndex with the variable on the fly (this could be slightly expensive), or
2. fall back to only supporting "exact" indexing
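For reference, this is roughly what alignment of indexed coordinates looks like today with a single index per dimension. It shows existing behavior only, not the straw-man proposal. Using `join="exact"` as a stand-in for the "exact" fallback is an assumption on my part about what that option could mean.

```python
import xarray as xr

a = xr.DataArray([1, 2, 3], dims="x", coords={"x": [0, 1, 2]})
b = xr.DataArray([10, 20, 30], dims="x", coords={"x": [1, 2, 3]})

# Arithmetic aligns on the indexed coordinate (inner join by default),
# so only the shared labels x=1 and x=2 survive.
total = a + b

# A strict mode: refuse to align unless the indexes already match.
try:
    xr.align(a, b, join="exact")
    exact_ok = True
except ValueError:
    exact_ok = False
```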

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  262642978