issue_comments: 442581754

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/1603#issuecomment-442581754 https://api.github.com/repos/pydata/xarray/issues/1603 442581754 MDEyOklzc3VlQ29tbWVudDQ0MjU4MTc1NA== 1217238 2018-11-28T19:51:42Z 2018-11-29T00:48:53Z MEMBER

I've been thinking about this a little more in the context of starting on the implementation (in #2195).

In particular, I no longer agree with the statement "Separate indexers without a MultiIndex should be prohibited" from my original proposal. The problem is that the semantics of a MultiIndex are not quite the same as those of separate indexes, and I don't think all use cases are well solved by always using a MultiIndex. ~~For example, I don't think it's possible to do point-wise indexing along anything other than the first level of a MultiIndex.~~ (Note: this is not true; see https://github.com/pydata/xarray/issues/1603#issuecomment-442662561)

Instead, I think we should make the model transparent by retaining an xarray variable for the MultiIndex, and provide APIs for explicitly converting index types.

e.g., for the repr with a MultiIndex:

    Coordinates:
      * x        (x) MultiIndex[level_1, level_2]
      * level_1  (x) object 'a' 'a' 'b' 'b'
      * level_2  (x) int64 1 2 1 2

and without a MultiIndex:

    Coordinates:
      * level_1  (x) object 'a' 'a' 'b' 'b'
      * level_2  (x) int64 1 2 1 2
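To make the two representations concrete, here is a minimal sketch at the pandas level only. It does not show any xarray conversion API (that API is still a proposal here); it only illustrates that the same level data can live either in one `pandas.MultiIndex` or in one plain `pandas.Index` per level, and that converting between the two is lossless. The variable names `separate` and `rebuilt` are made up for illustration.

```python
import pandas as pd

level_1 = ["a", "a", "b", "b"]
level_2 = [1, 2, 1, 2]

# Combined representation: a single MultiIndex with two named levels.
mi = pd.MultiIndex.from_arrays([level_1, level_2], names=["level_1", "level_2"])

# Separate representation: one plain Index per level (illustrative dict).
separate = {name: mi.get_level_values(name) for name in mi.names}

# Converting back: rebuild the MultiIndex from the separate indexes.
rebuilt = pd.MultiIndex.from_arrays(list(separate.values()), names=list(separate))

# The type tells the two representations apart.
print(type(mi).__name__)
print(isinstance(separate["level_1"], pd.MultiIndex))
print(rebuilt.equals(mi))
```

The point of the sketch is that the data is identical in both forms; what differs is the index type, which is why an explicit conversion step (rather than an implicit one) keeps the model transparent.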

The main way in which this could get confusing is if you explicitly mutate the Dataset to remove some but not all of the variables corresponding to the MultiIndex (e.g., x but not level_1, or vice versa). We have a few potential options here:

1. Don't worry about it: if you mutate objects, you can potentially end up in slightly confusing internal states. If you care about whether level_1 uses a pandas.Index or pandas.MultiIndex, you can find out for sure by checking ds.indexes['level_1'].
2. Prohibit it in our data model: either (a) raise an error if you try to manually delete a single variable or (b) automatically delete all associated variables, too. Encourage using various explicit APIs that return new objects with a new index.
3. Use a different indicator than * for marking "indirect" indexes, so it's more obvious if some coordinates get removed, e.g.,

        Coordinates:
          * x        (x) MultiIndex[level_1, level_2]
          + level_1  (x) object 'a' 'a' 'b' 'b'
          + level_2  (x) int64 1 2 1 2

The different indicator might make sense regardless, but I am also partial to "Prohibit it in our data model." The main downside is that this adds a little more complexity to the logic for determining the indexes that result from an operation (namely, verifying that all MultiIndex levels still correspond to coordinates).
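As a rough idea of what that extra verification could look like, here is a pure-Python sketch. It is not xarray code: `missing_multiindex_coords` and `drop_coord` are hypothetical helpers, coordinates are modeled as a plain list of names, and `multi_indexes` is an assumed mapping from a MultiIndex coordinate to its level names. The `cascade` flag switches between variant (a), raising an error, and variant (b), deleting all associated variables.

```python
from typing import Dict, Iterable, List, Set


def missing_multiindex_coords(
    coord_names: Iterable[str],
    multi_indexes: Dict[str, List[str]],
) -> Dict[str, Set[str]]:
    """Return, per MultiIndex, the associated coordinates that are absent.

    Hypothetical helper: `multi_indexes` maps a MultiIndex coordinate
    (e.g. "x") to its level names (e.g. ["level_1", "level_2"]).
    """
    present = set(coord_names)
    problems: Dict[str, Set[str]] = {}
    for index_name, levels in multi_indexes.items():
        group = {index_name, *levels}
        missing = group - present
        # Only a partial removal is inconsistent; removing the whole group is fine.
        if missing and missing != group:
            problems[index_name] = missing
    return problems


def drop_coord(
    coord_names: List[str],
    multi_indexes: Dict[str, List[str]],
    name: str,
    cascade: bool = False,
) -> List[str]:
    """Drop `name`, either raising (a) or cascading (b) on partial removal."""
    remaining = [c for c in coord_names if c != name]
    problems = missing_multiindex_coords(remaining, multi_indexes)
    if not problems:
        return remaining
    if not cascade:
        # Variant (a): refuse to leave a MultiIndex partially present.
        raise ValueError(
            f"cannot drop {name!r} alone; it belongs to MultiIndex "
            f"{sorted(problems)}"
        )
    # Variant (b): drop every coordinate tied to the affected MultiIndex.
    doomed: Set[str] = set()
    for index_name in problems:
        doomed.update({index_name, *multi_indexes[index_name]})
    return [c for c in remaining if c not in doomed]


coords = ["x", "level_1", "level_2", "time"]
levels = {"x": ["level_1", "level_2"]}

print(drop_coord(coords, levels, "time"))
print(drop_coord(coords, levels, "x", cascade=True))
```

In a real implementation this check would presumably run wherever an operation produces a new set of indexes, which is the added complexity mentioned above.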

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  262642978