issue_comments: 937865982

html_url: https://github.com/pydata/xarray/issues/5647#issuecomment-937865982
issue_url: https://api.github.com/repos/pydata/xarray/issues/5647
id: 937865982
node_id: IC_kwDOAMm_X8435rL-
user: 4160723
created_at: 2021-10-07T14:48:02Z
updated_at: 2021-10-07T14:48:02Z
author_association: MEMBER
body:

@shoyer I'm looking more deeply into this. I think it will be impossible to avoid a heavy refactoring in core/alignment.py if we want to support "meta-indexes" and handle non-dimension indexed coordinates, so I'd like to check with you about how best we can tackle this problem.

I'm thinking about the following approach:

  1. Instead of looking at the dimensions of each object to align, look at their indexes.

    • matching indexes must all correspond to the same set of coordinate names and dimension names (should we also check the index type?)
    • if coordinate names vs. indexes match only partially, then raise an error
  2. For each of the matched indexes, check for index equality and perform a join if they are not equal (or just pick the matching index that has been explicitly passed to align via its indexes argument)

    • For convenience, we could expose an Index.join(self, other, how) method instead of having two separate Index.union() and Index.intersection() methods. Index.join() might also return index coordinate variables that we can assign to the aligned objects
    • if the indexes do not provide an implementation for .equals, check for coordinate equality
    • if they do not provide an implementation for .join, raise an error or ignore it? Probably better to ignore it, i.e., do not create a joined index in the aligned objects and align the coordinates just as regular variables (this might be taken care of by another index)...
  3. Add an Index.dim_indexers(self) -> Dict[Hashable, Any] property that returns label indexers for each dimension involved in the index, and which will be used to conform the objects to the joined indexes.

    • this is probably where we could make sure to not cast str coords to object for pandas indexes
  4. Merge dim_indexers returned by all joined indexes and raise if there's any conflict (i.e., two distinct indexes should not return indexers along a common dimension).

  5. Check the size of unlabelled dimensions (also against the sizes found in the merged dim_indexers for matching dimensions)

  6. Reindex the objects
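
To make steps 1 to 4 more concrete, here is a rough sketch in plain Python. It is only an illustration under simplifying assumptions: `ToyIndex` and `joined_dim_indexers` are hypothetical names (not existing xarray classes or functions), each toy index covers a single dimension, and labels are plain lists. Steps 5 and 6 (size checks and reindexing) are left out.

```python
from typing import Any, Dict, Hashable, List, Sequence


class ToyIndex:
    """Hypothetical stand-in for an index; not xarray's actual Index class."""

    def __init__(self, dim: Hashable, coord_names: Sequence[Hashable], labels: Sequence[Any]):
        self.dim = dim
        self.coord_names = tuple(coord_names)
        self.labels = list(labels)

    @property
    def key(self):
        # Step 1: indexes match when they share coordinate names and dimension names.
        return (self.coord_names, (self.dim,))

    def equals(self, other: "ToyIndex") -> bool:
        return self.labels == other.labels

    def join(self, other: "ToyIndex", how: str) -> "ToyIndex":
        # Step 2: a single join() instead of separate union()/intersection().
        if how == "inner":
            labels = [x for x in self.labels if x in other.labels]
        elif how == "outer":
            labels = sorted(set(self.labels) | set(other.labels))
        else:
            raise ValueError(f"unsupported join: {how!r}")
        return ToyIndex(self.dim, self.coord_names, labels)

    @property
    def dim_indexers(self) -> Dict[Hashable, Any]:
        # Step 3: label indexers for each dimension involved in the index.
        return {self.dim: self.labels}


def joined_dim_indexers(objects: List[List[ToyIndex]], how: str = "inner") -> Dict[Hashable, Any]:
    # Step 1: group indexes from all objects by (coordinate names, dimension names).
    matched: Dict[Any, List[ToyIndex]] = {}
    for indexes in objects:
        for idx in indexes:
            matched.setdefault(idx.key, []).append(idx)

    # Distinct keys that share a coordinate name only match partially: raise.
    keys = list(matched)
    for i, (names_a, _) in enumerate(keys):
        for names_b, _ in keys[i + 1:]:
            if set(names_a) & set(names_b):
                raise ValueError("indexes match only partially")

    merged: Dict[Hashable, Any] = {}
    for group in matched.values():
        # Step 2: join the matched indexes unless they are already equal.
        joined = group[0]
        for other in group[1:]:
            if not joined.equals(other):
                joined = joined.join(other, how)
        # Step 4: merge the indexers and raise on a conflict along a dimension.
        for dim, indexer in joined.dim_indexers.items():
            if dim in merged:
                raise ValueError(f"conflicting indexers along dimension {dim!r}")
            merged[dim] = indexer
    return merged


a = [ToyIndex("x", ["x"], [1, 2, 3])]
b = [ToyIndex("x", ["x"], [2, 3, 4])]
print(joined_dim_indexers([a, b], how="inner"))  # {'x': [2, 3]}
print(joined_dim_indexers([a, b], how="outer"))  # {'x': [1, 2, 3, 4]}
```

In this sketch, two distinct indexes that return indexers along the same dimension (for example one built on coordinate `x` and another built on coordinate `lat`, both along dimension `x`) trigger the conflict error of step 4.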

Does that sound right to you? I'd really appreciate any guidance on this before going further, as I'm worried about missing something important or overlooking a more straightforward approach.

issue: 955936490