html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/issues/5647#issuecomment-1239162673,https://api.github.com/repos/pydata/xarray/issues/5647,1239162673,IC_kwDOAMm_X85J3B8x,4160723,2022-09-07T09:47:13Z,2022-09-07T09:47:13Z,MEMBER,"I think we can close this issue, which has mostly been addressed in #5692. I opened #7002 for the potential problem of index alignment vs. coordinate ordering.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,955936490 https://github.com/pydata/xarray/issues/5647#issuecomment-949618104,https://api.github.com/repos/pydata/xarray/issues/5647,949618104,IC_kwDOAMm_X844mgW4,4160723,2021-10-22T13:11:01Z,2021-10-22T13:11:01Z,MEMBER,"@shoyer I'm now reviewing the merging logic. I works mostly fine, with some minor concerns: - `MergeElement` (variable, index) tuples are not grouped per matching (unique) index, so index equality will be checked more than once for multi-coordinate indexes. Not a big deal I think, unless for large multi-indexes with many levels. A proper fix for this would probably require some heavier refactoring, though. Is it ok to leave this as is for now? - I'm not sure how we should handle the case where `foo` is the name of both an un-indexed dimension and a coordinate with other dimension(s). Let's take this example: ```python >>> from xarray.tests.test_dataset import create_test_multiindex >>> data = create_test_multiindex() >>> data Dimensions: (x: 4) Coordinates: * x (x) MultiIndex - level_1 (x) object 'a' 'a' 'b' 'b' - level_2 (x) int64 1 2 1 2 Data variables: *empty* ``` ```python >>> other = xr.Dataset({""z"": (""level_1"", [0, 1])}) >>> merged = data.merge(other) ValueError: conflicting level / dimension names. level_1 already exists as a level name. ``` Do we raise a `ValueError` here because otherwise we don't know what to return from `merged[""level_1""]` (i.e., the multi-index virtual coordinate or a default dimension coordinate)? Can we allow this case now that multi-index level coordinates are not virtual anymore? Note: the following example does not raise any error in xarray: ```python >>> data = xr.Dataset(coords={""x"": [0, 1, 2, 3], ""level_1"": (""x"", [""a"", ""a"", ""b"", ""b""])}) >>> other = xr.Dataset({""z"": (""level_1"", [0, 1])}) >>> merged = data.merge(other) >>> merged Dimensions: (x: 4, level_1: 2) Coordinates: * x (x) int64 0 1 2 3 level_1 (x) If we like, we could probably even add static type information on reindex_like, to require the same index type Yes, I did that for `Index.join()` and `Index.reindex_like()` in #5692. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,955936490 https://github.com/pydata/xarray/issues/5647#issuecomment-945592260,https://api.github.com/repos/pydata/xarray/issues/5647,945592260,IC_kwDOAMm_X844XJfE,4160723,2021-10-18T09:43:18Z,2021-10-18T10:47:26Z,MEMBER,"Ok, I'm now hitting another obstacle while working on reindex. So one general approach for both alignment and re-indexing that is ""pretty straightforward"" to implement with the new Xarray index data model is: (1) find matching indexes based on their corresponding coordinate/dimension names and index type, and (2) call `idx.join(other)` and/or `idx.reindex_like(other)` where `other` is another index object of the same type than `idx`. This is what I've done so far in #5692. Relaxing any of the constraints in (1) would be much more complicated to implement. We would need to do some sort of mapping from dimension labels to all involved (multi-/meta-)indexes, then check for conflicts in dimension indexers returned from multiple indexes, possibly handle/remove multi-index coordinates (or convert back to non-indexed coordinates), etc. One problem with `Dataset.reindex` and `DataArray.reindex` is that we can pass any `{dim: labels}` as indexers. How should we adapt the API for flexible indexes? How to ensure backwards compatibility? The current possible cases are: 1. Both `dim` and `labels` correspond (or may be cast) to single pandas indexes: there's no issue in this case with the constraints stated in (1) above. 2. `dim` has a multi-index, `labels` is a `pd.MultiIndex` and level names exactly match: no real issue either in this case, but shouldn't we discourage this implicit behavior in the mid/long term and instead encourage using `reindex_like` for such more advanced use case? 3. `dim` has a multi-index and `labels` is (cast to) a single pandas index (or the other way around): this is currently possible in Xarray but it seems silly? After re-indexing, all data along `dim` is filled with `fill_value`... Would it be fine to instead raise an error *now*? Would it really break any user case? 4. `dim` has a multi-index, `labels` is a `pd.MultiIndex` and multi-index level names don't match: same problem than for case 3. Cases 3 and 4 are a big obstacle for me right now. I really don't know how we can still support those special cases without deeply re-thinking the problem. If they could be considered as a bug, then the new implementation would already raise an nice error message :-).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,955936490 https://github.com/pydata/xarray/issues/5647#issuecomment-944273383,https://api.github.com/repos/pydata/xarray/issues/5647,944273383,IC_kwDOAMm_X844SHfn,4160723,2021-10-15T12:53:45Z,2021-10-15T12:57:38Z,MEMBER,"I've now re-implemented most of the alignment logic in #5692, using a dedicated class so that all intermediate steps are implemented in a clearer and more maintainable way. One remaining task is to get the re-indexing logic right. Rather than relying on an `Index.dim_indexers` property, I feel that it'd be better to have two methods: ```python DimIntIndexers = Dict[Hashable, Any] class Index: def reindex(self, dim_labels: Mapping[Hashable, Any]) -> DimIntIndexers: ... def reindex_like(self, other: ""Index"") -> DimIntIndexers: ... ``` For alignment, I think we could directly call `idx.reindex_like(other)` for each index in each object to align where `other` corresponds to the aligned index that matches `idx`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,955936490 https://github.com/pydata/xarray/issues/5647#issuecomment-937865982,https://api.github.com/repos/pydata/xarray/issues/5647,937865982,IC_kwDOAMm_X8435rL-,4160723,2021-10-07T14:48:02Z,2021-10-07T14:48:02Z,MEMBER,"@shoyer I'm looking more deeply into this. I think it will be impossible to avoid a heavy refactoring in `core/alignment.py` if we want to support ""meta-indexes"" and handle non-dimension indexed coordinates, so I'd like to check with you about how best we can tackle this problem. I'm thinking about the following approach: 1. Instead of looking at the dimensions of each object to align, look at their indexes. - matching indexes must all correspond to the same set of coordinate names *and* dimension names (should we also check the index type?) - if coordinate names vs. indexes match only partially, then raise an error 2. For each of the matched indexes, check for index equality and perform join if that's not the case (or just pick the matching index that has been explicitly passed to `align` via its `indexes` argument) - For convenience, we could expose an `Index.join(self, other, how)` method instead of having two separate `Index.union()` and `Index.intersection()` methods. `Index.join()` might also return index coordinate variables that we can assign to the aligned objects - if the indexes do not provide an implementation for `.equals`, check for coordinate equality - if they do not provide an implementation for `.join`, raise an error or ignore it? Probably better to ignore it, i.e., do not create a joined index in the aligned objects and align the coordinates just as regular variables (this might be taken care by another index)... 3. Add an `Index.dim_indexers(self) -> Dict[Hashable, Any]` property that returns label indexers for each dimension involved in the index, and which will be used to conform the objects to the joined indexes. - this is probably where we could [make sure to not cast str coords to object for pandas indexes](https://github.com/pydata/xarray/blob/5499949b5937277dcd599a182201d0e2fc5e818e/xarray/core/alignment.py#L327) 4. Merge `dim_indexers` returned by all joined indexes and raise if there's any conflict (i.e., two distinct indexes should not return indexers along a common dimension). 5. Check the size of unlabelled dimensions (also against the sizes found in the merged `dim_indexers` for matching dimensions) 6. Reindex the objects Does that sounds right to you? I'd really appreciate any guidance on this before going further as I'm worried about missing something important or another more straightforward approach. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,955936490 https://github.com/pydata/xarray/issues/5647#issuecomment-889483293,https://api.github.com/repos/pydata/xarray/issues/5647,889483293,IC_kwDOAMm_X841BHAd,4160723,2021-07-29T21:49:02Z,2021-07-29T21:49:02Z,MEMBER,"> We should raise an error in this cases (and/or suggest setting a MultiIndex). It should also be OK if not every index implements alignment, in which case they should raise an error if coordinates do not match exactly. Agreed. > I don't think we should try to support alignment with multi-dimensional (non-orthogonal) inside align(). Instead we can just require the coordinates corresponding to such indexes to match exactly (or within a tolerance). I could see some examples (e.g., union or intersection of staggered grids) where it could still be useful to have a (meta-)index that implements alignment, though. Actually, while working on #5636 I was thinking more about how to perform alignment based on `xarray.Index` objects given that the concept of an `xarray.Index` should be clearly distinct from the concept of a `Variable`. We could maybe add an `xarray.Index.sizes` property similarly to `Variable.sizes`. Alternatively, we could allow `xarray.Index.union()` and `xarray.Index.intersection()` returning both the joined index and the new `Variable` objects created from the latter index. I'd lean towards the 2nd option, which is consistent with the class method constructors added in #5636. Not sure the 1st option is a good idea and it doesn't solve all the issues mentioned in my comment above. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,955936490