html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4824#issuecomment-841244252,https://api.github.com/repos/pydata/xarray/issues/4824,841244252,MDEyOklzc3VlQ29tbWVudDg0MTI0NDI1Mg==,10194086,2021-05-14T13:28:50Z,2021-05-14T13:28:50Z,MEMBER,"I add the example from https://github.com/pydata/xarray/pull/4753#discussion_r631399543 here. I am not 100% if that is the same problem or not:
```python
def combine_by_coords_problem(name, join=""override""):
ds0 = xr.Dataset(coords={""x1"": [1, 2, 3], name: [10, 20, 30]})
ds1 = xr.Dataset(coords={""x1"": [4, 5, 6], name: [40, 50, 60]})
return xr.combine_by_coords([ds0, ds1], join=join)
combine_by_coords_problem(""x0"") # concatenates 1, 2, 3, 4, 5, 6
combine_by_coords_problem(""x2"") # concatenates 10, 20, 30, 40, 50, 60
```
with `join=`
- `""exact""`: error
- `""left"", ""right"", ""inner"", ""override""`: ambiguous result
- `""outer""`: sensible result","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,788534915
https://github.com/pydata/xarray/issues/4824#issuecomment-840687993,https://api.github.com/repos/pydata/xarray/issues/4824,840687993,MDEyOklzc3VlQ29tbWVudDg0MDY4Nzk5Mw==,2448579,2021-05-13T16:47:56Z,2021-05-13T16:47:56Z,MEMBER,"> Does that mean we should just not to allow join='override' for combine_by_coords?
I think so.
I suspect the original problem can only be solved by allowing approximate alignment within a specified tolerance.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,788534915
https://github.com/pydata/xarray/issues/4824#issuecomment-840680285,https://api.github.com/repos/pydata/xarray/issues/4824,840680285,MDEyOklzc3VlQ29tbWVudDg0MDY4MDI4NQ==,35968931,2021-05-13T16:35:16Z,2021-05-13T16:35:16Z,MEMBER,"Thanks @dcherian that helps. It does seem silly to be like ""I'm going to use the coordinates to decide how everything should be concatenated, but I'm also happy to change some of those coordinate values whilst I'm combining"". Does that mean we should just not to allow `join='override'` for `combine_by_coords`? I don't think that solves @mathause 's original problem though.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,788534915
https://github.com/pydata/xarray/issues/4824#issuecomment-840675463,https://api.github.com/repos/pydata/xarray/issues/4824,840675463,MDEyOklzc3VlQ29tbWVudDg0MDY3NTQ2Mw==,2448579,2021-05-13T16:27:15Z,2021-05-13T16:27:38Z,MEMBER,"For `concat`, `join` applies to all indexes that are not being concatenated (e.g. concat over time but `override` floating point differences in latitude). This is easy to understand for `combine_nested`.
I don't think it makes sense at all for `combine_by_coords`. If you want to `override` coordinate labels values for some dimensions you should have to do it manually. It seems easy to land up in weird edge-cases when you combine that with autoguessing behaviour.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,788534915
https://github.com/pydata/xarray/issues/4824#issuecomment-840632567,https://api.github.com/repos/pydata/xarray/issues/4824,840632567,MDEyOklzc3VlQ29tbWVudDg0MDYzMjU2Nw==,35968931,2021-05-13T15:18:20Z,2021-05-13T15:18:20Z,MEMBER,I honestly don't really understand `override` well enough to reason about this at the moment. What would help me at least is to have a set of unit tests for this case: @mathause what exactly do we want it to do that it does not currently do?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,788534915
https://github.com/pydata/xarray/issues/4824#issuecomment-840169948,https://api.github.com/repos/pydata/xarray/issues/4824,840169948,MDEyOklzc3VlQ29tbWVudDg0MDE2OTk0OA==,5635139,2021-05-12T23:35:04Z,2021-05-12T23:35:04Z,MEMBER,Is there a reasonable way of preventing the ambiguity in `override`? Do we need to make the `concat_dim` explicit?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,788534915
https://github.com/pydata/xarray/issues/4824#issuecomment-811309154,https://api.github.com/repos/pydata/xarray/issues/4824,811309154,MDEyOklzc3VlQ29tbWVudDgxMTMwOTE1NA==,10194086,2021-03-31T18:24:19Z,2021-03-31T18:24:19Z,MEMBER,"But then it also overrides when it should concatenate...
```python
import numpy as np
import xarray as xr
data = np.arange(5).reshape(1, 5)
x = np.arange(5)
x_name = ""lat""
da0 = xr.DataArray(data, dims=(""t"", x_name), coords={""t"": [1], x_name: x}).to_dataset(name=""a"")
x = x + 5
da1 = xr.DataArray(data, dims=(""t"", x_name), coords={""t"": [2], x_name: x}).to_dataset(name=""a"")
ds = xr.combine_by_coords((da0, da1), join=""override"")
```
out
```python
Dimensions: (lat: 5, t: 2)
Coordinates:
* lat (lat) int64 0 1 2 3 4
* t (t) int64 1 2
Data variables:
a (t, lat) int64 0 1 2 3 4 0 1 2 3 4
```
Again if we set `x_name = ""y""` the following is returned:
```python
Dimensions: (t: 1, y: 10)
Coordinates:
* t (t) int64 1
* y (y) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
a (t, y) int64 0 1 2 3 4 0 1 2 3 4
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,788534915
https://github.com/pydata/xarray/issues/4824#issuecomment-810476560,https://api.github.com/repos/pydata/xarray/issues/4824,810476560,MDEyOklzc3VlQ29tbWVudDgxMDQ3NjU2MA==,35968931,2021-03-30T18:20:16Z,2021-03-30T18:20:16Z,MEMBER,"Thanks @dcherian.
So passing `join='override'` to `combine_by_coords` gives this
```python
Dimensions: (lat: 5, t: 2)
Coordinates:
* lat (lat) int64 0 1 2 3 4
* t (t) int64 1 2
Data variables:
a (t, lat) int64 0 1 2 3 4 0 1 2 3 4
```
which seems fine - that solves issue (1) doesn't it?
Your other example though @mathause, of allowing certain ragged arrays to pass through, presumably we still need a new check of some kind to disallow that?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,788534915
https://github.com/pydata/xarray/issues/4824#issuecomment-771684107,https://api.github.com/repos/pydata/xarray/issues/4824,771684107,MDEyOklzc3VlQ29tbWVudDc3MTY4NDEwNw==,2448579,2021-02-02T14:44:45Z,2021-02-02T14:45:04Z,MEMBER,"this is the result of `join=""outer""` which expands both datasets, and `compat=""no_conflicts""` which picks the not-NaN value. These are both default values.
You can avoid this by setting `join=""override""` which will skip the check.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,788534915
https://github.com/pydata/xarray/issues/4824#issuecomment-771583218,https://api.github.com/repos/pydata/xarray/issues/4824,771583218,MDEyOklzc3VlQ29tbWVudDc3MTU4MzIxOA==,35968931,2021-02-02T11:52:57Z,2021-02-02T12:02:14Z,MEMBER,"> we need to figure out if there is a ""bug"" in merge
I don't actually know - this behaviour of `merge` seems wrong to me but it might be allowed given the changes to compatibility checks since I wrote `combine_by_coords`. @dcherian do you know?
> Then we can say: ""if two coords have the same start they need to be equal"" (I think).
Yes exactly. Well, more specifically, ""if they have the same start they need to be equal in length otherwise its a ragged hypercube, and if they have the same start, equal in length, but different values in the middle then it's a valid hypercube but an inconsistent dataset"". It is currently assumed that the user passes a valid hypercube with consistent data, see [this comment](https://github.com/pydata/xarray/blob/5735e163bea43ec9bc3c2e640fbf25a1d4a9d0c0/xarray/core/combine.py#L85) in `_infer_concat_order_from_coords`:
```
# Assume that any two datasets whose coord along dim starts
# with the same value have the same coord values throughout.
```
Though I realise now that I don't think this assumption is made explicit in the docs anywhere, instead it just talks about coords being monotonic.
If people pass ""dodgy"" hypercubes then this could currently fail in multiple ways (including silently), but the reason we didn't just actually check that the coords were completely equal throughout was because then you have to load all the actual values from the files, which could incur a significant performance cost. Adding a check of just the last value of each coord would help considerably (it should solve #4077), but unless we check every value then there will always be a way to silently produce a nonsense result by feeding it inconsistent data. We might consider some kind of flag for whether or not these checks should be done, which defaults to on, and users can turn off if they trust their data but want more speed.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,788534915
https://github.com/pydata/xarray/issues/4824#issuecomment-771530648,https://api.github.com/repos/pydata/xarray/issues/4824,771530648,MDEyOklzc3VlQ29tbWVudDc3MTUzMDY0OA==,10194086,2021-02-02T10:19:21Z,2021-02-02T10:19:21Z,MEMBER,"Thanks for the thorough answer! Thus, we need to figure out if there is a ""bug"" in `merge` or if this has to be soved in `combine_by_coords`.
Disallowing ""ragged hypercubes"" certainly makes the problem much easier. Then we can say: ""if two coords have the same start they need to be equal"" (I think).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,788534915
https://github.com/pydata/xarray/issues/4824#issuecomment-770975217,https://api.github.com/repos/pydata/xarray/issues/4824,770975217,MDEyOklzc3VlQ29tbWVudDc3MDk3NTIxNw==,35968931,2021-02-01T16:18:26Z,2021-02-01T16:18:26Z,MEMBER,"tl;dr, I currently think there are two issues:
1) `xr.Merge` has just allowed something through when it should have thrown an error,
2) Passing a sort of ragged hypercube can slip past the checks in `combine_by_coords`, though it was never intended to handle this case. This limitation also isn't very clearly documented.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,788534915
https://github.com/pydata/xarray/issues/4824#issuecomment-770973948,https://api.github.com/repos/pydata/xarray/issues/4824,770973948,MDEyOklzc3VlQ29tbWVudDc3MDk3Mzk0OA==,35968931,2021-02-01T16:16:38Z,2021-02-01T16:16:38Z,MEMBER,"Thanks for these examples @mathause , these are useful.
> combine_by_coords does a merge and a concat (in the _combine_1d function). The order of these operations makes a difference to the result. See details.
Not sure if this is what you meant, but to be clear: `_combine_1d` is used by both `combine_by_coords` and `combine_nested` - whether it does a merge or a concat is dictated by the `concat_dim` argument passed. In both `combine_by_coords` and `combine_nested` a hypercube (or multiple hypercubes) of datasets is assembled, before the hypercubes are each collapsed into a single dataset by `_combine_nd`. Within `combine_by_coords` `_combine_1d` will only ever do concatenation, because `_infer_concat_order_from_coords` will only ever return a list of real dimensions as the `concat_dims` (as opposed to a `None` like `combine_nested` can accept from the user - a `None` means ""please merge along this axis of the hypercube I gave you""). `combine_by_coords` then finishes off by merging the possible multiple collapsed hypercubes (so following the same logic as the original 1D `auto_combine` that it replaced).
> merge leads to interlaced coords
> concat leads to - well - concatenated coords
As far I can see here then `concat` is behaving perfectly sensibly, but `merge` is behaving in a way that conflicts with its documentation, which says ""xarray.MergeError is raised if you attempt to merge two variables with the same name but different values"". In `ds0` and `ds1` `lat` has the same name but slightly different values - a MergeError should have been thrown, and so then `combine_*` would have thrown it too.
> One question on the scope of combine_by_coords ...
> aabbb
cccdd
This is a good question, but I'm 99% sure I didn't intend for either combine function to be able to handle this case. The problem with this case is that it's not order-invariant: you could concat `aabbb` together along x fine, and `cccdd` together fine, then both the results can be concatenated together along y (i.e. concat_dims=['x', 'y'], x then y). But if you tried y first then `a` and `c` have different sizes along the x direction, so you can't use concat on them (or you shouldn't be able to without introducing NaNs). It's possible that the checks are not sufficient to catch this case though.
Try comparing it to the behaviour of `combine_nested`: I think you'll find that it's only possible to arrange those datasets into the list-of-lists hypercube structure that's expected when you are doing x first then y, and not when doing y first then x.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,788534915
https://github.com/pydata/xarray/issues/4824#issuecomment-770848655,https://api.github.com/repos/pydata/xarray/issues/4824,770848655,MDEyOklzc3VlQ29tbWVudDc3MDg0ODY1NQ==,10194086,2021-02-01T13:15:18Z,2021-02-01T13:15:18Z,MEMBER,"`combine_by_coords` does a `merge` and a `concat` (in the `_combine_1d` [function](https://github.com/pydata/xarray/blob/5735e163bea43ec9bc3c2e640fbf25a1d4a9d0c0/xarray/core/combine.py#L240)). The order of these operations makes a difference to the result. See details.
merge leads to interlaced coords
```python
In [2]: xr.merge([da0, da1])
Out[2]:
Dimensions: (lat: 10, t: 2)
Coordinates:
* lat (lat) float64 0.0 1e-06 1.0 1.0 2.0 2.0 3.0 3.0 4.0 4.0
* t (t) int64 1 2
Data variables:
a (t, lat) float64 0.0 nan 1.0 nan 2.0 nan ... 2.0 nan 3.0 nan 4.0
```
concat leads to - well - concatenated coords
```python
xr.concat([da0, da1], dim=""lat"")
Out[5]:
Dimensions: (lat: 10, t: 2)
Coordinates:
* t (t) int64 1 2
* lat (lat) float64 0.0 1.0 2.0 3.0 4.0 1e-06 1.0 2.0 3.0 4.0
Data variables:
a (t, lat) float64 0.0 1.0 2.0 3.0 4.0 nan ... 0.0 1.0 2.0 3.0 4.0
```
Yes I think I'd be interested to fix this...
---
One question on the scope of `combine_by_coords` - consider the following example with 4 arrays (""a"" through ""d"") arranged as follows:
```
aabbb
cccdd
```
Or as `Dataset`:
```python
ds_a = xr.Dataset({""data"": ((""y"", ""x""), [[1, 2]]), ""x"": [0, 1], ""y"": [0]})
ds_b = xr.Dataset({""data"": ((""y"", ""x""), [[3, 4, 5]]), ""x"": [2, 3, 4], ""y"": [0]})
ds_c = xr.Dataset({""data"": ((""y"", ""x""), [[11, 12, 14]]), ""x"": [0, 1, 2], ""y"": [1]})
ds_d = xr.Dataset({""data"": ((""y"", ""x""), [[14, 15]]), ""x"": [3, 4], ""y"": [1]})
xr.combine_by_coords([ds_a, ds_b, ds_c, ds_d]).data
```
Should that work or not?
**Current behaviour**
Currently it yields the following:
```python
array([[ 1, 2, 3, 4, 5],
[11, 12, 14, 14, 15]])
Coordinates:
* x (x) int64 0 1 2 3 4
* y (y) int64 0 1
```
However, if `""x""` is renamed to `""z""` it fails (`""resulting coords not monotonic""`).
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,788534915
https://github.com/pydata/xarray/issues/4824#issuecomment-770469187,https://api.github.com/repos/pydata/xarray/issues/4824,770469187,MDEyOklzc3VlQ29tbWVudDc3MDQ2OTE4Nw==,35968931,2021-01-31T23:15:15Z,2021-01-31T23:15:15Z,MEMBER,"Thanks @mathause . I'm not sure exactly how it ends up with that erroneous result, but I think it should be caught by adding the same check that would fix #4077? i.e. when it realises that 4 is not < 1e-06 it would throw an error.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,788534915