html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/issues/4077#issuecomment-1112152290,https://api.github.com/repos/pydata/xarray/issues/4077,1112152290,IC_kwDOAMm_X85CShji,26384082,2022-04-28T12:37:50Z,2022-04-28T12:37:50Z,NONE,"In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity If this issue remains relevant, please comment here or remove the `stale` label; otherwise it will be marked as closed automatically ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,620514214 https://github.com/pydata/xarray/issues/4077#issuecomment-633651638,https://api.github.com/repos/pydata/xarray/issues/4077,633651638,MDEyOklzc3VlQ29tbWVudDYzMzY1MTYzOA==,22245117,2020-05-25T16:54:55Z,2020-05-25T17:49:03Z,CONTRIBUTOR,"Yup, happy to do it. Just one doubt. I think in cases where `indexes[i][-1] == indexes[i+1][0]`, the concatenation should be consistent with the `compat` argument used for `merge` (not sure if you guys agree on this). I don't know the backend though, so the easiest thing I can think about is to run `merge` to trigger the exact same checks: ```python xr.merge([datasets[i].isel(dim=-1), datasets[i+1].isel(dim=0)], compat=compat) ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,620514214 https://github.com/pydata/xarray/issues/4077#issuecomment-633602775,https://api.github.com/repos/pydata/xarray/issues/4077,633602775,MDEyOklzc3VlQ29tbWVudDYzMzYwMjc3NQ==,35968931,2020-05-25T14:38:52Z,2020-05-25T14:38:52Z,MEMBER,"> So indexes[i][-1] <= indexes[i+1][0] should work. @malmans2 are you interested in submitting a pull request to add this? 
(If not then that's fine!)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,620514214 https://github.com/pydata/xarray/issues/4077#issuecomment-633586248,https://api.github.com/repos/pydata/xarray/issues/4077,633586248,MDEyOklzc3VlQ29tbWVudDYzMzU4NjI0OA==,22245117,2020-05-25T13:59:18Z,2020-05-25T13:59:18Z,CONTRIBUTOR,"Nevermind, it looks like if the check goes into `_infer_concat_order_from_coords` it won't affect `combine_nested`. So `indexes[i][-1] <= indexes[i+1][0]` should work.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,620514214 https://github.com/pydata/xarray/issues/4077#issuecomment-633577882,https://api.github.com/repos/pydata/xarray/issues/4077,633577882,MDEyOklzc3VlQ29tbWVudDYzMzU3Nzg4Mg==,22245117,2020-05-25T13:39:37Z,2020-05-25T13:39:37Z,CONTRIBUTOR,"If `indexes[i] = [1, 5]` and `indexes[i+1] = [2, 3, 4]`, wouldn't `indexes[i][-1] <= indexes[i+1][0]` raise an error even if all indexes are different? What about something like this? I think it would cover all possibilities, but maybe it is too expensive? ```python if not indexes[0].append(indexes[1:]).is_unique: raise ValueError ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,620514214 https://github.com/pydata/xarray/issues/4077#issuecomment-630912785,https://api.github.com/repos/pydata/xarray/issues/4077,630912785,MDEyOklzc3VlQ29tbWVudDYzMDkxMjc4NQ==,1217238,2020-05-19T15:54:02Z,2020-05-19T15:54:02Z,MEMBER,"> That was actually deliberate, `xr.combine_by_coords` is only checking the first value of each coord is different, to avoid loading big coordinates into memory. 
(see [this line](https://github.com/pydata/xarray/blob/2542a63f6ebed1a464af7fc74b9f3bf302925803/xarray/core/combine.py#L89)) As the first y value is 0 in both cases it's just saying ""we have a match!"" and overwriting. We already have the coordinates loaded into memory at this point -- each element of `indexes` is a `pandas.Index`. Looking at the first values makes sense for determining the order, but doesn't guarantee that they are safe to concatenate. The contract of I think we are missing another safety check verifying `indexes[i][-1] <= indexes[i+1][0]` for all indexes in order, in a way that handles ties correctly. In my opinion, xarray's combine functions like `combine_by_coords` should never override values, unless an unsafe option was explicitly chosen.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,620514214 https://github.com/pydata/xarray/issues/4077#issuecomment-630842902,https://api.github.com/repos/pydata/xarray/issues/4077,630842902,MDEyOklzc3VlQ29tbWVudDYzMDg0MjkwMg==,2448579,2020-05-19T14:07:40Z,2020-05-19T14:07:40Z,MEMBER,"What is the expected outcome here? An error? The only way I can think of to combine these two datasets without losing data is to do `combine_nested([ds0, ds1], concat_dim=""new_dim"")`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,620514214 https://github.com/pydata/xarray/issues/4077#issuecomment-630779096,https://api.github.com/repos/pydata/xarray/issues/4077,630779096,MDEyOklzc3VlQ29tbWVudDYzMDc3OTA5Ng==,35968931,2020-05-19T12:14:41Z,2020-05-19T12:15:51Z,MEMBER,"Thanks for reporting this @malmans2! There are actually two issues here: The minor one is that it should never have been possible to specify `concat_dim` and `combine='by_coords'` to `open_mfdataset` simultaneously. You should have got an error already at that point. 
`xr.combine_by_coords` doesn't accept a `concat_dim` argument, so neither should `xr.open_mfdataset(..., combine='by_coords')`. The more complex issue is that you can get the same overwriting problem in `xr.combine_by_coords` alone... That was actually deliberate, `xr.combine_by_coords` is only checking the first value of each coord is different, to avoid loading big coordinates into memory. (see [this line]( https://github.com/pydata/xarray/blob/2542a63f6ebed1a464af7fc74b9f3bf302925803/xarray/core/combine.py#L89)) As the first y value is 0 in both cases it's just saying ""we have a match!"" and overwriting. @shoyer we discussed that PR (#2616) extensively, but I can't see an explicit record of discussing that particular line? But since then @dcherian has done work on the options which vary the strictness of checking - should `compat` also vary this behaviour? EDIT: (sorry for repeating what was said above, I wrote this reply last night and sent it today)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,620514214 https://github.com/pydata/xarray/issues/4077#issuecomment-630721808,https://api.github.com/repos/pydata/xarray/issues/4077,630721808,MDEyOklzc3VlQ29tbWVudDYzMDcyMTgwOA==,10194086,2020-05-19T10:06:10Z,2020-05-19T10:06:10Z,MEMBER,"The second part could probably be tested just below this `if`: https://github.com/pydata/xarray/blob/2542a63f6ebed1a464af7fc74b9f3bf302925803/xarray/core/combine.py#L751 using ```python if not indexes.is_unique: raise ValueError("""") ``` (or a warning) ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,620514214 https://github.com/pydata/xarray/issues/4077#issuecomment-630713011,https://api.github.com/repos/pydata/xarray/issues/4077,630713011,MDEyOklzc3VlQ29tbWVudDYzMDcxMzAxMQ==,10194086,2020-05-19T09:47:33Z,2020-05-19T09:47:33Z,MEMBER,"Raising an error when the 
start time is equal is certainly a good idea. What I am less sure about is what to do when the end is equal to the start - maybe a warning? The second case would be the following: ```python print(ds0) print(ds1) ``` ``` Dimensions: (x: 2) Coordinates: * x (x) int64 0 1 Data variables: foo ... Dimensions: (x: 2) Coordinates: * x (x) int64 1 2 Data variables: foo ... ``` and `auto_combine` would lead to: ```python xr.combine_by_coords([ds0, ds1]) ``` ``` Dimensions: (x: 2) Coordinates: * x (x) int64 0 1 1 2 Data variables: foo ... ``` For the first case you can probably check if all elements of `order` are unique: https://github.com/pydata/xarray/blob/2542a63f6ebed1a464af7fc74b9f3bf302925803/xarray/core/combine.py#L99 ps: Overlapping indices are not a problem - it is checked that the result is monotonic: https://github.com/pydata/xarray/blob/2542a63f6ebed1a464af7fc74b9f3bf302925803/xarray/core/combine.py#L748 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,620514214 https://github.com/pydata/xarray/issues/4077#issuecomment-630692045,https://api.github.com/repos/pydata/xarray/issues/4077,630692045,MDEyOklzc3VlQ29tbWVudDYzMDY5MjA0NQ==,22245117,2020-05-19T09:08:59Z,2020-05-19T09:08:59Z,CONTRIBUTOR,"Got it, Thanks! Let me know if it is worth adding some checks. I'd be happy to work on it.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,620514214 https://github.com/pydata/xarray/issues/4077#issuecomment-630644504,https://api.github.com/repos/pydata/xarray/issues/4077,630644504,MDEyOklzc3VlQ29tbWVudDYzMDY0NDUwNA==,10194086,2020-05-19T07:40:11Z,2020-05-19T07:40:11Z,MEMBER,"Yes, `xr.combine_by_coords` only ensures that the coordinates are monotonically increasing. 
It does not check (a) that they don't have the same start (your case) or (b) whether the end of `ds0` is equal to the start of `ds1` (which may also be undesirable). The magic happens here: https://github.com/pydata/xarray/blob/2542a63f6ebed1a464af7fc74b9f3bf302925803/xarray/core/combine.py#L49 In your case it just uses the rightmost array (compare `xr.combine_by_coords([ds0, ds1])` and `xr.combine_by_coords([ds1, ds0])`). (Note that `concat_dim=""y""` is ignored when using `combine_by_coords`.) ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,620514214