Comment by user 35968931 (MEMBER) · 2022-10-20T16:40:56Z
https://github.com/pydata/xarray/issues/7189#issuecomment-1285855378

Thanks so much for the MVCE, sleuthing, and clear issue, @mjwillson!

> As far as I can tell this happens because `indexes.is_monotonic_increasing` or `indexes.is_monotonic_decreasing` are not checking for strict monotonicity and allow consecutive values to be the same.

That sounds likely to me.
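For reference, the stricter check could just pair the monotonicity test with a uniqueness test. A minimal sketch, assuming the relevant indexes are pandas `Index` objects (the helper name is hypothetical):

```python
import pandas as pd

def _is_strictly_monotonic(index: pd.Index) -> bool:
    # Monotonic in either direction AND no repeated values, so
    # overlapping or duplicated coordinates no longer pass the check.
    monotonic = index.is_monotonic_increasing or index.is_monotonic_decreasing
    return monotonic and index.is_unique
```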

> I assume it wasn't intentional to allow overlaps like this.

Definitely not intentional. I'm not sure this case even occurred to me when I originally wrote `combine_by_coords`.

> If so, do you think anyone is depending on this (I'd hope not...)

I hope not too, because it's sort of an undefined operation. The purpose of `combine_by_coords` is to order datasets when their coordinates give an unambiguous ordering of all data points. The case you've highlighted isn't really an unambiguous ordering of the data points.

We do also state clearly in the docstring "Will attempt to order the datasets such that the values in their dimension coordinates are monotonic along all dimensions. If it cannot determine the order in which to concatenate the datasets, it will raise a ValueError." This is just making that stricter to say "monotonic and unique".
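For concreteness, the overlapping case under discussion looks something like this (a hypothetical minimal example in the spirit of the MVCE; the combined `x` here would currently come out monotonic but non-unique):

```python
import numpy as np
import xarray

# Two arrays whose 'x' coordinates overlap at x=1: each index is
# monotonic on its own, but there is no unambiguous ordering of all
# the data points taken together.
a = xarray.DataArray(dims=('x',), data=np.ones((2,)), coords={'x': [0, 1]})
b = xarray.DataArray(dims=('x',), data=np.ones((2,)), coords={'x': [1, 2]})

# Currently this concatenates to x = [0, 1, 1, 2]; with the stricter
# "monotonic and unique" check it would raise a ValueError instead.
xarray.combine_by_coords([a, b])
```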

> and would you take a PR to fix it to produce a `ValueError` in this case?

A PR would be most welcome :smile:

We should also consider the behaviour in the case of repeated elements in the coordinates, e.g. this currently happens:

```python
import numpy as np
import xarray

a = xarray.DataArray(dims=('x',), data=np.ones((2,)), coords={'x': [0, 0]})
b = xarray.DataArray(dims=('x',), data=np.ones((2,)), coords={'x': [1, 1]})
xarray.combine_by_coords([a, b])
```
```
Out[11]:
<xarray.DataArray (x: 4)>
array([1., 1., 1., 1.])
Coordinates:
  * x        (x) int64 0 0 1 1
```

I think it's more likely that someone is relying on that behaviour, though... :confused:

> If this behaviour is intentional or relied upon, could we have an option to do a strict check instead?

Alternatively, we might first change the code to raise a warning when the resulting coordinate is monotonic but non-unique, so that people are prepared when we change the actual behaviour later.
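A sketch of what that transitional warning could look like (the helper and its wording are hypothetical):

```python
import warnings
import pandas as pd

def _warn_on_duplicates(index: pd.Index, dim: str) -> None:
    # Warn, rather than raise, when the combined coordinate is
    # monotonic but contains duplicate values.
    monotonic = index.is_monotonic_increasing or index.is_monotonic_decreasing
    if monotonic and not index.is_unique:
        warnings.warn(
            f"Resulting object has duplicate values in coordinate {dim!r}; "
            "in a future version of xarray this will raise a ValueError.",
            FutureWarning,
        )
```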

> Also for performance reasons I'd propose to do some extra upfront checks to catch index overlap (e.g. by checking for index overlap in `_infer_concat_order_from_coords`), rather than doing a potentially large concat and only detecting duplicates afterwards.

That sounds sensible.
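For illustration, a pairwise intersection test could run before any concatenation happens. A rough sketch (the helper is hypothetical, and a real implementation would presumably check per dimension):

```python
import pandas as pd

def _indexes_overlap(indexes: list[pd.Index]) -> bool:
    # Cheap upfront check: if any two per-dataset indexes share a value,
    # the combined coordinate cannot be strictly monotonic, so we can
    # raise before performing a potentially large concat.
    for i, first in enumerate(indexes):
        for second in indexes[i + 1:]:
            if not first.intersection(second).empty:
                return True
    return False
```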

As an aside, I wonder if we could create a hypothesis test for `combine_by_coords`, where we assert that any dataset that is split and then recombined should remain unchanged... xref https://github.com/pydata/xarray/pull/6908 and https://github.com/pydata/xarray/issues/1846
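A rough sketch of that round-trip property, using a fixed toy dataset rather than the generation strategies from the xref'd PR:

```python
import numpy as np
import xarray as xr
from hypothesis import given
from hypothesis import strategies as st

@given(split=st.integers(min_value=1, max_value=9))
def test_combine_by_coords_roundtrip(split):
    # Splitting a dataset along 'x' and recombining it should be a no-op.
    ds = xr.Dataset({"a": ("x", np.arange(10.0))}, coords={"x": np.arange(10)})
    pieces = [ds.isel(x=slice(None, split)), ds.isel(x=slice(split, None))]
    xr.testing.assert_identical(ds, xr.combine_by_coords(pieces))
```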
