Comment by user 35968931 (MEMBER) · 2022-10-20T16:40:56Z
https://github.com/pydata/xarray/issues/7189#issuecomment-1285855378

Thanks so much for the MVCE, sleuthing, and clear issue, @mjwillson!

> As far as I can tell this happens because `indexes.is_monotonic_increasing` or `indexes.is_monotonic_decreasing` are not checking for strict monotonicity and allow consecutive values to be the same.

That sounds likely to me.
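For reference, the stricter check could just pair the monotonicity test with a uniqueness test. A minimal sketch, assuming the relevant indexes are pandas `Index` objects (the helper name is hypothetical):

```python
import pandas as pd

def _is_strictly_monotonic(index: pd.Index) -> bool:
    # Monotonic in either direction AND no repeated values, so
    # overlapping or duplicated coordinates no longer pass the check.
    monotonic = index.is_monotonic_increasing or index.is_monotonic_decreasing
    return monotonic and index.is_unique
```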

> I assume it wasn't intentional to allow overlaps like this.

Definitely not intentional. I'm not sure this case even occurred to me when I originally wrote `combine_by_coords`.

> If so, do you think anyone is depending on this (I'd hope not...)

I hope not too, because it's sort of an undefined operation. The purpose of `combine_by_coords` is to order datasets when their coordinates give an unambiguous ordering of all data points. The case you've highlighted isn't really an unambiguous ordering of the data points.

We do also state clearly in the docstring "Will attempt to order the datasets such that the values in their dimension coordinates are monotonic along all dimensions. If it cannot determine the order in which to concatenate the datasets, it will raise a ValueError." This is just making that stricter to say "monotonic and unique".
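For concreteness, the overlapping case under discussion looks something like this (a hypothetical minimal example in the spirit of the MVCE; the combined `x` here would currently come out monotonic but non-unique):

```python
import numpy as np
import xarray

# Two arrays whose 'x' coordinates overlap at x=1: each index is
# monotonic on its own, but there is no unambiguous ordering of all
# the data points taken together.
a = xarray.DataArray(dims=('x',), data=np.ones((2,)), coords={'x': [0, 1]})
b = xarray.DataArray(dims=('x',), data=np.ones((2,)), coords={'x': [1, 2]})

# Currently this concatenates to x = [0, 1, 1, 2]; with the stricter
# "monotonic and unique" check it would raise a ValueError instead.
xarray.combine_by_coords([a, b])
```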

> and would you take a PR to fix it to produce a `ValueError` in this case?

A PR would be most welcome :smile:

We should also consider the behaviour in the case of repeated elements in the coordinates, e.g. this currently happens:

```python
import numpy as np
import xarray

a = xarray.DataArray(dims=('x',), data=np.ones((2,)), coords={'x': [0, 0]})
b = xarray.DataArray(dims=('x',), data=np.ones((2,)), coords={'x': [1, 1]})
xarray.combine_by_coords([a, b])
```
```
Out[11]:
<xarray.DataArray (x: 4)>
array([1., 1., 1., 1.])
Coordinates:
  * x        (x) int64 0 0 1 1
```

I think it's more likely that someone is relying on that behaviour, though... :confused:

> If this behaviour is intentional or relied upon, could we have an option to do a strict check instead?

Alternatively, we might first change the code to raise a warning when the resulting coordinate is monotonic but non-unique, so that people are prepared when we change the actual behaviour later.
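A sketch of what that transitional warning could look like (the helper and its wording are hypothetical):

```python
import warnings
import pandas as pd

def _warn_on_duplicates(index: pd.Index, dim: str) -> None:
    # Warn, rather than raise, when the combined coordinate is
    # monotonic but contains duplicate values.
    monotonic = index.is_monotonic_increasing or index.is_monotonic_decreasing
    if monotonic and not index.is_unique:
        warnings.warn(
            f"Resulting object has duplicate values in coordinate {dim!r}; "
            "in a future version of xarray this will raise a ValueError.",
            FutureWarning,
        )
```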

> Also for performance reasons I'd propose to do some extra upfront checks to catch index overlap (e.g. by checking for index overlap in `_infer_concat_order_from_coords`), rather than doing a potentially large concat and only detecting duplicates afterwards.

That sounds sensible.
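For illustration, a pairwise intersection test could run before any concatenation happens. A rough sketch (the helper is hypothetical, and a real implementation would presumably check per dimension):

```python
import pandas as pd

def _indexes_overlap(indexes: list[pd.Index]) -> bool:
    # Cheap upfront check: if any two per-dataset indexes share a value,
    # the combined coordinate cannot be strictly monotonic, so we can
    # raise before performing a potentially large concat.
    for i, first in enumerate(indexes):
        for second in indexes[i + 1:]:
            if not first.intersection(second).empty:
                return True
    return False
```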

As an aside, I wonder if we could create a hypothesis test for `combine_by_coords`, where we assert that any dataset that is split and then recombined should remain unchanged... xref https://github.com/pydata/xarray/pull/6908 and https://github.com/pydata/xarray/issues/1846
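A rough sketch of that round-trip property, using a fixed toy dataset rather than the generation strategies from the xref'd PR:

```python
import numpy as np
import xarray as xr
from hypothesis import given
from hypothesis import strategies as st

@given(split=st.integers(min_value=1, max_value=9))
def test_combine_by_coords_roundtrip(split):
    # Splitting a dataset along 'x' and recombining it should be a no-op.
    ds = xr.Dataset({"a": ("x", np.arange(10.0))}, coords={"x": np.arange(10)})
    pieces = [ds.isel(x=slice(None, split)), ds.isel(x=slice(split, None))]
    xr.testing.assert_identical(ds, xr.combine_by_coords(pieces))
```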
