issue_comments


15 comments where issue = 788534915 ("combine_by_coords can succed when it shouldn't"), sorted by updated_at descending

mathause (MEMBER) · 2021-05-14T13:28:50Z · https://github.com/pydata/xarray/issues/4824#issuecomment-841244252

I'm adding the example from https://github.com/pydata/xarray/pull/4753#discussion_r631399543 here. I am not 100% sure whether it is the same problem or not:

```python
def combine_by_coords_problem(name, join="override"):

    ds0 = xr.Dataset(coords={"x1": [1, 2, 3], name: [10, 20, 30]})
    ds1 = xr.Dataset(coords={"x1": [4, 5, 6], name: [40, 50, 60]})

    return xr.combine_by_coords([ds0, ds1], join=join)


combine_by_coords_problem("x0")  # concatenates 1, 2, 3, 4, 5, 6
combine_by_coords_problem("x2")  # concatenates 10, 20, 30, 40, 50, 60
```

with join=
  • "exact": error
  • "left", "right", "inner", "override": ambiguous result
  • "outer": sensible result
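
A minimal sketch of how those join values play out with combine_by_coords_problem from above; the printed error text is indicative only:

```python
import xarray as xr

# Assumes combine_by_coords_problem from the snippet above is in scope.

# join="exact" refuses to align the coordinate that the two datasets
# disagree on, so it raises instead of guessing.
try:
    combine_by_coords_problem("x0", join="exact")
except ValueError as err:
    print(err)

# join="outer" expands along both candidate dimensions instead of silently
# picking one - the "sensible result" above.
print(combine_by_coords_problem("x0", join="outer"))
```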

dcherian (MEMBER) · 2021-05-13T16:47:56Z · https://github.com/pydata/xarray/issues/4824#issuecomment-840687993

> Does that mean we should just not allow join='override' for combine_by_coords?

I think so.

I suspect the original problem can only be solved by allowing approximate alignment within a specified tolerance.
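
For illustration, a rough sketch of what such approximate alignment could look like if done by hand today; combine_with_tolerance is a hypothetical helper, not an existing xarray API:

```python
import numpy as np
import xarray as xr

def combine_with_tolerance(datasets, dim, atol=1e-5):
    # Hypothetical helper: snap a non-concatenated coordinate (e.g. "lat")
    # onto the first dataset's values when it differs only by floating-point
    # noise, then combine as usual.
    ref = datasets[0][dim].values
    snapped = []
    for ds in datasets:
        if np.allclose(ds[dim].values, ref, atol=atol):
            ds = ds.assign_coords({dim: ref})
        snapped.append(ds)
    return xr.combine_by_coords(snapped)
```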

TomNicholas (MEMBER) · 2021-05-13T16:35:16Z · https://github.com/pydata/xarray/issues/4824#issuecomment-840680285

Thanks @dcherian, that helps. It does seem silly to be like "I'm going to use the coordinates to decide how everything should be concatenated, but I'm also happy to change some of those coordinate values whilst I'm combining". Does that mean we should just not allow join='override' for combine_by_coords? I don't think that solves @mathause's original problem though.

dcherian (MEMBER) · 2021-05-13T16:27:15Z · https://github.com/pydata/xarray/issues/4824#issuecomment-840675463

For concat, join applies to all indexes that are not being concatenated (e.g. concat over time but override floating point differences in latitude). This is easy to understand for combine_nested.

I don't think it makes sense at all for combine_by_coords. If you want to override coordinate label values for some dimensions you should have to do it manually. It seems easy to end up in weird edge cases when you combine that with auto-guessing behaviour.
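
A small sketch (with made-up data) of the concat semantics described above:

```python
import numpy as np
import xarray as xr

lat = np.array([10.0, 20.0, 30.0])
ds0 = xr.Dataset({"a": (("time", "lat"), [[1.0, 2.0, 3.0]])},
                 coords={"time": [0], "lat": lat})
ds1 = xr.Dataset({"a": (("time", "lat"), [[4.0, 5.0, 6.0]])},
                 coords={"time": [1], "lat": lat + 1e-6})  # floating-point noise

# join applies to the index that is NOT being concatenated ("lat"):
# join="override" copies ds0's lat onto ds1 instead of aligning them.
combined = xr.concat([ds0, ds1], dim="time", join="override")
print(combined.lat.values)  # [10. 20. 30.]
```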

TomNicholas (MEMBER) · 2021-05-13T15:18:20Z · https://github.com/pydata/xarray/issues/4824#issuecomment-840632567

I honestly don't really understand override well enough to reason about this at the moment. What would help me, at least, is a set of unit tests for this case: @mathause, what exactly do we want it to do that it does not currently do?
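
For concreteness, a hypothetical pytest sketch of one such test; the asserted behaviour is an assumption about what we might want, not what currently happens:

```python
import pytest
import xarray as xr

def test_combine_by_coords_rejects_ambiguous_override():
    # Same setup as combine_by_coords_problem("x0") earlier in the thread.
    ds0 = xr.Dataset(coords={"x1": [1, 2, 3], "x0": [10, 20, 30]})
    ds1 = xr.Dataset(coords={"x1": [4, 5, 6], "x0": [40, 50, 60]})
    # Assumed desired behaviour: raise rather than silently pick which
    # dimension to concatenate along.
    with pytest.raises(ValueError):
        xr.combine_by_coords([ds0, ds1], join="override")
```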

max-sixty (MEMBER) · 2021-05-12T23:35:04Z · https://github.com/pydata/xarray/issues/4824#issuecomment-840169948

Is there a reasonable way of preventing the ambiguity in override? Do we need to make the concat_dim explicit?

mathause (MEMBER) · 2021-03-31T18:24:19Z · https://github.com/pydata/xarray/issues/4824#issuecomment-811309154

But then it also overrides when it should concatenate...

```python
import numpy as np
import xarray as xr

data = np.arange(5).reshape(1, 5)
x = np.arange(5)
x_name = "lat"

da0 = xr.DataArray(data, dims=("t", x_name), coords={"t": [1], x_name: x}).to_dataset(name="a")
x = x + 5
da1 = xr.DataArray(data, dims=("t", x_name), coords={"t": [2], x_name: x}).to_dataset(name="a")
ds = xr.combine_by_coords((da0, da1), join="override")
```

which returns:

```
<xarray.Dataset>
Dimensions:  (lat: 5, t: 2)
Coordinates:
  * lat      (lat) int64 0 1 2 3 4
  * t        (t) int64 1 2
Data variables:
    a        (t, lat) int64 0 1 2 3 4 0 1 2 3 4
```

Again, if we set x_name = "y", the following is returned:

```
<xarray.Dataset>
Dimensions:  (t: 1, y: 10)
Coordinates:
  * t        (t) int64 1
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
    a        (t, y) int64 0 1 2 3 4 0 1 2 3 4
```

TomNicholas (MEMBER) · 2021-03-30T18:20:16Z · https://github.com/pydata/xarray/issues/4824#issuecomment-810476560

Thanks @dcherian.

So passing join='override' to combine_by_coords gives this:

```
<xarray.Dataset>
Dimensions:  (lat: 5, t: 2)
Coordinates:
  * lat      (lat) int64 0 1 2 3 4
  * t        (t) int64 1 2
Data variables:
    a        (t, lat) int64 0 1 2 3 4 0 1 2 3 4
```

which seems fine - that solves issue (1), doesn't it?

Your other example though, @mathause, of allowing certain ragged arrays to pass through: presumably we still need a new check of some kind to disallow that?

dcherian (MEMBER) · 2021-02-02T14:44:45Z · https://github.com/pydata/xarray/issues/4824#issuecomment-771684107

This is the result of join="outer", which expands both datasets, and compat="no_conflicts", which picks the non-NaN value. These are both default values.

You can avoid this by setting join="override", which will skip the check.
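
A sketch of those defaults in action, with one coordinate label perturbed by floating-point noise:

```python
import numpy as np
import xarray as xr

lat0 = np.arange(5.0)
lat1 = lat0.copy()
lat1[0] = 1e-06  # floating-point noise in the first label

ds0 = xr.Dataset({"a": ("lat", np.arange(5.0))}, coords={"lat": lat0})
ds1 = xr.Dataset({"a": ("lat", np.arange(5.0))}, coords={"lat": lat1})

# join="outer" expands onto the union of the two lat indexes, and
# compat="no_conflicts" fills each slot from whichever dataset has a
# non-NaN value - so the mismatch passes silently.
print(xr.merge([ds0, ds1]).lat.values)  # 6 labels: 0.0, 1e-06, 1.0, 2.0, 3.0, 4.0
```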

TomNicholas (MEMBER) · 2021-02-02T11:52:57Z · https://github.com/pydata/xarray/issues/4824#issuecomment-771583218

> we need to figure out if there is a "bug" in merge

I don't actually know - this behaviour of merge seems wrong to me, but it might be allowed given the changes to compatibility checks since I wrote combine_by_coords. @dcherian, do you know?

> Then we can say: "if two coords have the same start they need to be equal" (I think).

Yes, exactly. Well, more specifically: "if they have the same start they need to be equal in length, otherwise it's a ragged hypercube; and if they have the same start and equal length but different values in the middle, then it's a valid hypercube but an inconsistent dataset". It is currently assumed that the user passes a valid hypercube with consistent data; see this comment in _infer_concat_order_from_coords:

```python
# Assume that any two datasets whose coord along dim starts
# with the same value have the same coord values throughout.
```

Though I realise now that I don't think this assumption is made explicit anywhere in the docs; instead they just talk about coords being monotonic.

If people pass "dodgy" hypercubes then this could currently fail in multiple ways (including silently). The reason we didn't just check that the coords were completely equal throughout is that you would have to load all the actual values from the files, which could incur a significant performance cost. Adding a check of just the last value of each coord would help considerably (it should solve #4077), but unless we check every value there will always be a way to silently produce a nonsense result by feeding in inconsistent data. We might consider some kind of flag for whether these checks are done, which defaults to on and which users can turn off if they trust their data but want more speed.
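
A hypothetical sketch of that cheap endpoint check; comparing first and last values catches overlapping coordinate ranges without loading every value from disk:

```python
import numpy as np

def check_no_overlap(coords_list, dim="dim"):
    # Hypothetical helper, not xarray API: given the 1D coordinate arrays of
    # the datasets to be concatenated, verify their ranges do not overlap
    # once ordered by first value.
    starts = np.array([c[0] for c in coords_list])
    ends = np.array([c[-1] for c in coords_list])
    order = np.argsort(starts)
    for prev, nxt in zip(order[:-1], order[1:]):
        if ends[prev] >= starts[nxt]:
            raise ValueError(
                f"coordinate ranges along {dim!r} overlap: "
                f"[{starts[prev]}, {ends[prev]}] vs [{starts[nxt]}, {ends[nxt]}]"
            )
```

This would catch the 4-vs-1e-06 case from the original report, but as noted above it still cannot detect inconsistent values in the middle of a coordinate.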

mathause (MEMBER) · 2021-02-02T10:19:21Z · https://github.com/pydata/xarray/issues/4824#issuecomment-771530648

Thanks for the thorough answer! Thus, we need to figure out if there is a "bug" in merge or if this has to be solved in combine_by_coords.

Disallowing "ragged hypercubes" certainly makes the problem much easier. Then we can say: "if two coords have the same start they need to be equal" (I think).

TomNicholas (MEMBER) · 2021-02-01T16:18:26Z · https://github.com/pydata/xarray/issues/4824#issuecomment-770975217

tl;dr, I currently think there are two issues:

1) xr.merge has just allowed something through when it should have thrown an error,
2) Passing a sort of ragged hypercube can slip past the checks in combine_by_coords, though it was never intended to handle this case. This limitation also isn't very clearly documented.

TomNicholas (MEMBER) · 2021-02-01T16:16:38Z · https://github.com/pydata/xarray/issues/4824#issuecomment-770973948

Thanks for these examples @mathause, these are useful.

> combine_by_coords does a merge and a concat (in the _combine_1d function). The order of these operations makes a difference to the result.

Not sure if this is what you meant, but to be clear: _combine_1d is used by both combine_by_coords and combine_nested - whether it does a merge or a concat is dictated by the concat_dim argument passed. In both combine_by_coords and combine_nested a hypercube (or multiple hypercubes) of datasets is assembled, before the hypercubes are each collapsed into a single dataset by _combine_nd. Within combine_by_coords, _combine_1d will only ever do concatenation, because _infer_concat_order_from_coords will only ever return a list of real dimensions as the concat_dims (as opposed to the None that combine_nested can accept from the user - a None means "please merge along this axis of the hypercube I gave you"). combine_by_coords then finishes off by merging the possibly-multiple collapsed hypercubes (so following the same logic as the original 1D auto_combine that it replaced).

> merge leads to interlaced coords
> concat leads to - well - concatenated coords

As far as I can see then, concat is behaving perfectly sensibly here, but merge is behaving in a way that conflicts with its documentation, which says "xarray.MergeError is raised if you attempt to merge two variables with the same name but different values". In ds0 and ds1, lat has the same name but slightly different values - a MergeError should have been thrown, and then combine_* would have thrown it too.
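
For contrast, a minimal sketch of the documented contract working as intended: when the conflicting values live in a data variable rather than in the index itself, merge does raise.

```python
import xarray as xr

a = xr.Dataset({"v": ("x", [1.0, 2.0])}, coords={"x": [0, 1]})
b = xr.Dataset({"v": ("x", [1.0, 3.0])}, coords={"x": [0, 1]})

# Same name, same coords, conflicting non-NaN values: this raises
# xarray.MergeError under the default compat="no_conflicts".
xr.merge([a, b])
```

In the lat example the differing values are the index coordinate itself, so merge aligns (interlaces) the two indexes instead of comparing them, and no error is raised.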

> One question on the scope of combine_by_coords ...
>
> aabbb
> cccdd

This is a good question, but I'm 99% sure I didn't intend for either combine function to be able to handle this case. The problem with this case is that it's not order-invariant: you could concat aabbb together along x fine, and cccdd together fine, and then the two results can be concatenated together along y (i.e. concat_dims=['x', 'y'], x then y). But if you tried y first, then a and c have different sizes along the x direction, so you can't use concat on them (or you shouldn't be able to, without introducing NaNs). It's possible that the checks are not sufficient to catch this case though.

Try comparing it to the behaviour of combine_nested: I think you'll find that it's only possible to arrange those datasets into the list-of-lists hypercube structure that's expected when you are doing x first then y, and not when doing y first then x.
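
A sketch of that comparison, using the ds_a .. ds_d definitions from @mathause's example (quoted in full in the next comment):

```python
import xarray as xr

ds_a = xr.Dataset({"data": (("y", "x"), [[1, 2]]), "x": [0, 1], "y": [0]})
ds_b = xr.Dataset({"data": (("y", "x"), [[3, 4, 5]]), "x": [2, 3, 4], "y": [0]})
ds_c = xr.Dataset({"data": (("y", "x"), [[11, 12, 14]]), "x": [0, 1, 2], "y": [1]})
ds_d = xr.Dataset({"data": (("y", "x"), [[14, 15]]), "x": [3, 4], "y": [1]})

# x first, then y: each row concatenates cleanly, so this works.
ok = xr.combine_nested([[ds_a, ds_b], [ds_c, ds_d]], concat_dim=["y", "x"])
print(ok["data"].values)

# y first, then x: ds_a (x: 2) and ds_c (x: 3) have different sizes along x,
# so there is no clean hypercube in this order - concatenating them along y
# only "works" by padding with NaNs under the default join="outer".
```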

mathause (MEMBER) · 2021-02-01T13:15:18Z · https://github.com/pydata/xarray/issues/4824#issuecomment-770848655

combine_by_coords does a merge and a concat (in the _combine_1d function). The order of these operations makes a difference to the result. Details below:

merge leads to interlaced coords:

```python
In [2]: xr.merge([da0, da1])
Out[2]:
<xarray.Dataset>
Dimensions:  (lat: 10, t: 2)
Coordinates:
  * lat      (lat) float64 0.0 1e-06 1.0 1.0 2.0 2.0 3.0 3.0 4.0 4.0
  * t        (t) int64 1 2
Data variables:
    a        (t, lat) float64 0.0 nan 1.0 nan 2.0 nan ... 2.0 nan 3.0 nan 4.0
```

concat leads to - well - concatenated coords:

```python
In [5]: xr.concat([da0, da1], dim="lat")
Out[5]:
<xarray.Dataset>
Dimensions:  (lat: 10, t: 2)
Coordinates:
  * t        (t) int64 1 2
  * lat      (lat) float64 0.0 1.0 2.0 3.0 4.0 1e-06 1.0 2.0 3.0 4.0
Data variables:
    a        (t, lat) float64 0.0 1.0 2.0 3.0 4.0 nan ... 0.0 1.0 2.0 3.0 4.0
```

Yes I think I'd be interested to fix this...


One question on the scope of combine_by_coords - consider the following example with 4 arrays ("a" through "d") arranged as follows:

aabbb
cccdd

Or as Datasets:

```python
ds_a = xr.Dataset({"data": (("y", "x"), [[1, 2]]), "x": [0, 1], "y": [0]})
ds_b = xr.Dataset({"data": (("y", "x"), [[3, 4, 5]]), "x": [2, 3, 4], "y": [0]})

ds_c = xr.Dataset({"data": (("y", "x"), [[11, 12, 14]]), "x": [0, 1, 2], "y": [1]})
ds_d = xr.Dataset({"data": (("y", "x"), [[14, 15]]), "x": [3, 4], "y": [1]})

xr.combine_by_coords([ds_a, ds_b, ds_c, ds_d]).data
```

Should that work or not?

Current behaviour

Currently it yields the following:

```
<xarray.DataArray 'data' (y: 2, x: 5)>
array([[ 1,  2,  3,  4,  5],
       [11, 12, 14, 14, 15]])
Coordinates:
  * x        (x) int64 0 1 2 3 4
  * y        (y) int64 0 1
```

However, if "x" is renamed to "z" it fails ("resulting coords not monotonic").

TomNicholas (MEMBER) · 2021-01-31T23:15:15Z · https://github.com/pydata/xarray/issues/4824#issuecomment-770469187

Thanks @mathause. I'm not sure exactly how it ends up with that erroneous result, but I think it should be caught by adding the same check that would fix #4077? I.e. when it realises that 4 is not < 1e-06, it would throw an error.

