issue_comments

12 rows where issue = 620514214 sorted by updated_at descending

user 6

  • malmans2 4
  • mathause 3
  • TomNicholas 2
  • shoyer 1
  • dcherian 1
  • stale[bot] 1

author_association 3

  • MEMBER 7
  • CONTRIBUTOR 4
  • NONE 1

issue 1

  • open_mfdataset overwrites variables with different values but overlapping coordinates 12
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1112152290 https://github.com/pydata/xarray/issues/4077#issuecomment-1112152290 https://api.github.com/repos/pydata/xarray/issues/4077 IC_kwDOAMm_X85CShji stale[bot] 26384082 2022-04-28T12:37:50Z 2022-04-28T12:37:50Z NONE

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset overwrites variables with different values but overlapping coordinates 620514214
633651638 https://github.com/pydata/xarray/issues/4077#issuecomment-633651638 https://api.github.com/repos/pydata/xarray/issues/4077 MDEyOklzc3VlQ29tbWVudDYzMzY1MTYzOA== malmans2 22245117 2020-05-25T16:54:55Z 2020-05-25T17:49:03Z CONTRIBUTOR

Yup, happy to do it.

Just one doubt. I think in cases where indexes[i][-1] == indexes[i+1][0], the concatenation should be consistent with the compat argument used for merge (not sure if you guys agree on this). I don't know the backend though, so the easiest thing I can think of is to run merge to trigger the exact same checks: `xr.merge([datasets[i].isel(dim=-1), datasets[i+1].isel(dim=0)], compat=compat)`

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset overwrites variables with different values but overlapping coordinates 620514214
633602775 https://github.com/pydata/xarray/issues/4077#issuecomment-633602775 https://api.github.com/repos/pydata/xarray/issues/4077 MDEyOklzc3VlQ29tbWVudDYzMzYwMjc3NQ== TomNicholas 35968931 2020-05-25T14:38:52Z 2020-05-25T14:38:52Z MEMBER

So indexes[i][-1] <= indexes[i+1][0] should work.

@malmans2 are you interested in submitting a pull request to add this? (If not then that's fine!)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset overwrites variables with different values but overlapping coordinates 620514214
633586248 https://github.com/pydata/xarray/issues/4077#issuecomment-633586248 https://api.github.com/repos/pydata/xarray/issues/4077 MDEyOklzc3VlQ29tbWVudDYzMzU4NjI0OA== malmans2 22245117 2020-05-25T13:59:18Z 2020-05-25T13:59:18Z CONTRIBUTOR

Never mind, it looks like if the check goes into _infer_concat_order_from_coords, it won't affect combine_nested. So indexes[i][-1] <= indexes[i+1][0] should work.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset overwrites variables with different values but overlapping coordinates 620514214
633577882 https://github.com/pydata/xarray/issues/4077#issuecomment-633577882 https://api.github.com/repos/pydata/xarray/issues/4077 MDEyOklzc3VlQ29tbWVudDYzMzU3Nzg4Mg== malmans2 22245117 2020-05-25T13:39:37Z 2020-05-25T13:39:37Z CONTRIBUTOR

If indexes[i] = [1, 5] and indexes[i+1] = [2, 3, 4], wouldn't indexes[i][-1] <= indexes[i+1][0] raise an error even if all indexes are different?

What about something like this? I think it would cover all possibilities, but maybe it is too expensive? `if not indexes[0].append(indexes[1:]).is_unique: raise ValueError`
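To make the difference between the two proposed checks concrete, here is a minimal sketch. It assumes plain sorted Python lists in place of the pandas.Index objects discussed in the thread, and the helper names `boundary_check` and `all_unique` are hypothetical, not xarray functions.

```python
def boundary_check(indexes):
    """True if every index ends at or before the start of the next one."""
    return all(
        indexes[i][-1] <= indexes[i + 1][0] for i in range(len(indexes) - 1)
    )


def all_unique(indexes):
    """True if no coordinate value appears more than once across all indexes."""
    values = [value for index in indexes for value in index]
    return len(values) == len(set(values))


# The interleaved case from the comment: all values differ, yet 5 > 2.
interleaved = [[1, 5], [2, 3, 4]]
print(boundary_check(interleaved))  # False
print(all_unique(interleaved))      # True

# The touching case: the end of one index equals the start of the next.
touching = [[0, 1], [1, 2]]
print(boundary_check(touching))  # True
print(all_unique(touching))      # False
```

The two checks flag different situations: the boundary comparison rejects interleaved but distinct values, while the uniqueness test rejects a shared endpoint that the `<=` comparison lets through.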

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset overwrites variables with different values but overlapping coordinates 620514214
630912785 https://github.com/pydata/xarray/issues/4077#issuecomment-630912785 https://api.github.com/repos/pydata/xarray/issues/4077 MDEyOklzc3VlQ29tbWVudDYzMDkxMjc4NQ== shoyer 1217238 2020-05-19T15:54:02Z 2020-05-19T15:54:02Z MEMBER

> That was actually deliberate, xr.combine_by_coords is only checking the first value of each coord is different, to avoid loading big coordinates into memory. (see this line) As the first y value is 0 in both cases it's just saying "we have a match!" and overwriting.

We already have the coordinates loaded into memory at this point -- each element of indexes is a pandas.Index.

Looking at the first values makes sense for determining the order, but doesn't guarantee that they are safe to concatenate. I think we are missing another safety check verifying indexes[i][-1] <= indexes[i+1][0] for all indexes in order, in a way that handles ties correctly.

In my opinion, xarray's combine functions like combine_by_coords should never override values, unless an unsafe option was explicitly chosen.
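The safety check described above could be sketched roughly as follows. This is only an illustration: sorted Python lists stand in for the pandas.Index objects, the function name `check_safe_to_concatenate` is hypothetical, and the `allow_ties` flag is an assumption, since the thread had not settled whether a tie should be an error or a warning.

```python
from typing import Sequence


def check_safe_to_concatenate(
    indexes: Sequence[Sequence[float]], allow_ties: bool = False
) -> None:
    """Raise ValueError if consecutive, already ordered indexes overlap.

    Hypothetical helper, not part of xarray. `allow_ties` is an assumed
    opt-in for the case where one index ends exactly where the next starts.
    """
    for i in range(len(indexes) - 1):
        last, first = indexes[i][-1], indexes[i + 1][0]
        if last > first:
            raise ValueError(
                f"index {i} ends at {last}, after index {i + 1} starts at {first}"
            )
        if last == first and not allow_ties:
            raise ValueError(
                f"index {i} and index {i + 1} share the boundary value {last}"
            )


check_safe_to_concatenate([[0, 1], [2, 3]])  # passes silently

try:
    check_safe_to_concatenate([[0, 1], [0, 1]])  # same start, as in the report
except ValueError as err:
    print("rejected:", err)
```

Making the tie case opt-in keeps the default strict, in line with the view that combine functions should not override values unless an unsafe option is chosen explicitly.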

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset overwrites variables with different values but overlapping coordinates 620514214
630842902 https://github.com/pydata/xarray/issues/4077#issuecomment-630842902 https://api.github.com/repos/pydata/xarray/issues/4077 MDEyOklzc3VlQ29tbWVudDYzMDg0MjkwMg== dcherian 2448579 2020-05-19T14:07:40Z 2020-05-19T14:07:40Z MEMBER

What is the expected outcome here? An error?

The only way I can think of to combine these two datasets without losing data is to do combine_nested([ds0, ds1], concat_dim="new_dim").

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset overwrites variables with different values but overlapping coordinates 620514214
630779096 https://github.com/pydata/xarray/issues/4077#issuecomment-630779096 https://api.github.com/repos/pydata/xarray/issues/4077 MDEyOklzc3VlQ29tbWVudDYzMDc3OTA5Ng== TomNicholas 35968931 2020-05-19T12:14:41Z 2020-05-19T12:15:51Z MEMBER

Thanks for reporting this @malmans2!

There are actually two issues here: The minor one is that it should never have been possible to specify concat_dim and combine='by_coords' to open_mfdataset simultaneously. You should have got an error already at that point. xr.combine_by_coords doesn't accept a concat_dim argument, so neither should xr.open_mfdataset(..., combine='by_coords').

The more complex issue is that you can get the same overwriting problem in xr.combine_by_coords alone...

That was actually deliberate, xr.combine_by_coords is only checking the first value of each coord is different, to avoid loading big coordinates into memory. (see this line) As the first y value is 0 in both cases it's just saying "we have a match!" and overwriting.

@shoyer we discussed that PR (#2616) extensively, but I can't see an explicit record of discussing that particular line?

But since then @dcherian has done work on the options which vary the strictness of checking - should compat also vary this behaviour?

EDIT: (sorry for repeating what was said above, I wrote this reply last night and sent it today)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset overwrites variables with different values but overlapping coordinates 620514214
630721808 https://github.com/pydata/xarray/issues/4077#issuecomment-630721808 https://api.github.com/repos/pydata/xarray/issues/4077 MDEyOklzc3VlQ29tbWVudDYzMDcyMTgwOA== mathause 10194086 2020-05-19T10:06:10Z 2020-05-19T10:06:10Z MEMBER

The second part could probably be tested just below this if:

https://github.com/pydata/xarray/blob/2542a63f6ebed1a464af7fc74b9f3bf302925803/xarray/core/combine.py#L751

using

`if not indexes.is_unique: raise ValueError("")` (or a warning)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset overwrites variables with different values but overlapping coordinates 620514214
630713011 https://github.com/pydata/xarray/issues/4077#issuecomment-630713011 https://api.github.com/repos/pydata/xarray/issues/4077 MDEyOklzc3VlQ29tbWVudDYzMDcxMzAxMQ== mathause 10194086 2020-05-19T09:47:33Z 2020-05-19T09:47:33Z MEMBER

Raising an error when the start time is equal is certainly a good idea. What I am less sure about is what to do when the end is equal to the start - maybe a warning?

The second case would be the following:

```python
print(ds0)
print(ds1)
```

```
<xarray.Dataset>
Dimensions:  (x: 2)
Coordinates:
  * x        (x) int64 0 1
Data variables:
    foo ...

<xarray.Dataset>
Dimensions:  (x: 2)
Coordinates:
  * x        (x) int64 1 2
Data variables:
    foo ...
```

and `auto_combine` would lead to:

```python
xr.combine_by_coords([ds0, ds1])
```

```
<xarray.Dataset>
Dimensions:  (x: 2)
Coordinates:
  * x        (x) int64 0 1 1 2
Data variables:
    foo ...
```

For the first case you can probably check if all elements of order are unique:

https://github.com/pydata/xarray/blob/2542a63f6ebed1a464af7fc74b9f3bf302925803/xarray/core/combine.py#L99
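As a rough illustration of the "all elements of order are unique" idea, the sketch below assumes that order holds one rank per dataset, derived from the first coordinate value of each. Both helper names are hypothetical and the ranking is a simplified stand-in, not a copy of the linked xarray code.

```python
def infer_order(first_items):
    """Dense-rank the first coordinate value of each dataset (0-based)."""
    distinct = sorted(set(first_items))
    position = {value: rank for rank, value in enumerate(distinct)}
    return [position[value] for value in first_items]


def check_order_unique(order):
    """Raise ValueError if two datasets were assigned the same position."""
    if len(set(order)) != len(order):
        raise ValueError("two or more datasets start at the same coordinate value")


print(infer_order([10, 0, 5]))  # [2, 0, 1]
print(infer_order([0, 0]))      # [0, 0]

try:
    check_order_unique(infer_order([0, 0]))
except ValueError as err:
    print("rejected:", err)
```

Two datasets with the same first value receive the same rank, so a duplicate in order is enough to detect the "same start" case without comparing full coordinates.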

ps: Overlapping indices are not a problem - it is checked that the result is monotonic:

https://github.com/pydata/xarray/blob/2542a63f6ebed1a464af7fc74b9f3bf302925803/xarray/core/combine.py#L748

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset overwrites variables with different values but overlapping coordinates 620514214
630692045 https://github.com/pydata/xarray/issues/4077#issuecomment-630692045 https://api.github.com/repos/pydata/xarray/issues/4077 MDEyOklzc3VlQ29tbWVudDYzMDY5MjA0NQ== malmans2 22245117 2020-05-19T09:08:59Z 2020-05-19T09:08:59Z CONTRIBUTOR

Got it, thanks! Let me know if it is worth adding some checks. I'd be happy to work on it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset overwrites variables with different values but overlapping coordinates 620514214
630644504 https://github.com/pydata/xarray/issues/4077#issuecomment-630644504 https://api.github.com/repos/pydata/xarray/issues/4077 MDEyOklzc3VlQ29tbWVudDYzMDY0NDUwNA== mathause 10194086 2020-05-19T07:40:11Z 2020-05-19T07:40:11Z MEMBER

Yes, xr.combine_by_coords only ensures that the coordinates are monotonically increasing. It does not check (a) that they don't have the same start (your case) or (b) whether the end of ds0 is equal to the start of ds1 (which may also be undesirable).

The magic happens here:

https://github.com/pydata/xarray/blob/2542a63f6ebed1a464af7fc74b9f3bf302925803/xarray/core/combine.py#L49

In your case it just uses the rightmost array (compare xr.combine_by_coords([ds0, ds1]) and xr.combine_by_coords([ds1, ds0])).

(Note that concat_dim="y" is ignored when using combine_by_coords).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset overwrites variables with different values but overlapping coordinates 620514214

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
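
For readers who want to reproduce the view on this page (rows where issue = 620514214, sorted by updated_at descending), here is a minimal sketch using Python's built-in sqlite3 module with the schema above. It runs against an in-memory database with a few sample rows; the ids, user ids and timestamps are taken from this page, except for one clearly marked hypothetical row from another issue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The referenced [users] and [issues] tables are not created here, so make
# sure foreign key enforcement is off for this sketch.
conn.execute("PRAGMA foreign_keys = OFF")
conn.executescript(
    """
    CREATE TABLE [issue_comments] (
       [html_url] TEXT,
       [issue_url] TEXT,
       [id] INTEGER PRIMARY KEY,
       [node_id] TEXT,
       [user] INTEGER REFERENCES [users]([id]),
       [created_at] TEXT,
       [updated_at] TEXT,
       [author_association] TEXT,
       [body] TEXT,
       [reactions] TEXT,
       [performed_via_github_app] TEXT,
       [issue] INTEGER REFERENCES [issues]([id])
    );
    CREATE INDEX [idx_issue_comments_issue]
        ON [issue_comments] ([issue]);
    """
)

# Columns: id, user, updated_at, issue
sample_rows = [
    (630644504, 10194086, "2020-05-19T07:40:11Z", 620514214),
    (1112152290, 26384082, "2022-04-28T12:37:50Z", 620514214),
    (633651638, 22245117, "2020-05-25T17:49:03Z", 620514214),
    (1, 1, "2023-01-01T00:00:00Z", 1),  # hypothetical row from another issue
]
conn.executemany(
    "INSERT INTO issue_comments ([id], [user], [updated_at], [issue]) "
    "VALUES (?, ?, ?, ?)",
    sample_rows,
)

# ISO 8601 timestamps in a single time zone sort correctly as text.
ids = [
    row[0]
    for row in conn.execute(
        "SELECT [id] FROM issue_comments WHERE [issue] = ? "
        "ORDER BY [updated_at] DESC",
        (620514214,),
    )
]
print(ids)  # [1112152290, 633651638, 630644504]
```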