issue_comments: 771583218


html_url: https://github.com/pydata/xarray/issues/4824#issuecomment-771583218
issue_url: https://api.github.com/repos/pydata/xarray/issues/4824
id: 771583218
node_id: MDEyOklzc3VlQ29tbWVudDc3MTU4MzIxOA==
user: 35968931
created_at: 2021-02-02T11:52:57Z
updated_at: 2021-02-02T12:02:14Z
author_association: MEMBER

> we need to figure out if there is a "bug" in merge

I don't actually know - this behaviour of merge seems wrong to me, but it might be allowed given the changes to compatibility checks since I wrote `combine_by_coords`. @dcherian, do you know?

> Then we can say: "if two coords have the same start they need to be equal" (I think).

Yes, exactly. More specifically: "if two coords have the same start they need to be equal in length, otherwise it's a ragged hypercube; and if they have the same start and equal length but different values in the middle, then it's a valid hypercube but an inconsistent dataset". It is currently assumed that the user passes a valid hypercube with consistent data; see this comment in `_infer_concat_order_from_coords`:

```
Assume that any two datasets whose coord along dim starts
with the same value have the same coord values throughout.
```

Though I realise now that I don't think this assumption is made explicit in the docs anywhere; instead the docs just talk about coords being monotonic.
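
For concreteness, here is a minimal sketch of the three cases described above (consistent hypercube, ragged hypercube, and valid shape but inconsistent coords). The `tile` helper and the toy coordinate values are invented for illustration; this is not xarray's internal logic, just what the assumption implies for user-supplied datasets:

```python
import numpy as np
import xarray as xr

def tile(x, y, fill):
    """Hypothetical helper: build a small 2-D dataset tile."""
    data = np.full((len(x), len(y)), fill, dtype=float)
    return xr.Dataset({"v": (("x", "y"), data)}, coords={"x": x, "y": y})

# Consistent hypercube: both tiles start "x" at 0 and have identical "x"
# coords throughout, so ordering them by their first coord value is safe.
a = tile([0, 1, 2], y=[0, 1], fill=0.0)
b = tile([0, 1, 2], y=[2, 3], fill=1.0)
combined = xr.combine_by_coords([a, b])   # x: [0, 1, 2], y: [0, 1, 2, 3]

# Ragged "hypercube": same start along "x" but a different length, so the
# tiles cannot be assembled into a rectangular grid at all.
ragged = tile([0, 1], y=[2, 3], fill=1.0)

# Valid shape but inconsistent data: same start and length along "x", yet a
# different value in the middle ([0, 5, 2] vs [0, 1, 2]); looking only at
# the first value cannot distinguish this from the consistent case.
inconsistent = tile([0, 5, 2], y=[2, 3], fill=1.0)
```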

If people pass "dodgy" hypercubes then this can currently fail in multiple ways (including silently). The reason we didn't just check that the coords are completely equal throughout is that doing so requires loading all the actual coord values from the files, which could incur a significant performance cost. Checking just the last value of each coord would help considerably (it should solve #4077), but unless we check every value there will always be a way to silently produce a nonsense result by feeding in inconsistent data. We might consider a flag controlling whether these checks are done, defaulting to on, which users can turn off if they trust their data but want more speed.
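
As a rough sketch of that cheaper check: compare only the first value, last value, and length of each dataset's index along the concat dimension, and skip the comparison entirely when the user opts out. The function name and the `check_coords` flag are hypothetical, not an existing xarray option:

```python
def _cheap_coord_consistency_check(datasets, dim, check_coords=True):
    """Raise if datasets that start `dim` at the same value disagree on its
    end value or length. Cheap: never compares full coord arrays."""
    if not check_coords:
        return  # user trusts their data and wants the speed
    seen_by_start = {}
    for ds in datasets:
        index = ds.indexes[dim]          # pandas.Index for the dimension coord
        summary = (index[-1], len(index))
        previous = seen_by_start.setdefault(index[0], summary)
        if previous != summary:
            raise ValueError(
                f"Datasets whose {dim!r} coord starts at {index[0]!r} disagree "
                f"on its end value or length: {previous} vs {summary}"
            )
```

Even with this, datasets that agree on start, end, and length but differ somewhere in the middle would still slip through silently, which is the residual risk described above.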

reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: 788534915