issue_comments: 770973948


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/4824#issuecomment-770973948 https://api.github.com/repos/pydata/xarray/issues/4824 770973948 MDEyOklzc3VlQ29tbWVudDc3MDk3Mzk0OA== 35968931 2021-02-01T16:16:38Z 2021-02-01T16:16:38Z MEMBER

Thanks for these examples @mathause, they are useful.

> combine_by_coords does a merge and a concat (in the _combine_1d function). The order of these operations makes a difference to the result. See details.

Not sure if this is what you meant, but to be clear: _combine_1d is used by both combine_by_coords and combine_nested, and whether it does a merge or a concat is dictated by the concat_dim argument passed to it. In both combine_by_coords and combine_nested a hypercube (or multiple hypercubes) of datasets is assembled, and each hypercube is then collapsed into a single dataset by _combine_nd. Within combine_by_coords, _combine_1d will only ever concatenate, because _infer_concat_order_from_coords only ever returns a list of real dimensions as the concat_dims (whereas combine_nested can accept a None from the user, where a None means "please merge along this axis of the hypercube I gave you"). combine_by_coords then finishes off by merging the possibly multiple collapsed hypercubes (following the same logic as the original 1D auto_combine that it replaced).
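To make the role of concat_dim concrete, here is a minimal sketch using the public combine_nested function. The four toy datasets and their variable names are made up for illustration. The outer list axis is concatenated along t, and the None for the inner axis asks for a merge there instead of a concat; this is the option that combine_by_coords never uses internally.

```python
import xarray as xr

# Hypothetical toy data: two variables, each split across two time chunks.
t1_temp = xr.Dataset({"temperature": ("t", [10.0, 11.0])})
t1_precip = xr.Dataset({"precipitation": ("t", [0.1, 0.2])})
t2_temp = xr.Dataset({"temperature": ("t", [12.0, 13.0])})
t2_precip = xr.Dataset({"precipitation": ("t", [0.3, 0.4])})

# A 2x2 "hypercube" as a list of lists: outer axis is time, inner axis is variable.
grid = [[t1_temp, t1_precip], [t2_temp, t2_precip]]

# "t" -> concatenate along the outer axis; None -> merge along the inner axis.
combined = xr.combine_nested(grid, concat_dim=["t", None])
print(combined)
```

The result should hold both variables, each with four values along t.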

> merge leads to interlaced coords
> concat leads to - well - concatenated coords

As far as I can see, concat is behaving perfectly sensibly here, but merge is behaving in a way that conflicts with its documentation, which says "xarray.MergeError is raised if you attempt to merge two variables with the same name but different values". In ds0 and ds1, lat has the same name but slightly different values, so a MergeError should have been thrown, and combine_* would then have thrown it too.
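For reference, this is a small sketch of the documented behaviour in the simplest case I can think of. It uses a plain data variable on made-up datasets, not the lat coordinate from ds0 and ds1, so it only illustrates the quoted sentence and does not settle whether coordinates like lat go through the same check. compat="no_conflicts" is passed explicitly rather than relying on the default.

```python
import xarray as xr

# Hypothetical datasets: same variable name "a", different values at x=1.
left = xr.Dataset({"a": ("x", [1, 2])})
right = xr.Dataset({"a": ("x", [1, 3])})

try:
    xr.merge([left, right], compat="no_conflicts")
    raised = False
except xr.MergeError as err:
    # Conflicting values for the same variable name are rejected.
    raised = True
    print(f"MergeError: {err}")
```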

> One question on the scope of combine_by_coords ...
>
> aabbb
> cccdd

This is a good question, but I'm 99% sure I didn't intend for either combine function to be able to handle this case. The problem with this case is that it's not order-invariant: you could concat a and b together along x fine (giving aabbb), and c and d together fine (giving cccdd), and then both results can be concatenated together along y (i.e. concat_dims=['x', 'y'], x then y). But if you tried y first, then a and c have different sizes along the x direction, so you can't use concat on them (or you shouldn't be able to without introducing NaNs). It's possible that the checks are not sufficient to catch this case though.

Try comparing it to the behaviour of combine_nested: I think you'll find that it's only possible to arrange those datasets into the list-of-lists hypercube structure that's expected when you are doing x first then y, and not when doing y first then x.
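The order dependence can be sketched with a toy model in plain Python. This is not xarray code: each tile is just a 2D list indexed [y][x], and concat_x and concat_y are hypothetical helpers that refuse mismatched shapes instead of padding with NaNs.

```python
# Toy model of the aabbb / cccdd layout; tiles are 2D lists indexed [y][x].

def concat_x(left, right):
    """Join two tiles side by side; they must have the same number of rows."""
    if len(left) != len(right):
        raise ValueError("tiles differ in size along y")
    return [lrow + rrow for lrow, rrow in zip(left, right)]

def concat_y(top, bottom):
    """Stack two tiles vertically; they must have the same number of columns."""
    if len(top[0]) != len(bottom[0]):
        raise ValueError("tiles differ in size along x")
    return top + bottom

a = [["a", "a"]]
b = [["b", "b", "b"]]
c = [["c", "c", "c"]]
d = [["d", "d"]]

# x first, then y: both rows end up five wide, so stacking them works.
x_then_y = concat_y(concat_x(a, b), concat_x(c, d))
rows = ["".join(row) for row in x_then_y]
print(rows)

# y first: a is two wide and c is three wide, so the very first step fails.
try:
    concat_y(a, c)
    y_first_ok = True
except ValueError as err:
    y_first_ok = False
    print(f"y first fails: {err}")
```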

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  788534915