issues: 446054247
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
446054247 | MDU6SXNzdWU0NDYwNTQyNDc= | 2975 | Inconsistent/confusing behaviour when concatenating dimension coords | 35968931 | open | 0 | 2 | 2019-05-20T11:01:37Z | 2021-07-08T17:42:52Z | MEMBER | I noticed that when datasets have multiple conflicting dimension coords, concat can give pretty weird/counterintuitive results, at least compared to what the documentation suggests it should give:

```python
from xarray import Dataset, concat

# Create two datasets with conflicting coordinates
objs = [Dataset({'x': [0], 'y': [1]}), Dataset({'y': [0], 'x': [1]})]

[<xarray.Dataset>
Dimensions:  (x: 1, y: 1)
Coordinates:
  * x        (x) int64 0
  * y        (y) int64 1
Data variables:
    empty,
 <xarray.Dataset>
Dimensions:  (x: 1, y: 1)
Coordinates:
  * y        (y) int64 0
  * x        (x) int64 1
Data variables:
    empty]
```

```python
# Try to join along only 'x',
# coords='minimal' so concatenate "Only coordinates in which the dimension already appears"
concat(objs, dim='x', coords='minimal')

<xarray.Dataset>
Dimensions:  (x: 2, y: 2)
Coordinates:
  * y        (y) int64 0 1
  * x        (x) int64 0 1
Data variables:
    empty

# It's joined along x and y!
```

Based on my reading of the docstring for concat, I would have expected this to not attempt to concatenate y, because

Now let's try to get concat to broadcast 'y' across 'x':

```python
# Try to join along only 'x' by setting coords='different'
concat(objs, dim='x', coords='different')
```

Now, since "Data variables which are not equal (ignoring attributes) across all datasets are also concatenated", I would have expected 'y' to be concatenated across 'x', i.e. to add the 'x' dimension to the 'y' coord, i.e.:
Same again but without dimension coords

If we create the same sort of objects but the variables are data vars, not coords, then everything behaves exactly as expected:

```python
objs2 = [Dataset({'a': ('x', [0]), 'b': ('y', [1])}), Dataset({'a': ('x', [1]), 'b': ('y', [0])})]

[<xarray.Dataset>
Dimensions:  (x: 1, y: 1)
Dimensions without coordinates: x, y
Data variables:
    a        (x) int64 0
    b        (y) int64 1,
 <xarray.Dataset>
Dimensions:  (x: 1, y: 1)
Dimensions without coordinates: x, y
Data variables:
    a        (x) int64 1
    b        (y) int64 0]

concat(objs2, dim='x', data_vars='minimal')

ValueError: variable b not equal across datasets

concat(objs2, dim='x', data_vars='different')

<xarray.Dataset>
Dimensions:  (x: 2, y: 1)
Dimensions without coordinates: x, y
Data variables:
    a        (x) int64 0 1
    b        (x, y) int64 1 0
```

Also, if you do the same again but with coordinates which are not dimension coords, i.e.:

```python
objs3 = [Dataset(coords={'a': ('x', [0]), 'b': ('y', [1])}), Dataset(coords={'a': ('x', [1]), 'b': ('y', [0])})]

[<xarray.Dataset>
Dimensions:  (x: 1, y: 1)
Coordinates:
    a        (x) int64 0
    b        (y) int64 1
Dimensions without coordinates: x, y
Data variables:
    empty,
 <xarray.Dataset>
Dimensions:  (x: 1, y: 1)
Coordinates:
    a        (x) int64 1
    b        (y) int64 0
Dimensions without coordinates: x, y
Data variables:
    empty]
```

then this again gives the expected concatenation behaviour.

So this implies that the compatibility checks that are being done on the data vars are not being done on the coords, but only if they are dimension coordinates!

Either this is not the desired behaviour or the concat docstring needs to be a lot clearer. If we agree that this is not the desired behaviour then I will have a look inside

EDIT: Presumably this has something to do with the ToDo in the code for |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2975/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
13221727 | issue |
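
To make the expectation in the issue body concrete, here is a minimal pure-Python sketch of the selection rule as the two quoted docstring phrases read. It is a toy model, not xarray's implementation: the function name `vars_to_concat` and the plain-dict stand-in for a dataset (name mapped to a tuple of dims and a list of values) are assumptions made for illustration only.

```python
def vars_to_concat(datasets, dim, mode):
    """Return the variable names a literal reading of the docstring would
    concatenate along `dim`.

    Hypothetical helper: each "dataset" is a plain dict mapping a variable
    name to (dims tuple, list of values). This is not an xarray API.
    """
    if mode not in ("minimal", "different"):
        raise ValueError(f"unsupported mode: {mode!r}")

    # Every variable name that appears in at least one dataset.
    names = set().union(*datasets)

    # 'minimal': only variables in which the dimension already appears.
    selected = {
        name
        for name in names
        if any(name in ds and dim in ds[name][0] for ds in datasets)
    }

    # 'different': additionally, variables that are not equal across datasets.
    if mode == "different":
        for name in names - selected:
            values = [ds.get(name) for ds in datasets]
            if any(v != values[0] for v in values[1:]):
                selected.add(name)

    return selected


# Stand-ins for the two datasets with conflicting dimension coords.
objs = [
    {"x": (("x",), [0]), "y": (("y",), [1])},
    {"y": (("y",), [0]), "x": (("x",), [1])},
]

print(sorted(vars_to_concat(objs, "x", "minimal")))    # ['x']
print(sorted(vars_to_concat(objs, "x", "different")))  # ['x', 'y']
```

Under this reading, 'minimal' would leave 'y' alone and 'different' would add the 'x' dimension to 'y', which is what the issue body says it expected rather than what it reports observing for dimension coords.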