
Issue #2975: Inconsistent/confusing behaviour when concatenating dimension coords

  • id: 446054247
  • node_id: MDU6SXNzdWU0NDYwNTQyNDc=
  • user: 35968931
  • state: open
  • locked: 0
  • comments: 2
  • created_at: 2019-05-20T11:01:37Z
  • updated_at: 2021-07-08T17:42:52Z
  • author_association: MEMBER

I noticed that when datasets have multiple conflicting dimension coords, concat can give pretty weird/counterintuitive results, at least compared to what the documentation suggests it should give:

```python
from xarray import Dataset, concat

# Create two datasets with conflicting coordinates
objs = [Dataset({'x': [0], 'y': [1]}), Dataset({'y': [0], 'x': [1]})]

# [<xarray.Dataset>
#  Dimensions:  (x: 1, y: 1)
#  Coordinates:
#    * x        (x) int64 0
#    * y        (y) int64 1
#  Data variables:
#      *empty*, <xarray.Dataset>
#  Dimensions:  (x: 1, y: 1)
#  Coordinates:
#    * y        (y) int64 0
#    * x        (x) int64 1
#  Data variables:
#      *empty*]
```

```python
# Try to join along only 'x':
# coords='minimal' should concatenate "Only coordinates in which the dimension already appears"
concat(objs, dim='x', coords='minimal')

# <xarray.Dataset>
# Dimensions:  (x: 2, y: 2)
# Coordinates:
#   * y        (y) int64 0 1
#   * x        (x) int64 0 1
# Data variables:
#     *empty*

# It's joined along x and y!
```

Based on my reading of the docstring for concat, I would have expected this to not attempt to concatenate y, because coords='minimal', and instead to throw an error because 'y' is a "non-concatenated variable" whose values are not the same across datasets.
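To make that expectation concrete, here is a toy model of the 'minimal' rule as I read it from the docstring. It is a sketch in plain Python, not xarray's implementation: each "dataset" is just a dict mapping a variable name to `(dims, values)`, and `plan_minimal_concat` is a hypothetical helper invented for illustration.

```python
# Toy model only: a "dataset" is a dict of name -> (dims, values).
# plan_minimal_concat is a hypothetical helper, not part of xarray.

def plan_minimal_concat(datasets, dim):
    """Return the names that would be concatenated along `dim`.

    Only variables that already contain `dim` are concatenated. Every
    other variable must be identical in all datasets, otherwise a
    ValueError is raised.
    """
    first = datasets[0]
    to_concat = [name for name, (dims, _) in first.items() if dim in dims]
    for name, variable in first.items():
        if name in to_concat:
            continue
        for other in datasets[1:]:
            if other[name] != variable:
                raise ValueError(f"variable {name} not equal across datasets")
    return to_concat


# Same layout as `objs` above: 'x' and 'y' conflict between the two datasets.
toy_objs = [
    {"x": (("x",), [0]), "y": (("y",), [1])},
    {"x": (("x",), [1]), "y": (("y",), [0])},
]

try:
    plan_minimal_concat(toy_objs, dim="x")
except ValueError as err:
    message = str(err)
    print(message)  # variable y not equal across datasets
```

Under this reading, 'y' is never a candidate for concatenation along 'x', so the only possible outcomes are "identical everywhere" or an error.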

Now let's try to get concat to broadcast 'y' across 'x':

```python
# Try to join along only 'x' by setting coords='different'
concat(objs, dim='x', coords='different')
```

Since "Data variables which are not equal (ignoring attributes) across all datasets are also concatenated", I would have expected 'y' to be concatenated across 'x', i.e. for the 'x' dimension to be added to the 'y' coord:

```python
<xarray.Dataset>
Dimensions:  (x: 2, y: 1)
Coordinates:
  * y        (y, x) int64 1 0
  * x        (x) int64 0 1
Data variables:
    *empty*
```

But that's not what we get!:

```python
<xarray.Dataset>
Dimensions:  (x: 2, y: 2)
Coordinates:
  * y        (y) int64 0 1
  * x        (x) int64 0 1
Data variables:
    *empty*
```
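The 'different' expectation can be modelled the same way. This is again a plain-Python sketch under my reading of the docstring, not xarray's code: datasets are dicts of name -> `(dims, values)`, and `concat_different` is a hypothetical helper. Variables that already have the dimension are joined end to end, and variables that differ between datasets are stacked along a new leading axis.

```python
# Toy model only: a "dataset" is a dict of name -> (dims, values).
# concat_different is a hypothetical helper, not part of xarray.

def concat_different(datasets, dim):
    """Concatenate along `dim`, also stacking variables that differ."""
    result = {}
    for name, (dims, values) in datasets[0].items():
        all_values = [ds[name][1] for ds in datasets]
        if dim in dims:
            # Already has the dimension: join the values end to end.
            result[name] = (dims, [v for vals in all_values for v in vals])
        elif any(vals != values for vals in all_values):
            # Differs across datasets: stack along a new leading `dim` axis.
            result[name] = ((dim,) + dims, all_values)
        else:
            # Identical everywhere: keep a single copy.
            result[name] = (dims, values)
    return result


toy_objs = [
    {"x": (("x",), [0]), "y": (("y",), [1])},
    {"x": (("x",), [1]), "y": (("y",), [0])},
]

result = concat_different(toy_objs, dim="x")
print(result["x"])  # (('x',), [0, 1])
print(result["y"])  # (('x', 'y'), [[1], [0]])
```

The toy puts the new axis first, so 'y' ends up with dims `(x, y)` rather than the `(y, x)` written above; either way the result has x: 2 and y: 1, not y: 2.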

Same again but without dimension coords

If we create the same sort of objects but the variables are data vars not coords, then everything behaves exactly as expected:

```python
objs2 = [Dataset({'a': ('x', [0]), 'b': ('y', [1])}), Dataset({'a': ('x', [1]), 'b': ('y', [0])})]

# [<xarray.Dataset>
#  Dimensions:  (x: 1, y: 1)
#  Dimensions without coordinates: x, y
#  Data variables:
#      a        (x) int64 0
#      b        (y) int64 1, <xarray.Dataset>
#  Dimensions:  (x: 1, y: 1)
#  Dimensions without coordinates: x, y
#  Data variables:
#      a        (x) int64 1
#      b        (y) int64 0]

concat(objs2, dim='x', data_vars='minimal')
# ValueError: variable b not equal across datasets

concat(objs2, dim='x', data_vars='different')
# <xarray.Dataset>
# Dimensions:  (x: 2, y: 1)
# Dimensions without coordinates: x, y
# Data variables:
#     a        (x) int64 0 1
#     b        (x, y) int64 1 0
```

Also, if you do the same again but with coordinates which are not dimension coords, i.e.:

```python
objs3 = [Dataset(coords={'a': ('x', [0]), 'b': ('y', [1])}), Dataset(coords={'a': ('x', [1]), 'b': ('y', [0])})]

# [<xarray.Dataset>
#  Dimensions:  (x: 1, y: 1)
#  Coordinates:
#      a        (x) int64 0
#      b        (y) int64 1
#  Dimensions without coordinates: x, y
#  Data variables:
#      *empty*, <xarray.Dataset>
#  Dimensions:  (x: 1, y: 1)
#  Coordinates:
#      a        (x) int64 1
#      b        (y) int64 0
#  Dimensions without coordinates: x, y
#  Data variables:
#      *empty*]
```

then this again gives the expected concatenation behaviour.

So this implies that the compatibility checks that are being done on the data vars are not being done on the coords, but only if they are dimension coordinates!
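Since the behaviour depends on which of the three kinds a variable is, a small helper for telling them apart may be handy when reproducing this. The sketch below assumes xarray is importable as `xr`; `classify_variables` is a hypothetical name, and it relies only on membership tests against `ds.variables`, `ds.data_vars` and `ds.dims`.

```python
import xarray as xr


def classify_variables(ds):
    """Label each variable in `ds` by the role it plays (hypothetical helper)."""
    kinds = {}
    for name in ds.variables:
        if name in ds.data_vars:
            kinds[name] = "data variable"
        elif name in ds.dims:
            # A coordinate that shares its name with a dimension.
            kinds[name] = "dimension coordinate"
        else:
            kinds[name] = "non-dimension coordinate"
    return kinds


# One example per case discussed in this issue.
examples = {
    "dimension coords": xr.Dataset({"x": [0], "y": [1]}),
    "data vars": xr.Dataset({"a": ("x", [0]), "b": ("y", [1])}),
    "non-dimension coords": xr.Dataset(coords={"a": ("x", [0]), "b": ("y", [1])}),
}

for label, ds in examples.items():
    print(label, classify_variables(ds))
```

Only the first example produces dimension coordinates, which is the one case where concat gave the unexpected result above.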

Either this is not the desired behaviour or the concat docstring needs to be a lot clearer. If we agree that this is not the desired behaviour then I will have a look inside concat to work out why it's happening.

EDIT: Presumably this has something to do with the TODO in the code for concat: `# TODO: support concatenating scalar coordinates even if the concatenated dimension already exists`...

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2975/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    13221727 issue
