issue_comments: 616633079


html_url: https://github.com/pydata/xarray/issues/3774#issuecomment-616633079
issue_url: https://api.github.com/repos/pydata/xarray/issues/3774
id: 616633079
node_id: MDEyOklzc3VlQ29tbWVudDYxNjYzMzA3OQ==
user: 35968931
created_at: 2020-04-20T15:37:10Z
updated_at: 2020-04-20T15:37:10Z
author_association: MEMBER

> Suppose there were multiple scalar coordinates that are unique for each variable. How would combine_by_coords pick a dimension to stack along?

@shoyer it would expand and stack along both, filling the (many) gaps created with NaNs.

```python
import xarray as xr

data_0 = xr.Dataset({'temperature': ('time', [10,20,30])}, coords={'time': [0,1,2]})
data_0.coords['trial'] = 0  # scalar coords
data_0.coords['day'] = 1

data_1 = xr.Dataset({'temperature': ('time', [50,60,70])}, coords={'time': [0,1,2]})
data_1.coords['trial'] = 1
data_1.coords['day'] = 0

# both scalar coords will be promoted to dims
all_trials = xr.combine_by_coords([data_0, data_1])
print(all_trials)
<xarray.Dataset>
Dimensions:      (day: 2, time: 3, trial: 2)
Coordinates:
  * time         (time) int64 0 1 2
  * trial        (trial) int64 0 1
  * day          (day) int64 0 1
Data variables:
    temperature  (day, trial, time) float64 nan nan nan 50.0 ... nan nan nan
```

The gaps created will be filled in with NaNs:

```python
print(all_trials['temperature'].data)
[[[nan nan nan]
  [50. 60. 70.]]

 [[10. 20. 30.]
  [nan nan nan]]]
```

This gap-filling isn't new, though: without this PR, the same thing already happens with length-1 dimension coords (since PR #3649; see my comment there):

```python
data_0 = xr.Dataset({'temperature': ('time', [10,20,30])}, coords={'time': [0,1,2]})
data_0.coords['trial'] = [0]  # 1D dimension coords
data_0.coords['day'] = [1]

data_1 = xr.Dataset({'temperature': ('time', [50,60,70])}, coords={'time': [0,1,2]})
data_1.coords['trial'] = [1]
data_1.coords['day'] = [0]

all_trials = xr.combine_by_coords([data_0, data_1])
print(all_trials)
<xarray.Dataset>
Dimensions:      (day: 2, time: 3, trial: 2)
Coordinates:
  * time         (time) int64 0 1 2
  * day          (day) int64 0 1
  * trial        (trial) int64 0 1
Data variables:
    temperature  (trial, day, time) float64 nan nan nan 10.0 ... nan nan nan
```

```python
# gaps will again be filled in with NaNs
print(all_trials['temperature'].data)
[[[nan nan nan]
  [10. 20. 30.]]

 [[50. 60. 70.]
  [nan nan nan]]]
```

So all my PR is doing is promoting all scalar coordinates (those which aren't equal across all datasets) to dimension coordinates before combining.
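For anyone who wants the same effect by hand, here is a rough sketch of that promotion step. It is not the PR's implementation: `promote_differing_scalar_coords` is a hypothetical helper name, and the sketch assumes that `Dataset.expand_dims` promotes an existing scalar coordinate to a length-1 dimension coordinate. `join="outer"` is passed explicitly so the NaN-filling does not depend on the library's default.

```python
import xarray as xr


def promote_differing_scalar_coords(datasets):
    """Hypothetical helper (not part of xarray): promote scalar coords whose
    values differ between datasets to length-1 dimension coords."""
    first = datasets[0]
    # Scalar coords are 0-dimensional coordinate variables.
    scalar_names = [name for name, coord in first.coords.items() if coord.ndim == 0]
    to_promote = [
        name
        for name in scalar_names
        # Present as a scalar coord in every dataset ...
        if all(name in ds.coords and ds.coords[name].ndim == 0 for ds in datasets)
        # ... and not equal across all datasets.
        and len({ds.coords[name].values.item() for ds in datasets}) > 1
    ]
    if not to_promote:
        return list(datasets)
    # expand_dims on an existing scalar coord turns it into a 1D coord of length 1.
    return [ds.expand_dims(to_promote) for ds in datasets]


data_0 = xr.Dataset(
    {'temperature': ('time', [10, 20, 30])},
    coords={'time': [0, 1, 2], 'trial': 0, 'day': 1},
)
data_1 = xr.Dataset(
    {'temperature': ('time', [50, 60, 70])},
    coords={'time': [0, 1, 2], 'trial': 1, 'day': 0},
)

promoted = promote_differing_scalar_coords([data_0, data_1])
combined = xr.combine_by_coords(promoted, join="outer")

print(dict(combined.sizes))
print(int(combined['temperature'].isnull().sum()))  # number of NaN-filled cells
```

Scalar coords that are equal in every dataset are left alone by this sketch, so they stay scalar in the combined result.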

There is a chance this could unwittingly increase the overall size of people's datasets (when they have different scalar coordinates in different datasets), but that could already happen since #3649.
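To get a feel for how large that increase could be, here is a back-of-the-envelope sketch in plain Python. It does not call xarray; `outer_grid_stats` is a hypothetical helper, and it assumes every dataset has the same shape and that the combined result spans the outer product of all distinct values of each promoted coordinate.

```python
from math import prod


def outer_grid_stats(scalar_coords):
    """Hypothetical estimate: given one dict of scalar coord values per dataset,
    return (grid cells, cells actually filled, fraction left as NaN)."""
    names = scalar_coords[0].keys()
    distinct = {name: {point[name] for point in scalar_coords} for name in names}
    # Only coords that differ between datasets would become dimensions.
    promoted = {name: values for name, values in distinct.items() if len(values) > 1}
    n_cells = prod(len(values) for values in promoted.values())
    # Each distinct combination of promoted values fills exactly one cell.
    n_filled = len({tuple(point[name] for name in promoted) for point in scalar_coords})
    return n_cells, n_filled, 1 - n_filled / n_cells


# The two-dataset example from this comment: half of the grid is NaN.
two = outer_grid_stats([{'trial': 0, 'day': 1}, {'trial': 1, 'day': 0}])
print(two)

# Ten datasets, each with its own trial and day: a 10 x 10 grid with 10 filled cells.
ten = outer_grid_stats([{'trial': i, 'day': i} for i in range(10)])
print(ten)
```

Under these assumptions the stored size grows by the factor `n_cells / n_filled`, so two promoted coordinates that are each unique per dataset grow the result roughly linearly in the number of datasets.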
