issue_comments: 616633079


html_url: https://github.com/pydata/xarray/issues/3774#issuecomment-616633079
issue_url: https://api.github.com/repos/pydata/xarray/issues/3774
id: 616633079
node_id: MDEyOklzc3VlQ29tbWVudDYxNjYzMzA3OQ==
user: 35968931
created_at: 2020-04-20T15:37:10Z
updated_at: 2020-04-20T15:37:10Z
author_association: MEMBER
reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
issue: 566490806
body:

> Suppose there were multiple scalar coordinates that are unique for each variable. How would combine_by_coords pick a dimension to stack along?

@shoyer it would expand and stack along both, filling the (many) gaps created with `NaN`s.

```python
import xarray as xr

data_0 = xr.Dataset({'temperature': ('time', [10, 20, 30])}, coords={'time': [0, 1, 2]})
data_0.coords['trial'] = 0  # scalar coords
data_0.coords['day'] = 1

data_1 = xr.Dataset({'temperature': ('time', [50, 60, 70])}, coords={'time': [0, 1, 2]})
data_1.coords['trial'] = 1
data_1.coords['day'] = 0

# both scalar coords will be promoted to dims
all_trials = xr.combine_by_coords([data_0, data_1])
print(all_trials)
```

```
<xarray.Dataset>
Dimensions:      (day: 2, time: 3, trial: 2)
Coordinates:
  * time         (time) int64 0 1 2
  * trial        (trial) int64 0 1
  * day          (day) int64 0 1
Data variables:
    temperature  (day, trial, time) float64 nan nan nan 50.0 ... nan nan nan
```

```python
# the gaps created will be filled in with NaNs
print(all_trials['temperature'].data)
```

```
[[[nan nan nan]
  [50. 60. 70.]]

 [[10. 20. 30.]
  [nan nan nan]]]
```

This gap-filling isn't new, though. Without this PR, the same thing already happens with length-1 dimension coords (since PR #3649; see my comment there):

```python
data_0 = xr.Dataset({'temperature': ('time', [10, 20, 30])}, coords={'time': [0, 1, 2]})
data_0.coords['trial'] = [0]  # 1D dimension coords
data_0.coords['day'] = [1]

data_1 = xr.Dataset({'temperature': ('time', [50, 60, 70])}, coords={'time': [0, 1, 2]})
data_1.coords['trial'] = [1]
data_1.coords['day'] = [0]

all_trials = xr.combine_by_coords([data_0, data_1])
print(all_trials)
```

```
<xarray.Dataset>
Dimensions:      (day: 2, time: 3, trial: 2)
Coordinates:
  * time         (time) int64 0 1 2
  * day          (day) int64 0 1
  * trial        (trial) int64 0 1
Data variables:
    temperature  (trial, day, time) float64 nan nan nan 10.0 ... nan nan nan
```

```python
# gaps will again be filled in with NaNs
print(all_trials['temperature'].data)
```

```
[[[nan nan nan]
  [10. 20. 30.]]

 [[50. 60. 70.]
  [nan nan nan]]]
```

So all my PR does is promote every scalar coordinate that isn't equal across all datasets to a dimension coordinate before combining.

There is a chance this could unwittingly increase the overall size of people's datasets (when they have different scalar coordinates in different datasets), but that can already happen since #3649.
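The code below is a rough mental model of the rule described above. It is not xarray's implementation. It promotes only the scalar coords whose values differ between datasets, then outer-joins on them and fills missing tiles with NaN, all in plain Python. The dict-based "datasets" and the helpers `differing_scalar_coords` and `combine_on_grid` are hypothetical. The extra `site` coordinate is also an assumption: it has the same value in both inputs, which shows that such a coord stays scalar.

```python
from itertools import product

NAN = float("nan")

# Toy stand-ins for the two datasets in the comment (plain dicts, NOT xarray
# objects). 'site' is a hypothetical coord that is equal in both inputs.
data_0 = {"coords": {"trial": 0, "day": 1, "site": "A"}, "temperature": [10, 20, 30]}
data_1 = {"coords": {"trial": 1, "day": 0, "site": "A"}, "temperature": [50, 60, 70]}


def differing_scalar_coords(datasets):
    """Names of scalar coords whose value is not the same in every dataset.

    Assumes every dataset carries the same set of scalar coord names.
    """
    names = sorted(datasets[0]["coords"])
    return [n for n in names if len({ds["coords"][n] for ds in datasets}) > 1]


def combine_on_grid(datasets, var, length):
    """Promote the differing scalar coords to dims and outer-join on them.

    Returns the promoted dim names, the sorted values along each of them, and
    a dict mapping every combination of those values to a data series. Tiles
    that no dataset provides are filled with NaN. Assumes each combination
    occurs in at most one dataset.
    """
    dims = differing_scalar_coords(datasets)
    axes = [sorted({ds["coords"][d] for ds in datasets}) for d in dims]
    provided = {tuple(ds["coords"][d] for d in dims): ds[var] for ds in datasets}
    grid = {key: provided.get(key, [NAN] * length) for key in product(*axes)}
    return dims, axes, grid


dims, axes, grid = combine_on_grid([data_0, data_1], "temperature", 3)
print(dims)  # ['day', 'trial']  ('site' is equal everywhere, so it stays scalar)
for key, values in grid.items():
    print(dict(zip(dims, key)), values)
```

Sorting the coordinate names is an arbitrary choice in this sketch. The two xarray outputs in the comment order the new dims differently: (day, trial, time) in the first and (trial, day, time) in the second.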
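Back-of-the-envelope arithmetic can quantify the size concern in the last paragraph. After the outer join there is one tile for every combination of promoted coordinate values, so the tile count is the product of the distinct-value counts. The sketch below assumes each input dataset fills exactly one tile. The helper names `outer_join_cells` and `nan_fraction` are hypothetical.

```python
from math import prod


def outer_join_cells(distinct_counts, tile_size):
    """Total values after the outer join: one tile per combination of the
    promoted coordinate values, each tile holding `tile_size` values."""
    return prod(distinct_counts) * tile_size


def nan_fraction(n_datasets, distinct_counts):
    """Fraction of tiles that are NaN gaps, assuming each dataset fills
    exactly one distinct tile."""
    n_tiles = prod(distinct_counts)
    return (n_tiles - n_datasets) / n_tiles


# The example from the comment: 2 datasets, 'trial' and 'day' each take
# 2 distinct values, and each tile is a length-3 'time' series.
print(outer_join_cells([2, 2], 3))  # 12 values, of which 2 * 3 = 6 are real
print(nan_fraction(2, [2, 2]))      # 0.5

# A worse case: 10 datasets, each with its own 'trial' and 'day'.
print(nan_fraction(10, [10, 10]))   # 0.9
```

Generalising: if n datasets each have their own value for every one of k promoted coords, the grid has n**k tiles but only n of them hold data.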