issue_comments: 579203497

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/pull/3545#issuecomment-579203497	https://api.github.com/repos/pydata/xarray/issues/3545	579203497	MDEyOklzc3VlQ29tbWVudDU3OTIwMzQ5Nw==	5821660	2020-01-28T11:33:41Z	2020-01-28T11:52:02Z	MEMBER	@scottcha @shoyer below is a minimal example where one variable is missing in each file.

```python
import glob
import random

import netCDF4 as nc
import numpy as np
import xarray as xr

random.seed(123)
random.randint(0, 10)

# create var names list with one missing value
orig = [f'd{i:02}' for i in range(10)]
datasets = []
for i in range(1, 9):
    l1 = orig.copy()
    l1.remove(f'd{i:02}')
    datasets.append(l1)

# create files
for i, dsl in enumerate(datasets):
    foo_data = np.arange(24).reshape(2, 3, 4)
    with nc.Dataset(f'test{i:02}.nc', 'w') as ds:
        ds.createDimension('x', size=2)
        ds.createDimension('y', size=3)
        ds.createDimension('z', size=4)
        for k in dsl:
            ds.createVariable(k, int, ('x', 'y', 'z'))
            ds.variables[k][:] = foo_data

# open every file and concatenate along a new 'time' dimension
flist = glob.glob('test*.nc')
dslist = []
for f in flist:
    dslist.append(xr.open_dataset(f))
ds2 = xr.concat(dslist, dim='time')
ds2
```

Output:

    <xarray.Dataset>
    Dimensions:  (time: 8, x: 2, y: 3, z: 4)
    Dimensions without coordinates: time, x, y, z
    Data variables:
        d01      (x, y, z) int64 0 1 2 3 4 5 6 7 8 9 ... 15 16 17 18 19 20 21 22 23
        d00      (time, x, y, z) int64 0 1 2 3 4 5 6 7 8 ... 16 17 18 19 20 21 22 23
        d02      (time, x, y, z) float64 0.0 1.0 2.0 3.0 4.0 ... 20.0 21.0 22.0 23.0
        d03      (time, x, y, z) float64 0.0 1.0 2.0 3.0 4.0 ... 20.0 21.0 22.0 23.0
        d04      (time, x, y, z) float64 0.0 1.0 2.0 3.0 4.0 ... 20.0 21.0 22.0 23.0
        d05      (time, x, y, z) float64 0.0 1.0 2.0 3.0 4.0 ... 20.0 21.0 22.0 23.0
        d06      (time, x, y, z) float64 0.0 1.0 2.0 3.0 4.0 ... 20.0 21.0 22.0 23.0
        d07      (time, x, y, z) float64 0.0 1.0 2.0 3.0 4.0 ... 20.0 21.0 22.0 23.0
        d08      (time, x, y, z) float64 0.0 1.0 2.0 3.0 4.0 ... nan nan nan nan nan
        d09      (time, x, y, z) int64 0 1 2 3 4 5 6 7 8 ... 16 17 18 19 20 21 22 23

Three cases here:

- `d00` and `d09` are available in all datasets and are concatenated correctly (keeping dtype).
- `d02` to `d08` are each missing in one dataset and are filled with the created dummy variable, but the dtype is converted to float64 (the fill values show up as `nan` in the output).
- `d01` is not handled properly because it is missing in the first dataset. This is due to checking only the variables of the first dataset in `_calc_concat_over`

  ```python
  elif opt == "all":
      concat_over.update(
          set(getattr(datasets[0], subset)) - set(datasets[0].dims)
      )
  ```

  and from putting `d01` in `result_vars` before iterating to find missing variables.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		524043729
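
The `d01` case described in the comment can be illustrated without xarray or any files, using only the lists of variable names the example builds. This is a minimal sketch, not the code in the PR: `first_only`, `all_names`, and `missing_from_first` are hypothetical names introduced here, and taking the union over all datasets is just one possible way to avoid looking only at `datasets[0]`.

```python
# Variable names per file, built the same way as in the example in the comment.
orig = [f"d{i:02}" for i in range(10)]
datasets = []
for i in range(1, 9):
    names = orig.copy()
    names.remove(f"d{i:02}")
    datasets.append(names)

# What a check based only on the first dataset sees: d01 is absent.
first_only = set(datasets[0])

# Hypothetical alternative: collect the names from every dataset.
all_names = set().union(*datasets)

# Names that a first-dataset-only check would never consider.
missing_from_first = sorted(all_names - first_only)
print(missing_from_first)  # ['d01']
```

With the union, `d01` would be treated like `d02` to `d08`; whether that is the right fix for `_calc_concat_over` is left to the PR discussion.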