issue_comments: 568789678

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/pull/3312#issuecomment-568789678	https://api.github.com/repos/pydata/xarray/issues/3312	568789678	MDEyOklzc3VlQ29tbWVudDU2ODc4OTY3OA==	35968931	2019-12-24T18:39:01Z	2019-12-24T18:39:01Z	MEMBER	If we gave all the DataArray objects the same name when converted into Dataset objects, then I think the result could always be a single DataArray? I suppose so, but this seems like an odd way to handle it to me. You're throwing away data (the names) which in other circumstances would be used. This would be consistent with how we do arithmetic with DataArrays: the names get ignored, and then we assign a name to the result only if there are no conflicting names on the inputs. Do we want consistency with arithmetic, or consistency with `merge`? I strongly feel it should be the latter, as `combine` wraps `merge` (in fact `combine_nested(dataarrays, concat_dim=None) == merge(dataarrays)`). More generally, I think we should try to make the behaviour of all our "combining" functions (i.e. `merge`, `concat`, `update`, `combine_nested`, and `combine_by_coords`) name-aware. Let me try to clarify by summarizing. Currently, `merge` and `combine_nested` both do the same thing for named DataArrays (return a Dataset) and un-named DataArrays (throw an error). `da1 = xr.DataArray(name='foo', data=np.random.rand(3,3), coords=[('x', [1, 2, 3]), ('y', [1, 2, 3])]); da2 = xr.DataArray(name='foo2', data=np.random.rand(3,3), coords=[('x', [5, 6, 7]), ('y', [5, 6, 7])]); xr.merge([da1, da2])` and `da1 = xr.DataArray(name='foo', data=np.random.rand(3,3), coords=[('x', [1, 2, 3]), ('y', [1, 2, 3])]); da2 = xr.DataArray(name='foo2', data=np.random.rand(3,3), coords=[('x', [5, 6, 7]), ('y', [5, 6, 7])]); xr.combine_nested([da1, da2], concat_dim=None)` both return `<xarray.Dataset> Dimensions: (x: 6, y: 6) Coordinates: * x (x) int64 1 2 3 5 6 7 * y (y) int64 1 2 3 5 6 7 Data variables: foo (x, y) float64 0.5235 0.4114 0.7112 nan nan ... 
nan nan nan nan nan foo2 (x, y) float64 nan nan nan nan nan ... nan 0.08344 0.8844 0.7462` This all makes intuitive sense to me. `combine_by_coords` is basically the same operation as `combine_nested`, just with automated rather than manual ordering. It will also merge different variables together, so it should do the same thing as `merge` and `combine_nested`: fill the gaps in the hypercube with NaNs (as per #3649), return a Dataset with two variables named after the input DataArrays, and throw an error for un-named DataArrays. However, as shown above, `combine_by_coords` is not consistent with `merge` or `combine_nested`, which this PR will fix. This is all different to the arithmetic logic, but I think it makes way more intuitive sense. It's okay for arithmetic and combining logic to be different, as they are used in different contexts, and ignoring names in arithmetic while using them in top-level combining functions is an unambiguous delineation. Also, to complete the consistency of the "combining" functions, I think we should make `concat` name-aware, as described in #3315. In short: I propose that "combining" isn't arithmetic, and should be treated separately (and consistently across all types of combine functions).	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		494210818
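
To make the two naming rules discussed in the comment concrete, here is a toy model in plain Python. It is only a sketch of the proposal as described, not xarray's implementation: the helper names `arithmetic_result_name` and `combined_variable_names` are hypothetical, and inputs are reduced to just their names (with `None` standing in for an un-named DataArray).

```python
from typing import List, Optional, Sequence


def arithmetic_result_name(names: Sequence[Optional[str]]) -> Optional[str]:
    """Arithmetic-style rule: names are ignored during the operation, and the
    result keeps a name only if the input names do not conflict."""
    distinct = set(names)
    return distinct.pop() if len(distinct) == 1 else None


def combined_variable_names(names: Sequence[Optional[str]]) -> List[str]:
    """Combining-style rule: every input name becomes a data variable in the
    resulting Dataset, and an un-named input is an error."""
    if any(name is None for name in names):
        raise ValueError("cannot combine an un-named array into a Dataset")
    # Inputs that share a name contribute to the same variable, so keep each
    # name once, in first-seen order.
    return list(dict.fromkeys(names))


# Two arrays named 'foo' and 'foo2', as in the example from the comment.
print(arithmetic_result_name(["foo", "foo2"]))   # conflicting names: no name
print(combined_variable_names(["foo", "foo2"]))  # two data variables
```

Under this model the names decide the outcome only in the combining case, which is the delineation the comment argues for.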