issue_comments: 238114831

html_url: https://github.com/pydata/xarray/issues/927#issuecomment-238114831
issue_url: https://api.github.com/repos/pydata/xarray/issues/927
id: 238114831
node_id: MDEyOklzc3VlQ29tbWVudDIzODExNDgzMQ==
user: 1217238
created_at: 2016-08-07T23:14:44Z
updated_at: 2016-08-07T23:14:44Z
author_association: MEMBER

First of all -- thanks for diving into this! You are really getting into the messy guts here, so thanks for sticking with it.

> I can't make any sense of the `skip_single_target` hack, or of the whole special treatment inside `align()` for when there is only one arg. What's the benefit of the whole thing? What would happen if we simply removed the special logic?

I actually just added this hack a few days ago to fix a regression in the v0.8.0 release (https://github.com/pydata/xarray/issues/943). The problem is that `__setitem__` assignment in xarray does automatic alignment, but arrays with non-unique indexes cannot be aligned. This meant that it was impossible to assign to an array with a non-unique index, even if the new array did not have any labels.
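The underlying limitation can be sketched in plain Python. This is a hypothetical helper (`align_positions`, not xarray's actual implementation): alignment maps each target label to a unique position in the source index, which is impossible when the source labels are duplicated.

```python
def align_positions(source_labels, target_labels):
    # Hypothetical sketch, not xarray's code: alignment needs each source
    # label to occur exactly once so that every target label maps to a
    # single, unambiguous position.
    positions = {}
    for i, label in enumerate(source_labels):
        if label in positions:
            raise ValueError("cannot align with a non-unique index: %r" % (label,))
        positions[label] = i
    return [positions[label] for label in target_labels]
```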

I agree that the way I fixed this is really messy. It would be better simply not to align as part of merge if there is only a single labeled argument, but it wasn't obvious how to do that last week when I was rushing to get the bug fix out (the logic is currently buried in `deep_align` and needs to be extracted).
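The cleaner fix might look something like this sketch (hypothetical `maybe_align` helper with plain dicts standing in for xarray objects; not the actual `deep_align` code): skip alignment entirely when at most one argument carries index labels, since there is nothing to align against.

```python
def maybe_align(objects, align):
    # Hypothetical sketch: with a single labeled argument there is nothing
    # to align against, so pass the objects through untouched. Only invoke
    # the (potentially failing) alignment when two or more objects carry
    # index labels.
    labeled = [obj for obj in objects if obj.get('indexes')]
    if len(labeled) <= 1:
        return list(objects)
    return align(objects)
```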

I'll take a look now and see if I can come up with a cleaner fix, but for now I would stick with a separate, internal-only `private_align` that allows the extra argument.

> `DataArray.reindex(copy=False)` still performs a copy, even if there's nothing to do. I'm a bit afraid to go and fix it right now, as I don't want to trigger domino effects.

One issue here is that the purpose and functionality of the `copy` argument are currently poorly documented. The docstring should say something like:

If `copy=True`, the return value is independent of the input. If `copy=False` and reindexing is unnecessary, or can be performed with only slice operations, then the data may include references to arrays also found in the input.

(I'll update this soon.)

We should not be making any promises about returning exactly the same input object, and in fact for consistency we should (as we currently do) always return a new Dataset or DataArray, because constructing these objects is pretty cheap. Rather, the issue is whether we need to copy `numpy.ndarray` input, which can be very expensive if the array is large.
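The documented contract can be illustrated with a hypothetical helper (`reindex_data` is made up for illustration; plain lists stand in for `numpy.ndarray`): the container is always new, but whether the data is shared depends on `copy` and on whether reindexing actually had to happen.

```python
def reindex_data(values, copy, needs_reindex):
    # Hypothetical sketch of the copy semantics described above. A new
    # wrapper object is always cheap to build; the expensive question is
    # whether the underlying *data* gets duplicated.
    if needs_reindex:
        # Reindexing builds new data regardless of the copy flag.
        return [values[i] for i in range(len(values))]
    if copy:
        return list(values)  # copy=True: result independent of the input
    return values            # copy=False: result may reference the input data
```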

> I'm experiencing a lot of grief because `assertDatasetIdentical` expects both the coords and the data vars to have the same order, which in some situations is simply impossible to control without touching vast areas of your code.

This shouldn't be true. `assertDatasetIdentical` calls the `Dataset.identical` method, which doesn't care about order:

```
In [29]: from collections import OrderedDict

In [30]: ds1 = xr.Dataset(OrderedDict([('x', 0), ('y', 1)]))

In [31]: ds2 = xr.Dataset(OrderedDict([('y', 1), ('x', 0)]))

In [32]: ds1
Out[32]:
<xarray.Dataset>
Dimensions:  ()
Coordinates:
    *empty*
Data variables:
    x        int64 0
    y        int64 1

In [33]: ds2
Out[33]:
<xarray.Dataset>
Dimensions:  ()
Coordinates:
    *empty*
Data variables:
    y        int64 1
    x        int64 0

In [34]: ds1.identical(ds2)
Out[34]: True
```

> As a more general and fundamental point, I cannot understand what the benefit is of using `OrderedDict` instead of a plain `dict` for coords, attrs, and `Dataset.data_vars`.

This is something we considered before (https://github.com/pydata/xarray/issues/75). I think the main advantage is that `OrderedDict` is more intuitive (for people) and makes it easier to see at a glance how data changes.
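This also explains the session above: the two Datasets print their variables in different orders because `OrderedDict` preserves insertion order, yet they still compare as identical. A standard-library check (plain Python behavior, not xarray code) makes the distinction concrete:

```python
from collections import OrderedDict

a = OrderedDict([('x', 0), ('y', 1)])
b = OrderedDict([('y', 1), ('x', 0)])

# Comparing two OrderedDicts is order-sensitive, which is why the
# two reprs above differ...
print(a == b)        # False

# ...but comparing the same mappings as plain dicts ignores order,
# matching the name-based comparison that identical() performs.
print(dict(a) == dict(b))  # True
```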


One other thing to note: `concat` already handles broadcasting for each variable (independently) with `ensure_common_dims`. This would definitely be handled more cleanly with support for an `exclude` argument to `broadcast_variables`. It wouldn't work to use the main `broadcast` function for this because it would broadcast unnecessary variables when concatenating datasets.
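What an `exclude` argument might look like, as a sketch (hypothetical `union_dims` helper operating on tuples of dimension names; not the real `ensure_common_dims` or `broadcast_variables`): collect the ordered union of dimensions across variables while leaving the excluded dimension alone, e.g. the concat dimension, along which each variable should keep its own length.

```python
def union_dims(variables_dims, exclude=()):
    # Hypothetical sketch: build the ordered union of dimension names
    # across variables, skipping excluded dims (such as the dimension
    # being concatenated along, which must not be broadcast).
    dims = []
    for var_dims in variables_dims:
        for d in var_dims:
            if d not in exclude and d not in dims:
                dims.append(d)
    return dims
```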
