html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/927#issuecomment-241232663,https://api.github.com/repos/pydata/xarray/issues/927,241232663,MDEyOklzc3VlQ29tbWVudDI0MTIzMjY2Mw==,1217238,2016-08-21T01:00:27Z,2016-08-21T01:00:27Z,MEMBER,"Fixed by #963
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,168470276
https://github.com/pydata/xarray/issues/927#issuecomment-238114831,https://api.github.com/repos/pydata/xarray/issues/927,238114831,MDEyOklzc3VlQ29tbWVudDIzODExNDgzMQ==,1217238,2016-08-07T23:14:44Z,2016-08-07T23:14:44Z,MEMBER,"First of all -- thanks for diving into this! You are really getting into the messy guts here, so thanks for sticking with it.
> I can't put any sense in the skip_single_target hack or in the whole special treatment inside align() for when there is only one arg. What's the benefit of the whole thing? What would happen if we simply removed the special logic?
I actually just added this hack a few days ago to fix a regression in the v0.8.0 release (https://github.com/pydata/xarray/issues/943). The problem is that `__setitem__` assignment in xarray does automatic alignment, but arrays with non-unique indexes cannot be aligned. This meant that it was impossible to assign to an array with a non-unique index, even if the new array did not have any labels.
I agree that the way I fixed this is really messy. It would be better simply not to align as part of merge if there is only a single labeled argument, but it wasn't obvious how to do that last week when I was rushing to get the bug fix out (the logic is currently buried in `deep_align` and needs to be extracted).
I'll take a look now and see if I can come up with a cleaner fix, but for now I would stick with a separate internal only `private_align` that allows the extra argument.
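The idea of skipping alignment when at most one argument carries labels could look roughly like the sketch below. Everything in it is hypothetical and only for illustration: `has_labels`, `align_labels` and `maybe_align` are stand-ins written in plain Python, not xarray functions, and dicts play the role of labeled objects.

```python
def has_labels(obj):
    # Hypothetical rule: dicts (label -> value) count as labeled, anything else does not.
    return isinstance(obj, dict)


def align_labels(objs):
    # Toy inner-join alignment: keep only the labels shared by every labeled object.
    labeled = [o for o in objs if has_labels(o)]
    common = set(labeled[0])
    for o in labeled[1:]:
        common &= set(o)
    return [
        {k: o[k] for k in sorted(common)} if has_labels(o) else o
        for o in objs
    ]


def maybe_align(objs):
    # With zero or one labeled argument there is nothing to align against,
    # so return the inputs untouched instead of running the alignment logic.
    if sum(has_labels(o) for o in objs) <= 1:
        return list(objs)
    return align_labels(objs)


single = maybe_align([{'a': 1, 'b': 2}, [10, 20]])
both = maybe_align([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])
print(single)
print(both)
```

With a single labeled input the objects pass through unchanged, so an index that could not be aligned would never be touched.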
> DataArray.reindex(copy=False) still performs a copy, even if there's nothing to do. I'm a bit afraid to go and fix it right now as I don't want to trigger domino effects
One issue here is that the purpose and functionality of the `copy` argument are currently poorly documented. It should say something like:
```
If `copy=True`, the return value is independent of the input. If `copy=False` and
reindexing is unnecessary, or can be performed with only slice operations,
then the data may include references to arrays also found in the input.
```
(I'll update this soon.)
We should not be making any promises about returning exactly the same input object, and in fact for consistency we should (as we currently do) always return a new Dataset or DataArray, because constructing these objects is pretty cheap. Rather, the issue is whether we need to copy `numpy.ndarray` input, which can be very expensive if the array is large.
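To make the distinction concrete, here is a small NumPy-only sketch (it does not call `reindex` at all) of why slice operations can avoid a copy while other kinds of indexing cannot. The variable names are just for illustration.

```python
import numpy as np

data = np.arange(10)

# Basic slicing returns a view onto the same buffer: no data is copied.
view = data[2:5]

# Indexing with a list of integers (fancy indexing) returns a new array.
copied = data[[2, 3, 4]]

shares_view = np.shares_memory(data, view)
shares_copy = np.shares_memory(data, copied)
print(shares_view, shares_copy)
```

Both results hold the same values, but only the sliced one still references the original array, which is the situation the proposed docstring wording is meant to describe.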
> I'm experiencing a lot of grief because assertDatasetIdentical expects both the coords and the data vars to have the same order, which in some situations it's simply impossible to control without touching vast areas of your code.
This shouldn't be true. `assertDatasetIdentical` calls the `Dataset.identical` method, which doesn't care about order:
```
In [29]: from collections import OrderedDict
In [30]: ds1 = xr.Dataset(OrderedDict([('x', 0), ('y', 1)]))
In [31]: ds2 = xr.Dataset(OrderedDict([('y', 1), ('x', 0)]))
In [32]: ds1
Out[32]:
<xarray.Dataset>
Dimensions:  ()
Coordinates:
    *empty*
Data variables:
    x        int64 0
    y        int64 1
In [33]: ds2
Out[33]:
<xarray.Dataset>
Dimensions:  ()
Coordinates:
    *empty*
Data variables:
    y        int64 1
    x        int64 0
In [34]: ds1.identical(ds2)
Out[34]: True
```
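As a rough illustration in plain Python (this is not how `Dataset.identical` is actually implemented), an order-insensitive comparison over ordered mappings might look like the following. `same_items` is a hypothetical helper.

```python
from collections import OrderedDict

vars1 = OrderedDict([('x', 0), ('y', 1)])
vars2 = OrderedDict([('y', 1), ('x', 0)])

# Comparing two OrderedDicts directly is order-sensitive.
ordered_equal = vars1 == vars2


def same_items(a, b):
    # Compare the key sets first, then the values key by key, ignoring order.
    return set(a) == set(b) and all(a[k] == b[k] for k in a)


unordered_equal = same_items(vars1, vars2)
print(ordered_equal, unordered_equal)
```

So keeping the variables in an ordered mapping does not by itself force comparisons to care about order; that depends on how the comparison is written.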
> As a more general and fundamental point, I cannot understand what's the benefit of using OrderedDict instead of a plain dict for coords, attrs, and Dataset.data_vars?
This is something we considered before (https://github.com/pydata/xarray/issues/75). I think the main advantage is that OrderedDict is more intuitive (for people) and makes it easier to see at a glance how data changes.
---
One other thing to note: `concat` already handles broadcasting for each variable (independently) with [`ensure_common_dims`](https://github.com/pydata/xarray/blob/7d7673c14bb7d6468c8671e1229b3b4bdfb82c4a/xarray/core/combine.py#L253). This would definitely be handled more cleanly with support for an `exclude` argument to [`broadcast_variables`](https://github.com/pydata/xarray/blob/7d7673c14bb7d6468c8671e1229b3b4bdfb82c4a/xarray/core/variable.py#L1203). It wouldn't work to use the main `broadcast` function for this because it would broadcast unnecessary variables when concatenating datasets.
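To sketch what an `exclude` option could mean here, the toy code below works on tuples of dimension names only. `common_dims` and `broadcast_dims` are hypothetical helpers, not the actual `broadcast_variables` or `ensure_common_dims` code, and the dimension names are made up.

```python
def common_dims(all_dims, exclude=()):
    # Collect every dimension name seen across the inputs, in first-seen order,
    # skipping any dimension listed in `exclude`.
    result = []
    for dims in all_dims:
        for dim in dims:
            if dim not in exclude and dim not in result:
                result.append(dim)
    return result


def broadcast_dims(dims, target):
    # Prepend the target dimensions the variable is missing; its own dimensions
    # (including any excluded one) are left as they are.
    missing = tuple(d for d in target if d not in dims)
    return missing + tuple(dims)


all_dims = [('time', 'x'), ('time', 'y')]
target = common_dims(all_dims, exclude=('time',))
new_dims = [broadcast_dims(d, target) for d in all_dims]
print(target)
print(new_dims)
```

The excluded dimension (here the one being concatenated along) never takes part in the broadcast, so variables can still differ along it.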
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,168470276
https://github.com/pydata/xarray/issues/927#issuecomment-236919314,https://api.github.com/repos/pydata/xarray/issues/927,236919314,MDEyOklzc3VlQ29tbWVudDIzNjkxOTMxNA==,1217238,2016-08-02T14:21:06Z,2016-08-02T14:21:06Z,MEMBER,"Awesome! We usually stick to outer joins by default so we don't
inadvertently drop data. If datasets are already aligned, calling align
with copy=False should have minimal performance costs.
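A plain-Python sketch of the difference, using lists of index labels rather than real xarray objects (`join_labels` is a hypothetical helper, not part of xarray):

```python
def join_labels(indexes, how='outer'):
    # Combine several lists of labels into one sorted list of labels.
    label_sets = [set(index) for index in indexes]
    if how == 'outer':
        joined = set().union(*label_sets)
    elif how == 'inner':
        joined = set.intersection(*label_sets)
    else:
        raise ValueError('unknown join: %r' % how)
    return sorted(joined)


outer = join_labels([[0, 1, 2], [1, 2, 3]], how='outer')
inner = join_labels([[0, 1, 2], [1, 2, 3]], how='inner')
print(outer)
print(inner)
```

The outer join keeps every label that appears in any input, whereas the inner join silently discards labels that are missing from one of them.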
On Tue, Aug 2, 2016 at 12:20 AM crusaderky notifications@github.com wrote:
> I can work on it.
> Should we go for a default outer join + broadcast inside concat? If the
> input already arrive aligned, or if the user wants a different type of
> join, this will slow things down with useless code though. What's your
> policy on this?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,168470276
https://github.com/pydata/xarray/issues/927#issuecomment-236767844,https://api.github.com/repos/pydata/xarray/issues/927,236767844,MDEyOklzc3VlQ29tbWVudDIzNjc2Nzg0NA==,1217238,2016-08-02T01:43:45Z,2016-08-02T01:43:45Z,MEMBER,"Indeed, we could probably replace `align` with `partial_align`, and use this inside `concat` (see #930 for an example of that). This is probably worth doing.
I didn't add `exclude` to `align` before mostly because I wasn't sure if the functionality would be useful to users, and I wanted to avoid making the mistake of expanding the API prematurely (it's harder to remove features than add them). Also, I didn't write tests or a good docstring for `partial_align` :).
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,168470276