html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/issues/2180#issuecomment-1113990595,https://api.github.com/repos/pydata/xarray/issues/2180,1113990595,IC_kwDOAMm_X85CZiXD,26384082,2022-04-30T13:37:47Z,2022-04-30T13:37:47Z,NONE,"In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity If this issue remains relevant, please comment here or remove the `stale` label; otherwise it will be marked as closed automatically ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,326205036 https://github.com/pydata/xarray/issues/2180#issuecomment-619325660,https://api.github.com/repos/pydata/xarray/issues/2180,619325660,MDEyOklzc3VlQ29tbWVudDYxOTMyNTY2MA==,26384082,2020-04-25T05:39:47Z,2020-04-25T05:39:47Z,NONE,"In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity If this issue remains relevant, please comment here or remove the `stale` label; otherwise it will be marked as closed automatically ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,326205036 https://github.com/pydata/xarray/issues/2180#issuecomment-391932929,https://api.github.com/repos/pydata/xarray/issues/2180,391932929,MDEyOklzc3VlQ29tbWVudDM5MTkzMjkyOQ==,1217238,2018-05-25T03:46:40Z,2018-05-25T03:46:40Z,MEMBER,"Looking at @crusaderky's example of different coordinate labels again, I finally remember why it works this way. The logic of `ds.update(other)` is that (1) variables explicitly listed in `other` should take precedence over the original object and (2) mutating a Dataset should not change its dimensions or indexes. 
This is pretty clearly expressed in the original code: ``` return merge_core([dataset, other], priority_arg=1, indexes=dataset.indexes) ``` In @crusaderky's example with `fridge.update(shopping)`, `shopping` first gets reindexed to `fridge` (which means it ends up only holding NaN), and is then used to override the original dataset: ``` Dimensions: (fruit: 1) Coordinates: * fruit (fruit) object 'apples' quality (fruit) object nan Data variables: fruits (fruit) float64 nan ``` It would probably make sense to keep values from the original variables rather than blindly replacing them with the new NaNs from `shopping`, but overall I do think the approach of ""right join on variables"" and ""left join on indexes"" makes sense for `update()`. For most use cases, the true outer join makes more sense -- which is why `xarray.merge()` works that way.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,326205036 https://github.com/pydata/xarray/issues/2180#issuecomment-391919607,https://api.github.com/repos/pydata/xarray/issues/2180,391919607,MDEyOklzc3VlQ29tbWVudDM5MTkxOTYwNw==,6815844,2018-05-25T02:06:47Z,2018-05-25T02:19:45Z,MEMBER,"For reference, the original issue in #2068 was ```python In [4]: ds = xr.Dataset() ...: ds.coords['source'] = (['a', 'b', 'c'], np.random.random((2, 3, 4))) ...: ds.coords['unrelated'] = (['a', 'c'], np.random.random((2, 4))) ...: ds ...: Out[4]: Dimensions: (a: 2, b: 3, c: 4) Coordinates: source (a, b, c) float64 0.4158 0.07152 0.4258 0.4382 0.6616 0.142 ... unrelated (a, c) float64 0.9318 0.03723 0.4226 0.9472 0.8753 0.7022 ... 
Dimensions without coordinates: a, b, c Data variables: *empty* In [5]: ds['dest-2'] = xr.ones_like(ds['source'].isel(c=0)) ...: ds ...: Out[5]: Dimensions: (a: 2, b: 3) Coordinates: source (a, b) float64 0.4158 0.6616 0.1583 0.7821 0.221 0.2555 unrelated (a) float64 0.9318 0.8753 Dimensions without coordinates: a, b Data variables: dest-2 (a, b) float64 1.0 1.0 1.0 1.0 1.0 1.0 ``` where `ds['unrelated']` drops dimension `c`. We changed this behavior in #2087, but I think it was the wrong direction. The previous behavior might be OK as long as `unrelated` is a coordinate variable. EDIT: Something still feels strange to me in both the previous and current behavior of `__setitem__` with coords. Generally, as @crusaderky has pointed out, the right join will be a better choice. But in the above example, dropping dimension `c` from 'unrelated' also looks awkward.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,326205036 https://github.com/pydata/xarray/issues/2180#issuecomment-391915849,https://api.github.com/repos/pydata/xarray/issues/2180,391915849,MDEyOklzc3VlQ29tbWVudDM5MTkxNTg0OQ==,6815844,2018-05-25T01:41:27Z,2018-05-25T01:41:27Z,MEMBER,"Thanks, @crusaderky. The first behavior you pointed out is a bug, I think. I raised an issue in #2184, and maybe it should be discussed there. For the second example, > I think this should be a right join. I agree with this.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,326205036 https://github.com/pydata/xarray/issues/2180#issuecomment-391914654,https://api.github.com/repos/pydata/xarray/issues/2180,391914654,MDEyOklzc3VlQ29tbWVudDM5MTkxNDY1NA==,6213168,2018-05-25T01:32:51Z,2018-05-25T01:33:25Z,MEMBER,"> If there are conflicts in dimension coordinate, should it be outer join? 
Consider this example: ``` a = Dataset({ 'x': [10, 20], 'd1': ('x', [100, 200]), 'd2': ('x', [300, 400]) }) b = Dataset({ 'x': [15], 'd1': ('x', [500]), }) a.update(b) ``` In the above, with anything but an outer join you're destroying d2 - which doesn't even exist in the rhs dataset! A sane, desirable outcome should be ``` Dataset({ 'x': [10, 20, 15], 'd1': ('x', [nan, nan, 500]), 'd2': ('x', [300, 400, nan]) }) ``` > If there are no conflicts in dimension coordinate, but there are conflicts in non dimension coordinate, whether left or right should be prioritized? I think this should be a right join. I always think of non-index coords as N-to-1 properties of the index. For example, ``` a = Dataset( coords={ 'country': ('country', ['UK', 'France', 'Greece']), 'currency': ('country', ['GBP', 'EUR', 'EUR']), }, data_vars={ 'GDP': ('country', [1000, 2000, 3000]), 'Debt': ('country', [100, 200, 300]), }) b = Dataset( # Greece exits the Eurozone coords={ 'country': ('country', ['UK', 'France', 'Greece']), 'currency': ('country', ['GBP', 'EUR', 'GRD']), }, data_vars={ 'GDP': ('country', [1000, 2000, 150000]), }) a.update(b) ``` In the above example, I just broke the Debt variable - as I forgot to perform a currency conversion for the Greek debt, which has been silently changed from 300 EUR to 300 GRD. However I can't see any elegant way to avoid this. I *definitely* would not like to duplicate the 'country' index.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,326205036 https://github.com/pydata/xarray/issues/2180#issuecomment-391910682,https://api.github.com/repos/pydata/xarray/issues/2180,391910682,MDEyOklzc3VlQ29tbWVudDM5MTkxMDY4Mg==,6815844,2018-05-25T01:05:13Z,2018-05-25T01:30:47Z,MEMBER,"> So maybe we can leave the current behavior as is for now (but remove the warning). Agreed. 
~@shoyer, what do you think about the current `__setitem__` behavior with a conflicting `dimension coordinate`? Should it be outer join as @crusaderky pointed out?~ EDIT: I did not notice the above comment. I will raise an issue for this.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,326205036 https://github.com/pydata/xarray/issues/2180#issuecomment-391908821,https://api.github.com/repos/pydata/xarray/issues/2180,391908821,MDEyOklzc3VlQ29tbWVudDM5MTkwODgyMQ==,1217238,2018-05-25T00:50:58Z,2018-05-25T00:51:10Z,MEMBER,@crusaderky this behavior you show is indeed really strange. I don't know why alignment of dimensions works that way currently.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,326205036 https://github.com/pydata/xarray/issues/2180#issuecomment-391908588,https://api.github.com/repos/pydata/xarray/issues/2180,391908588,MDEyOklzc3VlQ29tbWVudDM5MTkwODU4OA==,1217238,2018-05-25T00:49:15Z,2018-05-25T00:49:15Z,MEMBER,"OK, looking at this more carefully, `ds.update(other)` didn't actually change when other is a `Dataset`, because `ds[k] = ds[k].drop(coord_names)` doesn't actually drop coordinates from a Dataset. It just shows a warning now, due to iteration over a Dataset. So maybe we can leave the current behavior as is for now (but remove the warning). 
What did change is how we handle conflicts in `__setitem__` (which was intentional), and how we handle conflicts in `update` when the new value is a dictionary (which was *not* intentional, but at least remained consistent with `__setitem__`).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,326205036 https://github.com/pydata/xarray/issues/2180#issuecomment-391908483,https://api.github.com/repos/pydata/xarray/issues/2180,391908483,MDEyOklzc3VlQ29tbWVudDM5MTkwODQ4Mw==,6815844,2018-05-25T00:48:27Z,2018-05-25T00:48:27Z,MEMBER,"#2087 changed the second behavior. ```python In [1]: import xarray ...: ...: fridge = xarray.Dataset( ...: data_vars={ ...: 'var1': ('fruit', [10]), ...: }, ...: coords={ ...: 'fruit': ('fruit', [1]), ...: 'quality': ('fruit', ['Red Velvet']), ...: }) ...: shopping = xarray.Dataset( ...: data_vars={ ...: 'var1': ('fruit', [20]), ...: }, ...: coords={ ...: 'fruit': ('fruit', [1]), ...: 'quality': ('fruit', ['Tangerine']), ...: }) ...: ...: fridge['var1'] = shopping['var1'] ...: ``` with v0.10.3 ```python In [2]: fridge Out[2]: Dimensions: (fruit: 1) Coordinates: * fruit (fruit) int64 1 quality (fruit) Dimensions: (fruit: 1) Coordinates: * fruit (fruit) int64 1 quality (fruit) Dimensions: (fruit: 1) Coordinates: * fruit (fruit) object 'apples' quality (fruit) object nan Data variables: fruits (fruit) float64 nan ``` The above doesn't make any sense to me. 
I wanted to replace the fruits variable with brand new content, and instead I lost both the old and the new?!?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,326205036 https://github.com/pydata/xarray/issues/2180#issuecomment-391902317,https://api.github.com/repos/pydata/xarray/issues/2180,391902317,MDEyOklzc3VlQ29tbWVudDM5MTkwMjMxNw==,6815844,2018-05-25T00:04:58Z,2018-05-25T00:04:58Z,MEMBER,"I think we should discuss *dimension coordinate* and *non-dimension coordinate* separately. I guess @shoyer meant *non-dimension coordinate* here. For *dimension coordinate*, it is always outer join, if I understand correctly.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,326205036 https://github.com/pydata/xarray/issues/2180#issuecomment-391900937,https://api.github.com/repos/pydata/xarray/issues/2180,391900937,MDEyOklzc3VlQ29tbWVudDM5MTkwMDkzNw==,6213168,2018-05-24T23:56:55Z,2018-05-24T23:56:55Z,MEMBER,"I'm of the strong opinion that _all_ joins should be outer joins unless the user explicitly says otherwise, as it's the approach least prone to do damage. I would humbly suggest considering the change for a future major release (0.11 / 0.12), with several minor releases before that printing FutureWarnings. This said, I think that changing from a right join (0.10.3) to a left join (0.10.4) will only cause breakages without providing any actual benefit in terms of user-friendliness, so we should retain the previous behaviour. 
A right join _vaguely_ makes more sense IMHO as it follows the general philosophy of ``dict.update()`` where rhs wins in case of collision.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,326205036 https://github.com/pydata/xarray/issues/2180#issuecomment-391899432,https://api.github.com/repos/pydata/xarray/issues/2180,391899432,MDEyOklzc3VlQ29tbWVudDM5MTg5OTQzMg==,6815844,2018-05-24T23:47:52Z,2018-05-24T23:49:15Z,MEMBER,"I think `dataset.update(other)` should be equivalent to ```python for key, value in other.items(): dataset[key] = value ``` similar to Python's native `dict`. Our `.items()` only iterates over data_vars, not coordinates. So I think even in `dataset.update(other)` coordinates from other should be dropped if there is a conflict. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,326205036 https://github.com/pydata/xarray/issues/2180#issuecomment-391898293,https://api.github.com/repos/pydata/xarray/issues/2180,391898293,MDEyOklzc3VlQ29tbWVudDM5MTg5ODI5Mw==,1217238,2018-05-24T23:40:34Z,2018-05-24T23:40:41Z,MEMBER,cc @fujiisoup @crusaderky ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,326205036