html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/issues/2064#issuecomment-512036050,https://api.github.com/repos/pydata/xarray/issues/2064,512036050,MDEyOklzc3VlQ29tbWVudDUxMjAzNjA1MA==,1217238,2019-07-16T23:09:24Z,2019-07-16T23:09:24Z,MEMBER,"> UPDATE: @shoyer it could be that unit tests are failing because, as your final example shows, you get an error for data_vars='minimal' if _any_ variables have different values across datasets, when adding a new concatentation dimension. If this is the reason so many unit tests are failing, then the failures are a red herring and should probably be ignored/rewritten. This seems very likely to me. The existing behavior of `data_vars='minimal'` is only useful in ""existing dimensions mode"". Xarray's unit test suite is definitely a good ""smoke test"" for understanding the impact of changes to `concat` on our users. What it tells us is that we can't change the default value from `""all""` to `""minimal""` without breaking existing code. Instead, we need to change how ""all"" or ""minimal"" works, or switch to yet another mode for the new behavior. The tests we should feel free to rewrite are cases where we set `data_vars=""all""` or `data_vars=""minimal""` explicitly for verifying the weird edge behaviors that I noted in my earlier comments. There shouldn't be too many of these tests.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,314764258 https://github.com/pydata/xarray/issues/2064#issuecomment-512000102,https://api.github.com/repos/pydata/xarray/issues/2064,512000102,MDEyOklzc3VlQ29tbWVudDUxMjAwMDEwMg==,1217238,2019-07-16T21:44:52Z,2019-07-16T21:44:52Z,MEMBER,"> Specifically, what should the default behavior of concat() be, when both datasets include a variable that does not include the concatenation dimension? Currently, the concat dimension is added, and the result is a ""stacked"" version of the variable. Others have argued that this variable should not be included in the concat() result by default, but this appears to break compatibility with Pandas concat(). Can you give a specific example of the behavior in question?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,314764258 https://github.com/pydata/xarray/issues/2064#issuecomment-511611430,https://api.github.com/repos/pydata/xarray/issues/2064,511611430,MDEyOklzc3VlQ29tbWVudDUxMTYxMTQzMA==,1217238,2019-07-15T23:54:47Z,2019-07-15T23:54:47Z,MEMBER,"The logic for determining which variables to concatenate is in the `_calc_concat_over` helper function: https://github.com/pydata/xarray/blob/539fb4a98d0961c281daa5474a8e492a0ae1d8a2/xarray/core/concat.py#L146 Only `""different""` is supposed to load variables into memory to determine which ones to concatenate. Right now we also have `""all""` and `""minimal""` options: - `""all""` attempts to concatenate *every* variable that can be broadcast to a matching shape: https://github.com/pydata/xarray/blob/539fb4a98d0961c281daa5474a8e492a0ae1d8a2/xarray/core/concat.py#L188-L190 - `""minimal""` only concatenates variables that already have the matching dimension. Recall that `concat` handles two types of concatenation: existing dimensions (corresponding to `np.concatenate`) and new dimensions (corresponding to `np.stack`). Currently, this is all done together in one messy codebase, but logically it would be cleaner to separate these modes into two separate function: - In ""existing dimensions"" mode: - `""all""` is currently broken, because it will also concatenate variables that don't have the dimension. - `""minimal""` does the right thing, concatenating only variables with the dimension. - In ""new dimensions"" mode: - `""all""` will add the dimension to all variables. - `""minimal""` raise an error if *any* variables have different values. If you're datasets have any data variables with different values at all, it raises an error. This is pretty much useless. Here's my notebook testing this out: https://gist.github.com/shoyer/f44300eddda4f7c476c61f76d1df938b So I'm thinking that we probably want to combine ""all"" and ""minimal"" into a single mode to use as the default, and remove the other behavior, which is either useless or broken. Maybe it would make sense to come up with a new name for this mode, and to make both `""all""` and `""minimal""` deprecated aliases for it? In the long term, this leaves only two ""automatic"" modes for `xarray.concat`, which should make things simpler for users trying to figure this out.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,314764258 https://github.com/pydata/xarray/issues/2064#issuecomment-381728814,https://api.github.com/repos/pydata/xarray/issues/2064,381728814,MDEyOklzc3VlQ29tbWVudDM4MTcyODgxNA==,1217238,2018-04-16T19:55:24Z,2018-04-16T19:55:24Z,MEMBER,"> I stand corrected. in 0.10.1, I also see the Time variable getting added to refBottomDepth when I open multiple files. So maybe this is not in fact a new problem but an existing issue that happened to behave as I expected only when opening a single file in previous versions. Sorry for not noticing that sooner. OK, in that case I think #2048 was still the right change/bug-fix, making multi-file and single-file behavior consistent. But you certainly have exposed a real issue here. > But this issue raises an important basic point: we might want different behavior for variables in which concat_dim is already a dimension vs. variables for which it is not. Yes, we shouldn't implicitly add a new dimensions to variables in the case where the dimension already exists in the dataset. We only need the heuristics/comparisons when an entirely *new* dimension is being added.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,314764258 https://github.com/pydata/xarray/issues/2064#issuecomment-381707540,https://api.github.com/repos/pydata/xarray/issues/2064,381707540,MDEyOklzc3VlQ29tbWVudDM4MTcwNzU0MA==,1217238,2018-04-16T18:42:06Z,2018-04-16T18:42:06Z,MEMBER,"What happens if you open multiple files with `open_mfdataset()`, e.g., for both January and February. Does it result in a dataset with the right dimensions on each variable?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,314764258