github: issue_comments: 31 rows where issue = 314764258 sorted by updated

31 rows where issue = 314764258 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at	author_association	body	reactions	issue
531818131	https://github.com/pydata/xarray/issues/2064#issuecomment-531818131	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUzMTgxODEzMQ==	dcherian 2448579	2019-09-16T15:03:12Z	2019-09-16T15:03:12Z	MEMBER	#3239 has been merged. Now `minimal` is more useful since you can specify `compat="override"` to skip compatibility checking. What's left is to change defaults to implement @shoyer's comment: So I'm thinking that we probably want to combine "all" and "minimal" into a single mode to use as the default, and remove the other behavior, which is either useless or broken. Maybe it would make sense to come up with a new name for this mode, and to make both "all" and "minimal" deprecated aliases for it? In the long term, this leaves only two "automatic" modes for xarray.concat, which should make things simpler for users trying to figure this out.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
524021001	https://github.com/pydata/xarray/issues/2064#issuecomment-524021001	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUyNDAyMTAwMQ==	dcherian 2448579	2019-08-22T18:22:37Z	2019-08-22T18:22:37Z	MEMBER	Thanks for your input @bonnland. The pandas concat() function uses the option join = {'inner', 'outer', 'left', 'right'} in order to mimic logical database join operations. If there is a reason that xarray cannot do the same, it is not obvious to me. I think the pandas options have the advantage of logical simplicity and traditional usage within database systems. We do have a `join` argument that takes these arguments + 'override' which was added recently to skip expensive comparisons. This works for "indexes" or "dimension coordinates". An example: if you have 2 dataarrays, one on a coordinate `x=[1, 2, 3]` and the other on `x=[2,3,4]`, `join` lets you control the `x` coordinate of the output. This is done by `xr.align`. What's under discussion here is what to do about variables duplicated across datasets or indeed, how do we know that these variables are duplicated across datasets when concatenating other variables.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
523968827	https://github.com/pydata/xarray/issues/2064#issuecomment-523968827	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUyMzk2ODgyNw==	bonnland 10638475	2019-08-22T16:01:40Z	2019-08-22T16:08:58Z	NONE	I have tried to understand why the xarray developers decided to provide their own options for concatenation. I am not an experienced user of xarray, but I can't find any discussion of how the current options for concatenation were derived. The pandas concat() function uses the option join = {'inner', 'outer', 'left', 'right'} in order to mimic logical database join operations. If there is a reason that xarray cannot do the same, it is not obvious to me. I think the pandas options have the advantage of logical simplicity and traditional usage within database systems. Perhaps the reason is that xarray is modeling collections of variables, rather than a single dataframe, as with pandas. But even then, it seems like the pandas rules can be applied on a per-variable basis.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
523960862	https://github.com/pydata/xarray/issues/2064#issuecomment-523960862	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUyMzk2MDg2Mg==	dcherian 2448579	2019-08-22T15:42:10Z	2019-08-22T15:42:10Z	MEMBER	I have a draft solution in #3239. It adds a new mode called "sensible" that acts like "all" when the concat dimension doesn't exist in the dataset and acts like "minimal" when the dimension is present. We can decide whether this is the right way i.e. add a new mode but the more fundamental problem is below. The issue is dealing with variables that should not be concatenated in "minimal" mode (e.g. time-invariant non dim coords when concatenating in time). In this case, we want to skip the equality checks in `_calc_concat_over`. This is a common reason for poor `open_mfdataset` performance. I thought the clean way to do this would be to add the `compat` kwarg to `concat` and then add `compat='override'` since the current behaviour is effectively `compat='equals'`. However, `merge` takes `compat` too and `concat` and `merge` support different `compat` arguments at present. This makes it complicated to easily thread `compat` down from `combine` or `open_mfdataset` without adding `concat_compat` and `merge_compat` which is silly. So do we want to support all the other `compat` modes in `concat`? Things like `broadcast_equals` or `no_conflicts` are funny because they're basically `merge` operations and it means `concat` acts like both `stack`, `concat` and `merge`. OTOH if you have a set of variables with the same name from different datasets and you want to pick one of those (i.e. no concatenation), then you're basically doing `merge` anyway. This would require some refactoring since `concat` assumes the first dataset is a template for the rest. @shoyer What do you think?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
519149757	https://github.com/pydata/xarray/issues/2064#issuecomment-519149757	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUxOTE0OTc1Nw==	dcherian 2448579	2019-08-07T15:32:16Z	2019-08-07T15:32:16Z	MEMBER	Maybe it would make sense to come up with a new name for this mode, and to make both "all" and "minimal" deprecated aliases for it? I'm in favour of this. What should we name this mode? One comment on "existing dimensions" mode: "minimal" does the right thing, concatenating only variables with the dimension. For variables without the dimension, this will still raise a `ValueError` because `compat` can only be `'equals'` or `'identical'`. It seems to me like we need `compat='override'` and/or `compat='tolerance', tolerance=...` that would use numpy's approximate equality testing. This checking of non-dimensional coordinates is a common source of `mfdataset` issues. What do you think?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
512036050	https://github.com/pydata/xarray/issues/2064#issuecomment-512036050	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUxMjAzNjA1MA==	shoyer 1217238	2019-07-16T23:09:24Z	2019-07-16T23:09:24Z	MEMBER	UPDATE: @shoyer it could be that unit tests are failing because, as your final example shows, you get an error for data_vars='minimal' if any variables have different values across datasets, when adding a new concatenation dimension. If this is the reason so many unit tests are failing, then the failures are a red herring and should probably be ignored/rewritten. This seems very likely to me. The existing behavior of `data_vars='minimal'` is only useful in "existing dimensions mode". Xarray's unit test suite is definitely a good "smoke test" for understanding the impact of changes to `concat` on our users. What it tells us is that we can't change the default value from `"all"` to `"minimal"` without breaking existing code. Instead, we need to change how "all" or "minimal" works, or switch to yet another mode for the new behavior. The tests we should feel free to rewrite are cases where we set `data_vars="all"` or `data_vars="minimal"` explicitly for verifying the weird edge behaviors that I noted in my earlier comments. There shouldn't be too many of these tests.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
512005032	https://github.com/pydata/xarray/issues/2064#issuecomment-512005032	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUxMjAwNTAzMg==	bonnland 10638475	2019-07-16T22:01:59Z	2019-07-16T22:50:39Z	NONE	Can you give a specific example of the behavior in question? Here is the most specific thing I can say: If I switch the default value for data_vars to 'minimal' for concat() and open_mfdataset(), then I get a lot of failing unit tests (when running "pytest xarray -n 4"). I may be wrong about why they are failing. The unit tests have comments in them, like "Check pandas compatibility"; see, for example, line 370 in test_duck_array_ops.py for an instruction that raises a ValueError exception. Many failures appear to be caused by a ValueError exception being raised, like in the final example you have in your notebook. I hope this is specific enough; I realize that I'm not deeply comprehending what the unit tests are actually supposed to be testing. UPDATE: @shoyer it could be that unit tests are failing because, as your final example shows, you get an error for data_vars='minimal' if any variables have different values across datasets, when adding a new concatenation dimension. If this is the reason so many unit tests are failing, then the failures are a red herring and should probably be ignored/rewritten.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
512000102	https://github.com/pydata/xarray/issues/2064#issuecomment-512000102	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUxMjAwMDEwMg==	shoyer 1217238	2019-07-16T21:44:52Z	2019-07-16T21:44:52Z	MEMBER	Specifically, what should the default behavior of concat() be, when both datasets include a variable that does not include the concatenation dimension? Currently, the concat dimension is added, and the result is a "stacked" version of the variable. Others have argued that this variable should not be included in the concat() result by default, but this appears to break compatibility with Pandas concat(). Can you give a specific example of the behavior in question?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
511987332	https://github.com/pydata/xarray/issues/2064#issuecomment-511987332	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUxMTk4NzMzMg==	bonnland 10638475	2019-07-16T21:06:58Z	2019-07-16T21:06:58Z	NONE	@shoyer I'm sorry I didn't look at your examples more closely at first. I see now that your first example of using data_vars='minimal' is already preserving one instance of the variable x, and I was suggesting earlier that this variable was not being included in the concatenation. So I am not clear on why so many unit tests fail when I switch the default value for data_vars to 'minimal'. The output from your examples seems compatible with Pandas concat, though I don't understand Pandas very well yet. I wonder if the unit tests that fail are written correctly. I have to add that I spent an entire day trying to understand the code in concat.py, by stepping through it for several unit tests. I found the code quite difficult to understand.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
511903346	https://github.com/pydata/xarray/issues/2064#issuecomment-511903346	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUxMTkwMzM0Ng==	bonnland 10638475	2019-07-16T17:06:46Z	2019-07-16T17:47:45Z	NONE	@shoyer Your explanation makes sense, but there are unit tests that expect the default concat() behavior to be the same as default behavior for Pandas concat(), which tries to perform an "outer" join between dataframes. Therefore, from my limited understanding, the default behavior for xarray concat() should be to preserve all variables. If this default behavior changes, then it may break code making these expectations. Can we get a perspective from the author of concat.py, @TomNicholas ? Thanks. Specifically, what should the default behavior of concat() be, when both datasets include a variable that does not include the concatenation dimension? Currently, the concat dimension is added, and the result is a "stacked" version of the variable. Others have argued that this variable should not be included in the concat() result by default, but this appears to break compatibility with Pandas concat(). Another possibility could be to include the first instance of the variable in the result set, throwing away any other instances of the same variable, so a "stacking" dimension is not needed. This would potentially lose information if the variable instances are not identical, however.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
511611430	https://github.com/pydata/xarray/issues/2064#issuecomment-511611430	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUxMTYxMTQzMA==	shoyer 1217238	2019-07-15T23:54:47Z	2019-07-15T23:54:47Z	MEMBER	The logic for determining which variables to concatenate is in the `_calc_concat_over` helper function: https://github.com/pydata/xarray/blob/539fb4a98d0961c281daa5474a8e492a0ae1d8a2/xarray/core/concat.py#L146 Only `"different"` is supposed to load variables into memory to determine which ones to concatenate. Right now we also have `"all"` and `"minimal"` options: - `"all"` attempts to concatenate every variable that can be broadcast to a matching shape: https://github.com/pydata/xarray/blob/539fb4a98d0961c281daa5474a8e492a0ae1d8a2/xarray/core/concat.py#L188-L190 - `"minimal"` only concatenates variables that already have the matching dimension. Recall that `concat` handles two types of concatenation: existing dimensions (corresponding to `np.concatenate`) and new dimensions (corresponding to `np.stack`). Currently, this is all done together in one messy codebase, but logically it would be cleaner to separate these modes into two separate functions: - In "existing dimensions" mode: - `"all"` is currently broken, because it will also concatenate variables that don't have the dimension. - `"minimal"` does the right thing, concatenating only variables with the dimension. - In "new dimensions" mode: - `"all"` will add the dimension to all variables. - `"minimal"` raises an error if any variables have different values. If your datasets have any data variables with different values at all, it raises an error. This is pretty much useless. Here's my notebook testing this out: https://gist.github.com/shoyer/f44300eddda4f7c476c61f76d1df938b So I'm thinking that we probably want to combine "all" and "minimal" into a single mode to use as the default, and remove the other behavior, which is either useless or broken. Maybe it would make sense to come up with a new name for this mode, and to make both `"all"` and `"minimal"` deprecated aliases for it? In the long term, this leaves only two "automatic" modes for `xarray.concat`, which should make things simpler for users trying to figure this out.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
511583067	https://github.com/pydata/xarray/issues/2064#issuecomment-511583067	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUxMTU4MzA2Nw==	bonnland 10638475	2019-07-15T21:48:15Z	2019-07-15T21:50:50Z	NONE	@dcherian . I believe you are correct in principle, but there is a logical problem that is expensive to evaluate. The difficult case is when two datasets have a variable with the same name, and that variable does not include the concatenation dimension. In order to align the datasets for concatenation, both variables would need to be identical. The resulting dataset would just have one (unchanged) instance of that variable, say from the first dataset. I think someone along the way decided this operation was too expensive. This is from concat.py, lines 302-307: `# stack up each variable to fill-out the dataset (in order) for k in datasets[0].variables: if k in concat_over: vars = ensure_common_dims([ds.variables[k] for ds in datasets]) combined = concat_vars(vars, dim, positions) insert_result_variable(k, combined)` So I think some consensus needs to be reached, about whether it is a good idea to load these variables into memory to check for identical-ness between them. Or another possibility is that we leave "unique" variables alone: if a variable exists only once across all datasets being concatenated, we do not add the concatenation dimension to it. This might solve @xylar original poster's issue when opening a single dataset.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
511468454	https://github.com/pydata/xarray/issues/2064#issuecomment-511468454	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUxMTQ2ODQ1NA==	dcherian 2448579	2019-07-15T16:15:51Z	2019-07-15T16:15:51Z	MEMBER	@bonnland I don't think you want to change the default `data_vars` but instead update the heuristics as in this comment: we shouldn't implicitly add new dimensions to variables in the case where the dimension already exists in the dataset. We only need the heuristics/comparisons when an entirely new dimension is being added.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
511210149	https://github.com/pydata/xarray/issues/2064#issuecomment-511210149	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUxMTIxMDE0OQ==	bonnland 10638475	2019-07-14T15:03:29Z	2019-07-14T21:36:03Z	NONE	So there are some unit tests that assert the behavior for open_mfdataset() is identical to the behavior for concat(). This implies that if we change the default data_vars value from "all" to "minimal" for one function, we need to change it for both functions. @shoyer I think you suggested that concat() default behavior should change in #2145, in the same way it will change for open_mfdataset. So I am going to add this change to the pull request. UPDATE: There is a problem with changing concat() away from having data_vars='all'. This breaks many unit tests that check for compatibility with Pandas. What I've been told is that Pandas concat() will include all unique variables from each dataframe. This is what data_vars='all' will also do. By changing to data_vars='minimal', only data variables with the specified concatenation dimension will be included. So it seems that in order to stay compatible with Pandas, we need to include all data variables, but not add the concatenation dimension to data variables that do not already have that dimension. The problem, however, is what to do when both datasets have a variable `x` without the concatenation dimension. What should the resulting concatenation look like? I think in this case, the concat dimension should be added, to preserve information. It's just the unique variables that should be left alone. Please, could someone confirm that I have understood the problem correctly? Thank you in advance.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
511140703	https://github.com/pydata/xarray/issues/2064#issuecomment-511140703	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUxMTE0MDcwMw==	bonnland 10638475	2019-07-13T17:41:31Z	2019-07-13T22:51:13Z	NONE	I'm a new developer at the SciPy 2019 Sprints, and I'm interested in making a pull request for this issue as a learning step. @henrikca Would this be useful? Or would this conflict with your work? I went ahead and made a pull request because it looks like a very small change, potentially.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
452337937	https://github.com/pydata/xarray/issues/2064#issuecomment-452337937	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDQ1MjMzNzkzNw==	floriankrb 8441217	2019-01-08T15:25:27Z	2019-01-08T15:25:27Z	CONTRIBUTOR	The comment from @henrikca above gives a solution to @xylar's issue. Here is the original example, where I added `data_vars='minimal'`: ```python #!/usr/bin/env python3 import xarray ds = xarray.open_mfdataset('example_jan.nc', concat_dim='Time', data_vars='minimal') print(ds) ``` Dimensions: (Time: 1, nOceanRegions: 7, nOceanRegionsTmp: 7, nVertLevels: 100) Dimensions without coordinates: Time, nOceanRegions, nOceanRegionsTmp, nVertLevels Data variables: refBottomDepth (nVertLevels) float64 dask.array<shape=(100,), chunksize=(100,)> time_avg_avgValueWithinOceanLayerRegion_avgLayerTemperature (Time, nOceanRegionsTmp, nVertLevels) float64 dask.array<shape=(1, 7, 100), chunksize=(1, 7, 100)> time_avg_avgValueWithinOceanRegion_avgSurfaceTemperature (Time, nOceanRegions) float64 dask.array<shape=(1, 7), chunksize=(1, 7)> time_avg_daysSinceStartOfSim (Time) timedelta64[ns] dask.array<shape=(1,), chunksize=(1,)> xtime_end (Time) \|S64 dask.array<shape=(1,), chunksize=(1,)> xtime_start (Time) \|S64 dask.array<shape=(1,), chunksize=(1,)> Attributes: history: Tue Dec 6 04:49:14 2016: ncatted -O -a ,global,d,, acme_alaph7... NCO: "4.6.2"	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
399954316	https://github.com/pydata/xarray/issues/2064#issuecomment-399954316	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDM5OTk1NDMxNg==	henrikca 40172290	2018-06-25T13:37:04Z	2018-06-25T13:37:04Z	NONE	I recently ran into a similar issue and found a potential solution. The functionality, as far as I understand, is already in the `open_mfdataset` function via the `data_vars='minimal'` argument; in this case variables without `concat_dim` are included without adding the dimension. The current default is `data_vars='all'`, which includes all variables with the added dimension. If the desired functionality shouldn't implicitly add new dimensions, shouldn't the default be set to `'minimal'` instead? I think this is a very non-intrusive solution since it only affects the `open_mfdataset` function, and if you for some reason want the old behavior it is still there. An alternative way is to rewrite the `_dataset_concat` function, I guess. This is my first time attempting to contribute; does this sound like a good idea? I can try to make a pull request but would very much value some input first.	{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
381993716	https://github.com/pydata/xarray/issues/2064#issuecomment-381993716	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDM4MTk5MzcxNg==	xylar 4179064	2018-04-17T13:33:05Z	2018-04-17T13:33:05Z	NONE	Hmm, I agree that it shouldn't be hard but I don't really have time to do this right now. If no one has had a chance to look into it by mid May I might be able to take it on then.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
381975937	https://github.com/pydata/xarray/issues/2064#issuecomment-381975937	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDM4MTk3NTkzNw==	rabernat 1197350	2018-04-17T12:34:15Z	2018-04-17T12:34:15Z	MEMBER	I'm glad! FWIW, I think this is a relatively simple fix within xarray. @xylar, if you are game, we would love to see a PR from you. Could be a good opportunity to learn more about xarray internals.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
381731078	https://github.com/pydata/xarray/issues/2064#issuecomment-381731078	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDM4MTczMTA3OA==	xylar 4179064	2018-04-16T20:02:53Z	2018-04-16T20:02:53Z	NONE	@rabernat, your suggestion above has worked perfectly to get our unit tests working again in MPAS-Analysis so that will tide us over until this issue can be addressed directly in xarray. Thanks!	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
381730268	https://github.com/pydata/xarray/issues/2064#issuecomment-381730268	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDM4MTczMDI2OA==	xylar 4179064	2018-04-16T20:00:14Z	2018-04-16T20:00:14Z	NONE	Yes, I think that's exactly right.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
381728814	https://github.com/pydata/xarray/issues/2064#issuecomment-381728814	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDM4MTcyODgxNA==	shoyer 1217238	2018-04-16T19:55:24Z	2018-04-16T19:55:24Z	MEMBER	I stand corrected. in 0.10.1, I also see the Time variable getting added to refBottomDepth when I open multiple files. So maybe this is not in fact a new problem but an existing issue that happened to behave as I expected only when opening a single file in previous versions. Sorry for not noticing that sooner. OK, in that case I think #2048 was still the right change/bug-fix, making multi-file and single-file behavior consistent. But you certainly have exposed a real issue here. But this issue raises an important basic point: we might want different behavior for variables in which concat_dim is already a dimension vs. variables for which it is not. Yes, we shouldn't implicitly add new dimensions to variables in the case where the dimension already exists in the dataset. We only need the heuristics/comparisons when an entirely new dimension is being added.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
381725478	https://github.com/pydata/xarray/issues/2064#issuecomment-381725478	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDM4MTcyNTQ3OA==	rabernat 1197350	2018-04-16T19:44:00Z	2018-04-16T19:44:00Z	MEMBER	But this issue raises an important basic point: we might want different behavior for variables in which `concat_dim` is already a dimension vs. variables for which it is not.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
381723442	https://github.com/pydata/xarray/issues/2064#issuecomment-381723442	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDM4MTcyMzQ0Mg==	xylar 4179064	2018-04-16T19:37:02Z	2018-04-16T19:37:02Z	NONE	Yes, true. I'm trying to think if there are any examples where the fixed-in-time variables would not be coordinates. So far, none come to mind. Thanks for the tip.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
381722944	https://github.com/pydata/xarray/issues/2064#issuecomment-381722944	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDM4MTcyMjk0NA==	rabernat 1197350	2018-04-16T19:35:12Z	2018-04-16T19:35:12Z	MEMBER	so you're fooling xarray into not including the time dimension in your non-time variables by making them coordinates in the above example? Exactly. They are coordinates. Those variables are usually related to grid geometry or constants, as I presume is `refBottomDepth` in your example.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
381722358	https://github.com/pydata/xarray/issues/2064#issuecomment-381722358	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDM4MTcyMjM1OA==	xylar 4179064	2018-04-16T19:33:09Z	2018-04-16T19:33:09Z	NONE	@rabernat, so you're fooling xarray into not including the `time` dimension in your non-time variables by making them coordinates in the above example?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
381718089	https://github.com/pydata/xarray/issues/2064#issuecomment-381718089	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDM4MTcxODA4OQ==	xylar 4179064	2018-04-16T19:17:31Z	2018-04-16T19:18:17Z	NONE	@shoyer, I stand corrected. in 0.10.1, I also see the `Time` variable getting added to `refBottomDepth` when I open multiple files. So maybe this is not in fact a new problem but an existing issue that happened to behave as I expected only when opening a single file in previous versions. Sorry for not noticing that sooner.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
381717472	https://github.com/pydata/xarray/issues/2064#issuecomment-381717472	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDM4MTcxNzQ3Mg==	rabernat 1197350	2018-04-16T19:15:19Z	2018-04-16T19:15:19Z	MEMBER	👍 This is a persistent problem for me as well. I often find myself writing a preprocessor function like this: ```python def process_coords(ds, concat_dim='time', drop=True): coord_vars = [v for v in ds.data_vars if concat_dim not in ds[v].dims] if drop: return ds.drop(coord_vars) else: return ds.set_coords(coord_vars) ds = xr.open_mfdataset('*.nc', preprocess=process_coords) ``` The reason to drop the coordinates is to avoid the comparison that happens when you concatenate coords.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
381717309	https://github.com/pydata/xarray/issues/2064#issuecomment-381717309	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDM4MTcxNzMwOQ==	xylar 4179064	2018-04-16T19:14:39Z	2018-04-16T19:14:39Z	NONE	@shoyer, in that case as well, `Time` is added to `refBottomDepth` in v 0.10.3, which was not the case in previous xarray versions.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
381707540	https://github.com/pydata/xarray/issues/2064#issuecomment-381707540	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDM4MTcwNzU0MA==	shoyer 1217238	2018-04-16T18:42:06Z	2018-04-16T18:42:06Z	MEMBER	What happens if you open multiple files with `open_mfdataset()`, e.g., for both January and February. Does it result in a dataset with the right dimensions on each variable?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
381700098	https://github.com/pydata/xarray/issues/2064#issuecomment-381700098	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDM4MTcwMDA5OA==	xylar 4179064	2018-04-16T18:16:47Z	2018-04-16T18:16:47Z	NONE	cc @pwolfram	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
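
The selection rules discussed in the comments above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not xarray's `_calc_concat_over`: each "dataset" is a plain dict mapping a variable name to its tuple of dimension names, the helper `variables_to_concat` is hypothetical, and the variable name `temperature` is invented for illustration. The `"sensible"` branch follows the description of the draft mode in #3239 (act like `"minimal"` when the concat dimension already exists, like `"all"` otherwise) and may not match what was eventually merged.

```python
# Toy model (hypothetical): a "dataset" is {variable_name: (dim_name, ...)}.
# It mirrors the behaviour described in the thread, not xarray's real code.

def variables_to_concat(datasets, dim, data_vars="all"):
    """Return the sorted names of variables that would be concatenated along `dim`."""
    if data_vars not in ("all", "minimal", "sensible"):
        raise ValueError(f"unsupported mode: {data_vars!r}")

    if data_vars == "sensible":
        # Draft idea from the thread: behave like "minimal" if the dimension
        # already exists in some variable, otherwise behave like "all".
        dim_exists = any(dim in dims for ds in datasets for dims in ds.values())
        data_vars = "minimal" if dim_exists else "all"

    names = set()
    for ds in datasets:
        for name, dims in ds.items():
            # "all": every variable; "minimal": only variables that have `dim`.
            if data_vars == "all" or dim in dims:
                names.add(name)
    return sorted(names)


jan = {"refBottomDepth": ("nVertLevels",), "temperature": ("Time", "nVertLevels")}
feb = {"refBottomDepth": ("nVertLevels",), "temperature": ("Time", "nVertLevels")}

all_mode = variables_to_concat([jan, feb], "Time", data_vars="all")
minimal_mode = variables_to_concat([jan, feb], "Time", data_vars="minimal")
sensible_existing = variables_to_concat([jan, feb], "Time", data_vars="sensible")
sensible_new = variables_to_concat([jan, feb], "run", data_vars="sensible")

print(all_mode)           # refBottomDepth would gain the Time dimension
print(minimal_mode)       # refBottomDepth is left alone
print(sensible_existing)  # same as "minimal": Time already exists
print(sensible_new)       # same as "all": "run" is a new dimension
```

In this model, `"all"` selects `refBottomDepth` even though it has no `Time` dimension, which corresponds to the behaviour reported in the original issue, while `"minimal"` selects only `temperature`.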


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
