html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/2553#issuecomment-446420988,https://api.github.com/repos/pydata/xarray/issues/2553,446420988,MDEyOklzc3VlQ29tbWVudDQ0NjQyMDk4OA==,35968931,2018-12-12T00:57:56Z,2018-12-12T00:57:56Z,MEMBER,Removed the unnecessary argument.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,379415229
https://github.com/pydata/xarray/pull/2553#issuecomment-446223544,https://api.github.com/repos/pydata/xarray/issues/2553,446223544,MDEyOklzc3VlQ29tbWVudDQ0NjIyMzU0NA==,35968931,2018-12-11T14:35:12Z,2018-12-11T14:35:12Z,MEMBER,"Okay, I've reverted the API, so it basically converts a `concat_dim` input to its multidimensional version before continuing. If we merge this, should I start a separate Pull Request for further discussion about the API? One of the Travis CI builds failed, but again I don't think that was caused by my changes.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,379415229
https://github.com/pydata/xarray/pull/2553#issuecomment-445903380,https://api.github.com/repos/pydata/xarray/issues/2553,445903380,MDEyOklzc3VlQ29tbWVudDQ0NTkwMzM4MA==,35968931,2018-12-10T17:37:25Z,2018-12-10T17:37:25Z,MEMBER,"> This sounds pretty good to me.

Okay good. So with this API, the behaviour of `auto_combine()` (and therefore the default behaviour of `open_mfdataset()`) would be to [throw a warning](https://github.com/pydata/xarray/issues/2159#issuecomment-435336049) if there isn't enough info in the dataset coords to order them?

> OK, do you want to go ahead and revert the documentation/public API for now?

Yes, I'll do that.

> I would even be OK supporting nested lists temporarily ... ""at your own risk""

If I revert the documentation and you merge this PR then that's exactly what we will have, which would be useful for me until we finalise the public API.

(Also it seems that whatever was wrong with cftime has been fixed now, as the CI tests are passing.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,379415229
https://github.com/pydata/xarray/pull/2553#issuecomment-445845065,https://api.github.com/repos/pydata/xarray/issues/2553,445845065,MDEyOklzc3VlQ29tbWVudDQ0NTg0NTA2NQ==,35968931,2018-12-10T14:59:30Z,2018-12-10T15:21:05Z,MEMBER,"> There are probably some edge cases where the existing auto_combine would you be a little more slopping about how you put together your lists of variables.

I think that's probably the case, but I also think that those edge cases will be so specific that maybe we don't have to explicitly support them. We could just say that anyone who has a combination of datasets that is that funky can just concatenate them themselves?

> For users, I think it makes sense to have...

I agree, two separate functions are a lot more intuitive than having `auto_combine()` behave either auto-magically or totally manually depending on the value of an optional keyword argument. The only problem with that idea is that both of these functions should be options for `open_mfdataset()`, so you still have multiple possible argument structures for that function.
I think that's okay though - it would be a lot easier to explain if you say ""if the keyword argument `combine='auto'` then the function `auto_combine()` is used; if `combine='manual'` then `nested_concat_and_merge()` is used, and the filepaths must be structured accordingly.""

> If you like, we could also merge this work (which is excellent progress towards user-facing APIs) but keep the changes internal to xarray for now until we figure out the public APIs.

That would be great. Then I could start using the master branch of xarray again in [my code](https://github.com/boutproject/xBOUT/issues/22), while we redo the public API. If I set `concat_dims` back to `concat_dim` then it *should* be completely backwards-compatible.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,379415229
https://github.com/pydata/xarray/pull/2553#issuecomment-444712699,https://api.github.com/repos/pydata/xarray/issues/2553,444712699,MDEyOklzc3VlQ29tbWVudDQ0NDcxMjY5OQ==,35968931,2018-12-06T01:18:41Z,2018-12-06T10:05:40Z,MEMBER,"Specifying `dim=None` will specify (1) explicitly, but otherwise yes, that's exactly what it does. That is how the current auto_combine behaves too, just for 1D obviously.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,379415229
https://github.com/pydata/xarray/pull/2553#issuecomment-444712936,https://api.github.com/repos/pydata/xarray/issues/2553,444712936,MDEyOklzc3VlQ29tbWVudDQ0NDcxMjkzNg==,35968931,2018-12-06T01:20:05Z,2018-12-06T10:03:43Z,MEMBER,"Also, if you specify `dim='something'` when that dimension doesn't already exist, you are explicitly telling it to add a new dimension.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,379415229
https://github.com/pydata/xarray/pull/2553#issuecomment-444715777,https://api.github.com/repos/pydata/xarray/issues/2553,444715777,MDEyOklzc3VlQ29tbWVudDQ0NDcxNTc3Nw==,35968931,2018-12-06T01:34:18Z,2018-12-06T10:03:26Z,MEMBER,"I think that would mean there might be some situations that 1D `auto_combine()` could deal with but `nested_concat_and_merge()` could not - situations where you can only merge the variables once they have been individually concatenated. (Actually I'm not sure - I would need to think about that.)

I think this is a question of whether you think that:
a) `auto_combine()` is a well-defined operation that is still well-defined in N-D, or
b) `auto_combine()`'s purpose is to do its best to combine whatever is thrown at it, and more surgical N-D operations should be the job of another function.

I personally think a), but I expect users who haven't read the source code for `auto_combine()` might disagree.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,379415229
https://github.com/pydata/xarray/pull/2553#issuecomment-444713178,https://api.github.com/repos/pydata/xarray/issues/2553,444713178,MDEyOklzc3VlQ29tbWVudDQ0NDcxMzE3OA==,35968931,2018-12-06T01:21:20Z,2018-12-06T01:21:20Z,MEMBER,"I appreciate that this is pretty complicated - perhaps it should have its own section in the docs (I can do another PR?)
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,379415229
https://github.com/pydata/xarray/pull/2553#issuecomment-444708274,https://api.github.com/repos/pydata/xarray/issues/2553,444708274,MDEyOklzc3VlQ29tbWVudDQ0NDcwODI3NA==,35968931,2018-12-06T00:56:01Z,2018-12-06T01:01:49Z,MEMBER,"Thanks for the comments.
> What happens if you have a nested list of Dataset objects with different data variables?

This is supported. This new `auto_combine()` simply applies the old `auto_combine()` N times along N dimensions, so if the grid of results is auto-combinable along all its dimensions separately, then the new `auto_combine()` will auto-magically combine it all, e.g.:

```python
objs = [[Dataset({'foo': ('x', [0, 1])}), Dataset({'bar': ('x', [10, 20])})],
        [Dataset({'foo': ('x', [2, 3])}), Dataset({'bar': ('x', [30, 40])})]]
expected = Dataset({'foo': ('x', [0, 1, 2, 3]), 'bar': ('x', [10, 20, 30, 40])})

# This works
actual = auto_combine(objs, concat_dims=['x', None])
assert_identical(expected, actual)

# Also works auto-magically
actual = auto_combine(objs)
assert_identical(expected, actual)

# Proving it works symmetrically
objs = [[Dataset({'foo': ('x', [0, 1])}), Dataset({'foo': ('x', [2, 3])})],
        [Dataset({'bar': ('x', [10, 20])}), Dataset({'bar': ('x', [30, 40])})]]
actual = auto_combine(objs, concat_dims=[None, 'x'])
assert_identical(expected, actual)
```

(I'll add this example as another unit test.)
I should point out that there is one way in which this function is not exactly as general as `auto_combine()` applied N times, though. The options `compat`, `data_vars`, and `coords` are specified once for the combining along all dimensions, so you can't currently tell it to use `compat='identical'` along `dim1` and `compat='no_conflicts'` along `dim2`; it has to be the same for both. I thought about making these kwargs also accept lists, but although that would be easy to do, it seemed like it would complicate the API for a very specific use case?
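To make that limitation concrete, here's a minimal sketch using the `concat_dims` signature from this PR (the dimension names are just illustrative, and the per-dimension variant is hypothetical, not implemented):

```python
# A single compat value applies to the concatenation along every dimension...
combined = auto_combine(objs, concat_dims=['t', 'x'], compat='identical')

# ...whereas a per-dimension option like this is *not* supported:
# combined = auto_combine(objs, concat_dims=['t', 'x'],
#                         compat=['identical', 'no_conflicts'])
```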
> It might be better to have a separate `nested_concat()` function rather than to squeeze this all into `auto_combine()`.

That was basically what I tried to do in my first attempt, but nested concatenation without merging along every dimension misses some common use cases. For example, if you wanted to `auto_combine()` (or `open_mfdataset()`) files structured like

```bash
root
├─ time1
│   ├── density.nc
│   └── temperature.nc
└─ time2
    ├── density.nc
    └── temperature.nc
```

then you would want to merge `density.nc` and `temperature.nc` first, and then concat along 'time' (sketched below). I suppose you could have a `nested_auto_combine()` as a separate function, but I don't really think that's necessary, as apart from the substitution `concat_dim` -> `concat_dims` this is fully backwards-compatible.
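For concreteness, a rough sketch of the nested-list call for that directory layout, using the `concat_dims` syntax from this PR (the paths and the 'time' dimension name are just illustrative):

```python
from xarray import auto_combine, open_dataset

# Open the four files (paths illustrative)
t1_density = open_dataset('root/time1/density.nc')
t1_temperature = open_dataset('root/time1/temperature.nc')
t2_density = open_dataset('root/time2/density.nc')
t2_temperature = open_dataset('root/time2/temperature.nc')

# Inner lists are merged (None); the outer list is concatenated along 'time'
combined = auto_combine([[t1_density, t1_temperature],
                         [t2_density, t2_temperature]],
                        concat_dims=['time', None])
```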
You might also find it interesting to see how I've used this fork in my own code: I create the grid of datasets [here](https://github.com/TomNicholas/xBOUT/blob/615f6caf01fa3d90cdfe8a5b517b477cc0c67026/xbout/load.py#L127), so that I can combine them [here](https://github.com/TomNicholas/xBOUT/blob/615f6caf01fa3d90cdfe8a5b517b477cc0c67026/xbout/load.py#L23).
I have a question actually - currently, if the concat or merge fails, the error message won't clearly tell you which dimension it was trying to combine along when it failed. Is there a way to do that easily with `try... except...` statements? Something like

```python
for dim in concat_dims:
    try:
        _auto_combine_along_first_dim(...)
    except (MergeError, ValueError) as err:
        raise ValueError(f""Encountered {err} while trying to combine along dimension {dim}"")
```
(Also something else is breaking in `cftime` on the python 2.7 builds on AppVeyor now...)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,379415229
https://github.com/pydata/xarray/pull/2553#issuecomment-443536378,https://api.github.com/repos/pydata/xarray/issues/2553,443536378,MDEyOklzc3VlQ29tbWVudDQ0MzUzNjM3OA==,35968931,2018-12-02T19:50:59Z,2018-12-02T20:08:28Z,MEMBER,"Now that all the tests are passing for all python versions, and I've added an example of multidimensional concatenation to the docstring of `auto_combine`, I think this is ready to be merged. However, the AppVeyor builds on python 2.7 are now failing on setup when they try to install cftime, which I don't think was caused by anything I did.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,379415229
https://github.com/pydata/xarray/pull/2553#issuecomment-443456199,https://api.github.com/repos/pydata/xarray/issues/2553,443456199,MDEyOklzc3VlQ29tbWVudDQ0MzQ1NjE5OQ==,35968931,2018-12-01T20:30:30Z,2018-12-01T20:31:11Z,MEMBER,"Okay, I've rewritten it using `itertools.groupby` instead of `toolz.itertoolz.groupby` so that there should no longer be a dependency on `toolz`. The `itertools` groupby assumes the input is sorted, which makes the code slightly more complex, but I think that's still neater than including the functions and tests required for `toolz.itertoolz.groupby`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,379415229
https://github.com/pydata/xarray/pull/2553#issuecomment-443437580,https://api.github.com/repos/pydata/xarray/issues/2553,443437580,MDEyOklzc3VlQ29tbWVudDQ0MzQzNzU4MA==,35968931,2018-12-01T16:18:54Z,2018-12-01T16:43:05Z,MEMBER,"I've fixed the code so it works with python 3.5 & 2.7 (tested on my local machine), but the tests on python 2.7 are still failing because for some reason they can't find the itertoolz module. I tried to fix this by adding toolz as an explicit dependency in `setup.py`, but that didn't work either. However, given that xarray is supposed to be dropping python 2 support at the end of this year (https://github.com/pydata/xarray/issues/1830), does this particularly matter?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,379415229
https://github.com/pydata/xarray/pull/2553#issuecomment-441639494,https://api.github.com/repos/pydata/xarray/issues/2553,441639494,MDEyOklzc3VlQ29tbWVudDQ0MTYzOTQ5NA==,35968931,2018-11-26T13:31:11Z,2018-12-01T15:56:36Z,MEMBER,"This is basically done now - I've implemented everything I wanted to, and included unit tests for the new functionality. I'm also successfully using it in my [personal code](https://github.com/TomNicholas/xBOUT/blob/master/xbout/load.py) now.

@shoyer I haven't changed the way the combine function is applied repeatedly to match the implementation in `numpy.block()`; I think the way [I've done it](https://github.com/pydata/xarray/blob/60c93ba59fd6636a9f6ebe2bb460b9924d9ebec7/xarray/core/combine.py#L478) is relatively clear, but let me know if you think it should be refactored.
~~I don't understand why some of the CI tests are failing - all tests which run on my machine pass, and the errors in the log seem to come from dask arrays not being loaded, i.e.:~~

```
AssertionError:
Dimensions: (x: 10, y: 8)
Dimensions without coordinates: x, y
Data variables:
    foo (x, y) float64 -1.19 0.4476 0.6283 ... -0.09699 -0.2311 -0.6482

Dimensions: (x: 10, y: 8)
Dimensions without coordinates: x, y
Data variables:
    foo (x, y) float64 dask.array
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,379415229