id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
391865060,MDExOlB1bGxSZXF1ZXN0MjM5MjYzOTU5,2616,API for N-dimensional combine,35968931,closed,0,,,25,2018-12-17T19:51:32Z,2019-06-25T16:18:29Z,2019-06-25T15:14:34Z,MEMBER,,0,pydata/xarray/pulls/2616,"Continues the discussion from #2553 about how the API for loading and combining data from multiple datasets should work. (Ultimately part of the solution to #2159)

@shoyer this is for you to see how I envisaged the API would look, based on our discussion in #2553. For now you can ignore all the changes except the ones to the docstrings of `auto_combine` [here](https://github.com/pydata/xarray/compare/master...TomNicholas:feature/nd_combine_new_api?expand=1#diff-876f2fcf4679457325e8018f6d98660cR650), `manual_combine` [here](https://github.com/pydata/xarray/compare/master...TomNicholas:feature/nd_combine_new_api?expand=1#diff-876f2fcf4679457325e8018f6d98660cR550) and `open_mfdataset` [here](https://github.com/pydata/xarray/compare/master...TomNicholas:feature/nd_combine_new_api?expand=1#diff-e58ddc6340e4b5dd0a6e6b443c9a6da1R483).

Feedback from anyone else is also encouraged, as the point of this is to make the API as clear as possible to someone who hasn't delved into the code behind `auto_combine` and `open_mfdataset`. It makes sense to first work out the API, then change the internal implementation to match, using the internal functions developed in #2553.

Therefore the tasks include:

- [x] Decide on API for `auto_combine` and `open_mfdataset`
- [x] Appropriate documentation
- [x] Write internal implementation of `manual_combine`
- [x] Write internal implementation of `auto_combine`
- [x] Update `open_mfdataset` to match
- [x] Write and reorganise tests
- [x] Automatic ordering of string and datetime coords
- [x] What's new entry explaining the changes
- [x] Make sure `auto_combine` and `manual_combine` appear on the [API page](http://xarray.pydata.org/en/stable/api.html) of the docs
- [x] PEP8 compliance
- [x] Python 3.5 compatibility
- [x] AirSpeedVelocity tests for `auto_combine`
- [x] Finish all TODOs
- [x] Backwards-compatible API to start the deprecation cycle
- [x] Add examples from docstrings to the main documentation pages","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2616/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
379415229,MDExOlB1bGxSZXF1ZXN0MjI5ODg1Mjc2,2553,Feature: N-dimensional auto_combine,35968931,closed,0,,,25,2018-11-10T11:40:48Z,2018-12-13T17:16:16Z,2018-12-13T17:15:57Z,MEMBER,,0,pydata/xarray/pulls/2553,"### What I did

Generalised the `auto_combine()` function to be able to concatenate and merge datasets along any number of dimensions, instead of just one. Provides one solution to #2159, and is relevant to the discussion in #2039.

Currently it cannot deduce the order in which datasets should be concatenated along any one dimension from the coordinates, so it just concatenates them in the order they are supplied. This means that for an N-D concatenation the datasets have to be supplied as a list of lists, which is nested as many times as there are dimensions to be concatenated along.
### How it works

In `_infer_concat_order_from_nested_list()` the nested list of datasets is recursively traversed to create a dictionary of datasets, where the keys are the corresponding ""tile IDs"". These tile IDs are tuples serving as multidimensional indexes for the position of each dataset within the hypercube of all datasets which are to be combined. For example, four datasets to be combined along two dimensions would be supplied as

```python
datasets = [[ds0, ds1], [ds2, ds3]]
```

and given tile IDs so that they are stored as

```python
combined_ids = {(0, 0): ds0, (0, 1): ds1, (1, 0): ds2, (1, 1): ds3}
```

Using this unambiguous intermediate structure means that another method could be used to organise the datasets for concatenation (e.g. reading the values of their coordinates), with a new keyword argument `infer_order_from_coords` used to choose the method.

The `_combine_nd()` function concatenates along one dimension at a time, reducing the length of the tile ID tuple by one each time `_combine_along_first_dim()` is called. After each concatenation the different variables are merged, so the new `auto_combine()` is essentially like calling the old one once for each dimension in `concat_dims`.

### Still to do

I would like people's opinions on the method I've chosen to do this, and any feedback on the code quality would be appreciated. Assuming we're happy with the method used here, the remaining tasks include:

- [x] More tests of the final `auto_combine()` function
- [x] ~~Add option to deduce concatenation order from coords (or this could be a separate PR)~~
- [x] Integrate this all the way up to `open_mfdataset()`
- [x] Unit tests for `open_mfdataset()`
- [x] More tests that the user has supplied a valid structure of datasets
- [x] ~~Possibly parallelize the concatenation step?~~
- [x] A few other small `TODO`s which are in `combine.py`
- [x] Proper documentation showing how the input should be structured
- [x] Fix failing unit tests on Python 2.7 (though support for 2.7 is being dropped at the end of 2018?)
- [x] Fix failing unit tests on Python 3.5
- [x] Update what's new

This PR was intended to solve the common use case of collecting output from a simulation which was parallelized in multiple dimensions. I would like to write a tutorial about how to use xarray to do this, including examples of how to preprocess the data and discard processor ghost cells.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2553/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
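The nested-list to tile-ID mapping described in the PR 2553 body can be illustrated with a minimal sketch. This is not xarray's actual internal code from `combine.py`; the helper name and the string stand-ins for datasets are assumptions made purely for illustration, and with xarray the leaves would be `xr.Dataset` objects.

```python
def infer_tile_ids_from_nested_list(entry, current_pos=()):
    """Recursively walk a nested list, yielding (tile_id, leaf) pairs.

    Each tile_id is a tuple whose length equals the nesting depth, giving the
    position of the leaf within the hypercube of objects to be combined.
    """
    if isinstance(entry, list):
        for i, item in enumerate(entry):
            yield from infer_tile_ids_from_nested_list(item, current_pos + (i,))
    else:
        yield current_pos, entry


# Stand-in objects for datasets; in xarray these would be xr.Dataset instances.
ds0, ds1, ds2, ds3 = "ds0", "ds1", "ds2", "ds3"

# A 2x2 hypercube: one level of nesting per dimension to concatenate along.
datasets = [[ds0, ds1], [ds2, ds3]]
combined_ids = dict(infer_tile_ids_from_nested_list(datasets))
print(combined_ids)
# {(0, 0): 'ds0', (0, 1): 'ds1', (1, 0): 'ds2', (1, 1): 'ds3'}
```

A combine step could then group the dictionary entries by all but the first element of each key, concatenate each group along one dimension, and drop that leading index, repeating once per dimension. That mirrors how the PR body describes `_combine_nd()` shortening the tile ID tuple by one on each pass.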