pull_requests: 229885276
field | value
---|---
id | 229885276
node_id | MDExOlB1bGxSZXF1ZXN0MjI5ODg1Mjc2
number | 2553
state | closed
locked | 0
title | Feature: N-dimensional auto_combine
user | 35968931
created_at | 2018-11-10T11:40:48Z
updated_at | 2018-12-13T17:16:16Z
closed_at | 2018-12-13T17:15:57Z
merged_at | 2018-12-13T17:15:56Z
merge_commit_sha | 9e8707d2041cfa038c31fc2284c1fe40bc3368e9
draft | 0
head | ebbe47f450ed4407655bd9a4ed45274b140452dd
base | 0d6056e8816e3d367a64f36c7f1a5c4e1ce4ed4e
author_association | MEMBER
repo | 13221727
url | https://github.com/pydata/xarray/pull/2553

body:

### What I did

Generalised the `auto_combine()` function so that it can concatenate and merge datasets along any number of dimensions, instead of just one. This provides one solution to #2159 and is relevant to the discussion in #2039.

Currently it cannot deduce the order in which datasets should be concatenated along any one dimension from the coordinates, so it just concatenates them in the order they are supplied. This means that for an N-D concatenation the datasets have to be supplied as a list of lists, nested as many times as there are dimensions to be concatenated along.

### How it works

In `_infer_concat_order_from_nested_list()` the nested list of datasets is recursively traversed to build a dictionary of datasets, where the keys are the corresponding "tile IDs". These tile IDs are tuples serving as multidimensional indexes for the position of each dataset within the hypercube of all datasets to be combined. For example, four datasets to be combined along two dimensions would be supplied as

```python
datasets = [[ds0, ds1], [ds2, ds3]]
```

and given tile IDs, to be stored as

```python
combined_ids = {(0, 0): ds0,
                (0, 1): ds1,
                (1, 0): ds2,
                (1, 1): ds3}
```

Using this unambiguous intermediate structure means that another method could be used to organise the datasets for concatenation (e.g. reading the values of their coordinates), with a new keyword argument `infer_order_from_coords` used to choose the method.

The `_combine_nd()` function concatenates along one dimension at a time, reducing the length of the tile-ID tuple by one each time `_combine_along_first_dim()` is called. After each concatenation the different variables are merged, so the new `auto_combine()` is essentially like calling the old one once for each dimension in `concat_dims`.

### Still to do

I would like people's opinions on the method I've chosen to do this, and any feedback on the code quality would be appreciated. Assuming we're happy with the method used here, the remaining tasks include:

- [x] More tests of the final `auto_combine()` function
- [x] ~~Add option to deduce concatenation order from coords (or this could be a separate PR)~~
- [x] Integrate this all the way up to `open_mfdataset()`
- [x] Unit tests for `open_mfdataset()`
- [x] More tests that the user has inputted a valid structure of datasets
- [x] ~~Possibly parallelize the concatenation step?~~
- [x] A few other small `TODO`s which are in `combine.py`
- [x] Proper documentation showing how the input should be structured
- [x] Fix failing unit tests on Python 2.7 (though support for 2.7 is being dropped at the end of 2018?)
- [x] Fix failing unit tests on Python 3.5
- [x] Update what's new

This PR was intended to solve the common use case of collecting output from a simulation which was parallelized in multiple dimensions. I would like to write a tutorial about how to use xarray to do this, including examples of how to preprocess the data and discard processor ghost cells.
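To make the tile-ID idea above concrete, here is a minimal, self-contained sketch of how a nested list can be walked recursively to build the `combined_ids` mapping. The function name `infer_tile_ids_from_nested_list` and the use of plain strings in place of `xarray.Dataset` objects are illustrative assumptions; the PR's actual `_infer_concat_order_from_nested_list()` may differ in detail.

```python
# Illustrative sketch only -- not the PR's actual implementation.
# Walk a nested list of datasets and assign each leaf a tuple "tile ID"
# giving its position in the hypercube of datasets to be combined.

def infer_tile_ids_from_nested_list(entry, current_id=()):
    """Recursively yield (tile_id, dataset) pairs from a nested list."""
    if isinstance(entry, list):
        # Descend one level: the index at this level becomes the next
        # element of the tile ID tuple.
        for i, item in enumerate(entry):
            yield from infer_tile_ids_from_nested_list(item, current_id + (i,))
    else:
        # Leaf node: an actual dataset (strings stand in for Datasets here).
        yield current_id, entry


datasets = [["ds0", "ds1"], ["ds2", "ds3"]]
combined_ids = dict(infer_tile_ids_from_nested_list(datasets))
print(combined_ids)
# {(0, 0): 'ds0', (0, 1): 'ds1', (1, 0): 'ds2', (1, 1): 'ds3'}
```

The length of each tile ID equals the nesting depth, i.e. the number of dimensions to concatenate along, which is what lets `_combine_nd()` peel off one dimension per pass.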
Links from other tables
- 0 rows from pull_requests_id in labels_pull_requests