pull_requests: 229885276


#2553 · **Feature: N-dimensional auto_combine** · state: closed · user: 35968931 · id: 229885276 · node_id: MDExOlB1bGxSZXF1ZXN0MjI5ODg1Mjc2 · locked: no

### What I did

Generalised the `auto_combine()` function to be able to concatenate and merge datasets along any number of dimensions, instead of just one. This provides one solution to #2159, and is relevant to the discussion in #2039.

Currently it cannot deduce the order in which datasets should be concatenated along any one dimension from the coordinates, so it just concatenates them in the order they are supplied. This means that for an N-D concatenation the datasets have to be supplied as a list of lists, nested as many times as there are dimensions to be concatenated along.

### How it works

In `_infer_concat_order_from_nested_list()` the nested list of datasets is recursively traversed to create a dictionary of datasets, where the keys are the corresponding "tile IDs". These tile IDs are tuples serving as multidimensional indexes into the hypercube of all datasets which are to be combined. For example, four datasets which are to be combined along two dimensions would be supplied as

```python
datasets = [[ds0, ds1], [ds2, ds3]]
```

and given tile IDs to be stored as

```python
combined_ids = {(0, 0): ds0, (0, 1): ds1, (1, 0): ds2, (1, 1): ds3}
```

Using this unambiguous intermediate structure means that another method could be used to organise the datasets for concatenation (e.g. reading the values of their coordinates), with a new keyword argument `infer_order_from_coords` used to choose the method.

The `_combine_nd()` function concatenates along one dimension at a time, reducing the length of the tile-ID tuple by one each time `_combine_along_first_dim()` is called. After each concatenation the different variables are merged, so the new `auto_combine()` is essentially like calling the old one once for each dimension in `concat_dims`.
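The two steps described above can be sketched in pure Python. This is a minimal illustrative stand-in, not the PR's actual implementation: plain strings stand in for `Dataset` objects, the function names merely mirror those mentioned above, and joining names with `+` stands in for concatenation and merging.

```python
# Illustrative sketch only: strings stand in for Dataset objects, and
# "+".join stands in for the real concatenate-and-merge step.
from itertools import groupby


def infer_concat_order_from_nested_list(datasets, current_id=()):
    """Recursively walk a nested list, mapping each leaf to a tile-ID tuple."""
    if isinstance(datasets, list):
        combined = {}
        for i, item in enumerate(datasets):
            combined.update(
                infer_concat_order_from_nested_list(item, current_id + (i,))
            )
        return combined
    # Leaf: an actual dataset (any non-list object here).
    return {current_id: datasets}


def combine_along_first_dim(combined_ids, concat):
    """Concatenate groups sharing the same trailing indices; IDs shrink by one."""
    # Sort so items with the same tail are consecutive, ordered by their
    # leading (concatenation) index within each group.
    items = sorted(combined_ids.items(), key=lambda kv: (kv[0][1:], kv[0][0]))
    new_ids = {}
    for tail, group in groupby(items, key=lambda kv: kv[0][1:]):
        new_ids[tail] = concat([ds for _, ds in group])
    return new_ids


def combine_nd(combined_ids, n_dims, concat):
    """Reduce one dimension per pass until a single object remains."""
    for _ in range(n_dims):
        combined_ids = combine_along_first_dim(combined_ids, concat)
    (result,) = combined_ids.values()  # only the empty tile ID () is left
    return result


datasets = [["ds0", "ds1"], ["ds2", "ds3"]]
combined_ids = infer_concat_order_from_nested_list(datasets)
# combined_ids == {(0, 0): "ds0", (0, 1): "ds1", (1, 0): "ds2", (1, 1): "ds3"}

result = combine_nd(combined_ids, n_dims=2, concat="+".join)
# result == "ds0+ds2+ds1+ds3": the first pass joins along the outer
# dimension, the second pass joins the two intermediate results.
```

Note how each pass over `combined_ids` strips the leading index from every tile ID, which is exactly the "one dimension at a time" reduction the description attributes to `_combine_nd()`.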
### Still to do

I would like people's opinions on the method I've chosen to do this, and any feedback on the code quality would be appreciated. Assuming we're happy with the method I used here, the remaining tasks include:

- [x] More tests of the final `auto_combine()` function
- [x] ~~Add option to deduce concatenation order from coords (or this could be a separate PR)~~
- [x] Integrate this all the way up to `open_mfdataset()`
- [x] Unit tests for `open_mfdataset()`
- [x] More tests that the user has inputted a valid structure of datasets
- [x] ~~Possibly parallelize the concatenation step?~~
- [x] A few other small `TODO`s which are in `combine.py`
- [x] Proper documentation showing how the input should be structured
- [x] Fix failing unit tests on Python 2.7 (though support for 2.7 is being dropped at the end of 2018?)
- [x] Fix failing unit tests on Python 3.5
- [x] Update what's new

This PR was intended to solve the common use case of collecting output from a simulation which was parallelized in multiple dimensions. I would like to write a tutorial about how to use xarray to do this, including examples of how to preprocess the data and discard processor ghost cells.

created_at: 2018-11-10T11:40:48Z · updated_at: 2018-12-13T17:16:16Z · closed_at: 2018-12-13T17:15:57Z · merged_at: 2018-12-13T17:15:56Z · merge_commit_sha: 9e8707d2041cfa038c31fc2284c1fe40bc3368e9 · draft: no · head: ebbe47f450ed4407655bd9a4ed45274b140452dd · base: 0d6056e8816e3d367a64f36c7f1a5c4e1ce4ed4e · author_association: MEMBER · repo: 13221727 · url: https://github.com/pydata/xarray/pull/2553
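One checklist item above asks for more tests that the user has inputted a valid structure of datasets. A hypothetical sketch of such a check (this is not the PR's code; the function name and error messages are invented for illustration) could verify that the inferred tile IDs fill a complete hypercube, i.e. that no sub-list is ragged or nested to a different depth:

```python
# Hypothetical input-structure check: all tile IDs must share one depth and
# together cover every index combination of the implied hypercube.
from itertools import product


def check_shape_tile_ids(combined_ids):
    tile_ids = set(combined_ids)
    # Same length for every tile ID means every branch has the same depth.
    depths = {len(tile_id) for tile_id in tile_ids}
    if len(depths) != 1:
        raise ValueError("The supplied sub-lists do not have consistent depths")
    (depth,) = depths
    # A full grid: every combination of indices up to the max must be present.
    lengths = [max(t[d] for t in tile_ids) + 1 for d in range(depth)]
    expected = set(product(*(range(n) for n in lengths)))
    if tile_ids != expected:
        raise ValueError("The supplied sub-lists do not have consistent lengths")


# A complete 2x2 grid of tile IDs passes silently...
check_shape_tile_ids({(0, 0): "ds0", (0, 1): "ds1",
                      (1, 0): "ds2", (1, 1): "ds3"})

# ...while a ragged structure (tile (1, 1) missing) is rejected.
try:
    check_shape_tile_ids({(0, 0): "ds0", (0, 1): "ds1", (1, 0): "ds2"})
    raised = False
except ValueError:
    raised = True
```

Failing loudly here, before any concatenation starts, gives the user a structural error message instead of a confusing shape mismatch deep inside the combine step.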
