html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2159#issuecomment-435733418,https://api.github.com/repos/pydata/xarray/issues/2159,435733418,MDEyOklzc3VlQ29tbWVudDQzNTczMzQxOA==,1217238,2018-11-05T02:03:00Z,2018-11-05T02:03:00Z,MEMBER,"> This is fine though right? We can do all of this, because it should compartmentalise fairly easily shouldn't it? You end up with logic like:
Yes, this seems totally fine to me.
> We don't need to, but I don't think it would be that hard (if the structure above is feasible), and I think it's a common use case. Also there's an argument for putting in special effort to generalize this function as much as possible, because it lowers the barrier to entry for xarray for new users. Though perhaps I'm just biased because it happens to be my use case...
Sure, no opposition from me if you want to do it! 👍 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,324350248
https://github.com/pydata/xarray/issues/2159#issuecomment-435544853,https://api.github.com/repos/pydata/xarray/issues/2159,435544853,MDEyOklzc3VlQ29tbWVudDQzNTU0NDg1Mw==,1217238,2018-11-03T00:22:46Z,2018-11-03T00:22:46Z,MEMBER,"@TomNicholas I agree with your steps 1/2/3 for `open_mfdataset`.
My concern with a single `prealigned=True` argument is that there are two distinct use-cases:
1. Checking coordinates along the dimension(s) being concatenated to determine the result order
2. Checking coordinates along other non-concatenated dimensions to verify consistency
Currently we always do (2) and never do (1).
We definitely want an option to disable (2) for speed, and also want an option to support (1) (what you propose here). But these are distinct use cases -- we probably want to support all permutations of 1/2.
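To make the distinction concrete, here is a hypothetical signature that exposes the two checks separately (the argument names are illustrative only, not part of xarray's API):
```python
# Hypothetical keyword arguments -- names are illustrative, not xarray's API
def auto_combine(datasets, infer_order_from_coords=False, check_alignment=True):
    # infer_order_from_coords: use case (1) -- read 1D coordinates along the
    # concatenated dimension(s) to determine the order of the result
    # check_alignment: use case (2) -- verify that coordinates along
    # non-concatenated dimensions are consistent across the datasets
    ...
```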
> However this would mean that users wanting to do a multi-dimensional auto_combine on data without monotonic indexes would have to supply their datasets in some way that specifies their desired N-dimensional ordering
I'm not sure we need to support this yet -- it would be enough to have a keyword argument for falling back to the existing behavior that only supports 1D concatenation in the order provided.
> Also I'm assuming we are not going to provide functionality to handle uneven sub-lists, e.g. [[t1x1, t1x2], [t2x1, t2x2, t2x3]]?
Agreed, not important unless someone really wants/needs it.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,324350248
https://github.com/pydata/xarray/issues/2159#issuecomment-418476055,https://api.github.com/repos/pydata/xarray/issues/2159,418476055,MDEyOklzc3VlQ29tbWVudDQxODQ3NjA1NQ==,1217238,2018-09-04T18:44:35Z,2018-09-04T22:16:34Z,MEMBER,"NumPy's handling of object arrays is unfortunately inconsistent. So maybe it isn't the best idea to use NumPy arrays for this.
Python's built-in list/dict might be better choices here. Something like:
```python
def concat_nd(datasets):
    # find the set of dimensions across which to possibly merge;
    # could possibly use OrderedSet here:
    # https://github.com/pydata/xarray/blob/v0.10.8/xarray/core/utils.py#L401
    all_dims = {dim for ds in datasets for dim in ds.dims}
    # Create a map from each dimension to a tuple giving the size of each
    # dimension on an input dataset. Not all collections of datasets have
    # consistent sizes along each dimension, but the ones we can automatically
    # concatenate do. I recommend researching how ""chunks"" work in dask.array:
    # http://dask.pydata.org/en/latest/array-design.html
    # http://dask.pydata.org/en/latest/array-chunks.html
    chunks = {dim: ... for dim in all_dims}
    # find the sorted, de-duplicated union of all indexes along those dimensions;
    # np.unique followed by wrapping with pd.Index() might work OK for the
    # ""union"" function here
    combined_indexes = {
        dim: union([ds.indexes[dim] for ds in datasets]) for dim in all_dims
    }
    # create a map from ""tile id"" to dataset, e.g., of type
    # Dict[Tuple[int, ...], xarray.Dataset]; get_indexes() should use
    # pandas.Index.get_indexer to look up ds.indexes[dim] in the combined index
    indexes_to_dataset = {
        get_indexes(ds, chunks, combined_indexes): ds for ds in datasets
    }
    # call concat() in a loop to construct the combined dataset
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,324350248
https://github.com/pydata/xarray/issues/2159#issuecomment-416389795,https://api.github.com/repos/pydata/xarray/issues/2159,416389795,MDEyOklzc3VlQ29tbWVudDQxNjM4OTc5NQ==,1217238,2018-08-27T22:29:22Z,2018-08-27T22:29:22Z,MEMBER,"@TomNicholas I think your analysis is correct here.
I suspect that in most cases we could figure out how to tile datasets by looking at 1D coordinates along each dimension (i.e., the indexes for each dataset), e.g., to find a ""chunk id"" along each concatenated dimension.
These could be used to build something like a NumPy object array of xarray.Dataset/DataArray objects, which could be split up into a bunch of 1D calls to `xarray.concat()`.
I would rather avoid using the `positions` argument of `concat`. It doesn't really add any flexibility compared to reordering the inputs with `xarray.core.nputils.inverse_permutation`.
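As a minimal sketch of that reordering (using the public `np.argsort` in place of the private helper; the datasets and the 'x' dimension are purely illustrative):
```python
import numpy as np
import xarray as xr

# Two 1D datasets supplied out of order along 'x' (illustrative only)
datasets = [
    xr.Dataset({'v': ('x', [2, 3])}, coords={'x': [2, 3]}),
    xr.Dataset({'v': ('x', [0, 1])}, coords={'x': [0, 1]}),
]

# Reorder the inputs by their starting coordinate, then concatenate in order;
# this achieves what concat's `positions` argument would, without using it.
order = np.argsort([ds.indexes['x'][0] for ds in datasets])
combined = xr.concat([datasets[i] for i in order], dim='x')
```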
> Final point - this common use case also has the added complexity of having ghost or guard cells around every dataset, which should be thrown away. Clearly some user input is required here (ghost_cells_x=2, ghost_cells_y=2, ghost_cells_z=0, ...), but I'm really not sure what the best way to fit that kind of logic in is. Yet more arguments to open_mfdataset?
We could potentially just encourage using the existing `preprocess` argument.
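As a minimal sketch (`preprocess` is a real `open_mfdataset` argument; the guard-cell width of 2 and the dimension names are illustrative assumptions):
```python
import xarray as xr

# Drop 2 guard cells from each side of 'x' and 'y' on every file as it is
# opened, before the datasets are combined (widths here are hypothetical).
def trim_ghost_cells(ds):
    return ds.isel(x=slice(2, -2), y=slice(2, -2))

combined = xr.open_mfdataset('data.*.nc', preprocess=trim_ghost_cells)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,324350248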
https://github.com/pydata/xarray/issues/2159#issuecomment-393626605,https://api.github.com/repos/pydata/xarray/issues/2159,393626605,MDEyOklzc3VlQ29tbWVudDM5MzYyNjYwNQ==,1217238,2018-05-31T18:19:32Z,2018-05-31T18:19:32Z,MEMBER,"@aluhamaa I don't think you're missing anything here. I agree that it would be pretty straightforward; it would just take a bit of work.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,324350248
https://github.com/pydata/xarray/issues/2159#issuecomment-391509672,https://api.github.com/repos/pydata/xarray/issues/2159,391509672,MDEyOklzc3VlQ29tbWVudDM5MTUwOTY3Mg==,1217238,2018-05-23T21:57:56Z,2018-05-23T21:57:56Z,MEMBER,@TomNicholas I think you could use the existing `preprocess` argument to `open_mfdataset()` for that.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,324350248
https://github.com/pydata/xarray/issues/2159#issuecomment-391499524,https://api.github.com/repos/pydata/xarray/issues/2159,391499524,MDEyOklzc3VlQ29tbWVudDM5MTQ5OTUyNA==,1217238,2018-05-23T21:17:42Z,2018-05-23T21:17:42Z,MEMBER,"I agree with @jhamman that it would take effort from an interested developer to do this, but in principle it's quite doable.
I think our logic in [auto_combine](https://github.com/pydata/xarray/blob/48d55eea052fec204b843babdc81c258f3ed5ce1/xarray/core/combine.py#L371) (which powers `open_mfdataset`) could probably be extended to handle concatenation across multiple dimensions. The main implementation would need to look at coordinates along the concatenated dimensions to break the operation into multiple calls to `xarray.concat()`.
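As a rough sketch of that decomposition (a hypothetical 2x2 grid of datasets, not xarray's actual implementation):
```python
import numpy as np
import xarray as xr

# Build a small dataset tile whose 'x' and 'y' coordinates start at x0, y0
# (purely illustrative data).
def make_tile(x0, y0):
    return xr.Dataset(
        {'v': (('y', 'x'), np.zeros((2, 2)))},
        coords={'x': [x0, x0 + 1], 'y': [y0, y0 + 1]},
    )

# Combine a 2x2 grid by concatenating along 'x' within each row,
# then along 'y' across the rows -- multiple 1D concat calls.
grid = [[make_tile(0, 0), make_tile(2, 0)], [make_tile(0, 2), make_tile(2, 2)]]
rows = [xr.concat(row, dim='x') for row in grid]
combined = xr.concat(rows, dim='y')
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,324350248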