issue_comments: 489135792

html_url: https://github.com/pydata/xarray/issues/1823#issuecomment-489135792
issue_url: https://api.github.com/repos/pydata/xarray/issues/1823
id: 489135792
node_id: MDEyOklzc3VlQ29tbWVudDQ4OTEzNTc5Mg==
user: 2448579
created_at: 2019-05-03T15:29:14Z
updated_at: 2019-05-03T15:40:27Z
author_association: MEMBER
reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
performed_via_github_app:
issue: 288184220
body:

One common use case is files with a large number of `concat_dim`-invariant non-dimensional coordinates. This is easy to speed up by dropping those variables from all but the first file, e.g. https://github.com/pangeo-data/esgf2xarray/blob/6a5e4df0d329c2f23b403cbfbb65f0f1dfa98d52/esgf2zarr/aggregate.py#L107-L110

```python
# keep only coordinates from first ensemble member to simplify merge
first = member_dsets_aligned[0]
rest = [mds.reset_coords(drop=True) for mds in member_dsets_aligned[1:]]
objs_to_concat = [first] + rest
```

Similarly https://github.com/NCAR/intake-esm/blob/e86a8e8a80ce0fd4198665dbef3ba46af264b5ea/intake_esm/aggregate.py#L53-L57

```python
def merge_vars_two_datasets(ds1, ds2):
    """
    Merge two datasets, dropping all variables from second dataset
    that already exist in the first dataset's coordinates.
    """
```

See also #2039 (second code block).

One way to do this might be to add a `master_file` kwarg to `open_mfdataset`. This would imply `coords='minimal', join='exact'` (I think; `prealigned=True` in some other proposals) and would drop non-dimensional coordinates from all but the first file and then call `concat`. As a bonus, it would assign attributes from the `master_file` to the merged dataset (I think there are open issues about this); this functionality exists in `netCDF4.MFDataset`, so that's a plus.

EDIT: #2039 (third code block) is also a possibility. This might look like

```python
xr.open_mfdataset('files*.nc', master_file='first', concat_dim='time')
```

in which case the first file is read, and all coords that are not `concat_dim` become `drop_variables` for an `open_dataset` call that reads the remaining files. We then merge with the first dataset and assign attrs.

EDIT2: `master_file` combines two different functionalities here: specifying a "template file" and choosing a file to take attributes from. So maybe we need two kwargs: `template_file` and `attrs_from`?
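As a rough sketch of the `template_file`/`attrs_from` split, the idea can be emulated on datasets that are already open in memory. This assumes a recent xarray (for `Dataset.drop_vars`). It also assumes that every non-dimensional coordinate without `concat_dim` in its dims really is identical across files. `concat_with_template`, `template_index`, `attrs_index` and `make_chunk` are hypothetical names for illustration, not xarray API. When reading files, the same `to_drop` list could instead be passed as `drop_variables=` to `xr.open_dataset` for every file except the template, so those variables are never read at all.

```python
import numpy as np
import xarray as xr


def concat_with_template(datasets, concat_dim, template_index=0, attrs_index=0):
    """Hypothetical helper: concatenate along ``concat_dim``, keeping
    non-dimensional coordinates from ``datasets[template_index]`` only and
    global attributes from ``datasets[attrs_index]``.

    Assumes coordinates without ``concat_dim`` are identical in every dataset.
    """
    template = datasets[template_index]
    # Non-dimensional coordinates that do not depend on concat_dim; with files,
    # these are what would be passed as ``drop_variables`` to xr.open_dataset.
    to_drop = [
        name
        for name, coord in template.coords.items()
        if name not in template.dims and concat_dim not in coord.dims
    ]
    pieces = [
        ds if i == template_index else ds.drop_vars(to_drop, errors="ignore")
        for i, ds in enumerate(datasets)
    ]
    # coords="minimal" only concatenates coordinates that already have
    # concat_dim; join="exact" raises if the other indexes disagree.
    combined = xr.concat(pieces, dim=concat_dim, coords="minimal", join="exact")
    # Take global attributes from the chosen file (the `attrs_from` idea).
    combined.attrs = dict(datasets[attrs_index].attrs)
    return combined


def make_chunk(t0, source):
    """Build a tiny hypothetical dataset covering two time steps."""
    return xr.Dataset(
        {"tas": (("time", "x"), np.arange(4.0).reshape(2, 2) + t0)},
        coords={
            "time": [t0, t0 + 1],
            "x": [10, 20],
            "cell_area": ("x", [1.5, 2.5]),  # invariant along "time"
        },
        attrs={"source": source},
    )


chunks = [make_chunk(0, "a"), make_chunk(2, "b")]
combined = concat_with_template(chunks, "time", template_index=0, attrs_index=1)
print(combined)
```

Only the template's copy of each dropped coordinate survives, so any disagreement in the other files is silently ignored. That is the price of skipping the equality checks `xr.concat` would otherwise run on those variables.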