

issue_comments: 489135792


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/1823#issuecomment-489135792 https://api.github.com/repos/pydata/xarray/issues/1823 489135792 MDEyOklzc3VlQ29tbWVudDQ4OTEzNTc5Mg== 2448579 2019-05-03T15:29:14Z 2019-05-03T15:40:27Z MEMBER

One common use case is files with large numbers of concat_dim-invariant non-dimensional coordinates. This is easy to speed up by dropping those variables from all but the first file.

e.g. https://github.com/pangeo-data/esgf2xarray/blob/6a5e4df0d329c2f23b403cbfbb65f0f1dfa98d52/esgf2zarr/aggregate.py#L107-L110

```python
# keep only coordinates from first ensemble member to simplify merge
first = member_dsets_aligned[0]
rest = [mds.reset_coords(drop=True) for mds in member_dsets_aligned[1:]]
objs_to_concat = [first] + rest
```

Similarly https://github.com/NCAR/intake-esm/blob/e86a8e8a80ce0fd4198665dbef3ba46af264b5ea/intake_esm/aggregate.py#L53-L57

```python
def merge_vars_two_datasets(ds1, ds2):
    """
    Merge two datasets, dropping all variables from
    second dataset that already exist in the first dataset's coordinates.
    """
```

See also #2039 (second code block)

One way to do this might be to add a master_file kwarg to open_mfdataset. This would imply coords='minimal', join='exact' (I think; prealigned=True in some other proposals) and would drop non-dimensional coordinates from all but the first file and then call concat.

As a bonus, it would assign attributes from the master_file to the merged dataset (for which I think there are open issues). This functionality exists in netCDF4.MFDataset, so that's a plus.
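A rough sketch of what master_file could do internally, written against in-memory datasets rather than files. `concat_with_master` and `make_part` are hypothetical names used only for illustration, not xarray API, and `compat='override'` is my assumption for how the merged coordinates would be picked from the first dataset:

```python
import numpy as np
import xarray as xr


def concat_with_master(datasets, concat_dim):
    """Hypothetical helper, not part of xarray: treat datasets[0] as the master."""
    master = datasets[0]
    # Non-dimensional coordinates that do not vary along concat_dim.
    invariant = [
        name
        for name, coord in master.coords.items()
        if name not in master.sizes and concat_dim not in coord.dims
    ]
    # Keep those coordinates only on the master; drop them everywhere else.
    rest = [ds.drop_vars(invariant, errors="ignore") for ds in datasets[1:]]
    combined = xr.concat(
        [master] + rest,
        dim=concat_dim,
        data_vars="all",
        coords="minimal",
        compat="override",  # assumption: take merged variables from the first dataset
        join="exact",
    )
    # Take global attributes from the master, as proposed above.
    combined.attrs = dict(master.attrs)
    return combined


def make_part(t0, title):
    """Toy dataset: two time steps plus an 'area' coordinate that only depends on x."""
    return xr.Dataset(
        {"temp": (("time", "x"), np.full((2, 3), float(t0)))},
        coords={
            "time": [t0, t0 + 1],
            "x": [10, 20, 30],
            "area": ("x", [1.0, 2.0, 3.0]),
        },
        attrs={"title": title},
    )


parts = [make_part(0, "first"), make_part(2, "second"), make_part(4, "third")]
combined = concat_with_master(parts, "time")
print(combined)
```

Here only the first dataset carries `area` into the concat, so it never has to be compared across inputs. In open_mfdataset the same list of names could be computed once from the first file.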

EDIT: #2039 (third code block) is also a possibility. This might look like

```python
xr.open_mfdataset('files*.nc', master_file='first', concat_dim='time')
```

in which case the first file is read; all coords that are not concat_dim become drop_variables for an open_dataset call that reads the remaining files. We then merge with the first dataset and assign attrs.

EDIT2: master_file combines two different functionalities here: specifying a "template file" and a file to choose attributes from. So maybe we need two kwargs: template_file and attrs_from?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  288184220