id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 403326458,MDU6SXNzdWU0MDMzMjY0NTg=,2710,xarray.DataArray.expand_dims() can only expand dimension for a point coordinate ,10720577,closed,0,,,14,2019-01-25T20:46:05Z,2020-02-20T15:35:22Z,2020-02-20T15:35:22Z,CONTRIBUTOR,,,,"#### Current `expand_dims` functionality Apparently, `expand_dims` can only create a dimension for a point coordinate, i.e. it promotes a scalar coordinate into 1D coordinate. Here is an example: ```python >>> coords = {""b"": range(5), ""c"": range(3)} >>> da = xr.DataArray(np.ones([5, 3]), coords=coords, dims=list(coords.keys())) >>> da array([[1., 1., 1.], [1., 1., 1.], [1., 1., 1.], [1., 1., 1.], [1., 1., 1.]]) Coordinates: * b (b) int64 0 1 2 3 4 * c (c) int64 0 1 2 >>> da[""a""] = 0 # create a point coordinate >>> da array([[1., 1., 1.], [1., 1., 1.], [1., 1., 1.], [1., 1., 1.], [1., 1., 1.]]) Coordinates: * b (b) int64 0 1 2 3 4 * c (c) int64 0 1 2 a int64 0 >>> da.expand_dims(""a"") # create a new dimension ""a"" for the point coordinated array([[[1., 1., 1.], [1., 1., 1.], [1., 1., 1.], [1., 1., 1.], [1., 1., 1.]]]) Coordinates: * b (b) int64 0 1 2 3 4 * c (c) int64 0 1 2 * a (a) int64 0 >>> ``` #### Problem description I want to be able to do 2 more things with `expand_dims` or maybe a related/similar method: 1) broadcast the data across 1 or more new dimensions 2) expand an existing dimension to include 1 or more new coordinates #### Here is the code I currently use to accomplish this ``` from collections import OrderedDict import xarray as xr def expand_dimensions(data, fill_value=np.nan, **new_coords): """"""Expand (or add if it doesn't yet exist) the data array to fill in new coordinates across multiple dimensions. If a dimension doesn't exist in the dataarray yet, then the result will be `data`, broadcasted across this dimension. >>> da = xr.DataArray([1, 2, 3], dims=""a"", coords=[[0, 1, 2]]) >>> expand_dimensions(da, b=[1, 2, 3, 4, 5]) array([[ 1., 1., 1., 1., 1.], [ 2., 2., 2., 2., 2.], [ 3., 3., 3., 3., 3.]]) Coordinates: * a (a) int64 0 1 2 * b (b) int64 1 2 3 4 5 Or, if `dim` is already a dimension in `data`, then any new coordinate values in `new_coords` that are not yet in `data[dim]` will be added, and the values corresponding to those new coordinates will be `fill_value`. >>> da = xr.DataArray([1, 2, 3], dims=""a"", coords=[[0, 1, 2]]) >>> expand_dimensions(da, a=[1, 2, 3, 4, 5]) array([ 1., 2., 3., 0., 0., 0.]) Coordinates: * a (a) int64 0 1 2 3 4 5 Args: data (xarray.DataArray): Data that needs dimensions expanded. fill_value (scalar, xarray.DataArray, optional): If expanding new coords this is the value of the new datum. Defaults to `np.nan`. **new_coords (list[int | str]): The keywords are arbitrary dimensions and the values are coordinates of those dimensions that the data will include after it has been expanded. Returns: xarray.DataArray: Data that had its dimensions expanded to include the new coordinates. """""" ordered_coord_dict = OrderedDict(new_coords) shape_da = xr.DataArray( np.zeros(list(map(len, ordered_coord_dict.values()))), coords=ordered_coord_dict, dims=ordered_coord_dict.keys()) expanded_data = xr.broadcast(data, shape_da)[0].fillna(fill_value) return expanded_data ``` Here's an example of broadcasting data across a new dimension: ``` >>> coords = {""b"": range(5), ""c"": range(3)} >>> da = xr.DataArray(np.ones([5, 3]), coords=coords, dims=list(coords.keys())) >>> expand_dimensions(da, a=[0, 1, 2]) array([[[1., 1., 1.], [1., 1., 1.], [1., 1., 1.]], [[1., 1., 1.], [1., 1., 1.], [1., 1., 1.]], [[1., 1., 1.], [1., 1., 1.], [1., 1., 1.]], [[1., 1., 1.], [1., 1., 1.], [1., 1., 1.]], [[1., 1., 1.], [1., 1., 1.], [1., 1., 1.]]]) Coordinates: * b (b) int64 0 1 2 3 4 * c (c) int64 0 1 2 * a (a) int64 0 1 2 ``` Here's an example of expanding an existing dimension to include new coordinates: ``` >>> expand_dimensions(da, b=[5, 6]) array([[ 1., 1., 1.], [ 1., 1., 1.], [ 1., 1., 1.], [ 1., 1., 1.], [ 1., 1., 1.], [nan, nan, nan], [nan, nan, nan]]) Coordinates: * b (b) int64 0 1 2 3 4 5 6 * c (c) int64 0 1 2 ``` #### Final Note If no one else is already working on this, and if it seems like a useful addition to XArray, then I would more than happy to work on this. Please let me know. Thank you, Martin","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2710/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 403367810,MDU6SXNzdWU0MDMzNjc4MTA=,2713,xarray.DataArray.mean() can't calculate weighted mean,10720577,closed,0,,,2,2019-01-25T23:08:01Z,2019-01-26T02:50:07Z,2019-01-26T02:49:53Z,CONTRIBUTOR,,,,"#### Code Sample, a copy-pastable example if possible Currently `xarray.DataArray.mean()` and `xarray.Dataset.mean()` cannot calculate weighted means. I think it would be useful if it had a similar API to `numpy.average`: https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.average.html Here is the code I currently use to get the weighted mean of an `xarray.DataArray`. ```python def weighted_mean(data_da, dim, weights): r""""""Computes the weighted mean. We can only do the actual weighted mean over the dimensions that ``data_da`` and ``weights`` share, so for dimensions in ``dim`` that aren't included in ``weights`` we must take the unweighted mean. This functions skips NaNs, i.e. Data points that are NaN have corresponding NaN weights. Args: data_da (xarray.DataArray): Data to compute a weighted mean for. dim (str | list[str]): dimension(s) of the dataarray to reduce over weights (xarray.DataArray): a 1-D dataarray the same length as the weighted dim, with dimension name equal to that of the weighted dim. Must be nonnegative. Returns: (xarray.DataArray): The mean over the given dimension. So it will contain all dimensions of the input that are not in ``dim``. Raises: (IndexError): If ``weights.dims`` is not a subset of ``dim``. (ValueError): If ``weights`` has values that are negative or infinite. """""" if isinstance(dim, str): dim = [dim] else: dim = list(dim) if not set(weights.dims) <= set(dim): dim_err_msg = ( ""`weights.dims` must be a subset of `dim`. {} are dimensions in "" ""`weights`, but not in `dim`."" ).format(set(weights.dims) - set(dim)) raise IndexError(dim_err_msg) else: pass # `weights.dims` is a subset of `dim` if (weights < 0).any() or xr.ufuncs.isinf(weights).any(): negative_weight_err_msg = ""Weight must be nonnegative and finite"" raise ValueError(negative_weight_err_msg) else: pass # `weights` are nonnegative weight_dims = [ weight_dim for weight_dim in dim if weight_dim in weights.dims ] if np.isnan(data_da).any(): expanded_weights, _ = xr.broadcast(weights, data_da) weights_with_nans = expanded_weights.where(~np.isnan(data_da)) else: weights_with_nans = weights mean_da = ((data_da * weights_with_nans).sum(weight_dims, skipna=True) / weights_with_nans.sum(weight_dims)) other_dims = list(set(dim) - set(weight_dims)) return mean_da.mean(other_dims, skipna=True) ``` If no one is already working on this and if it seems useful, then I would be happy to work on this. Thank you, Martin ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2713/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue