issues: 170779798

id: 170779798
node_id: MDExOlB1bGxSZXF1ZXN0ODEwNTQ4MTQ=
number: 964
title: New function for applying vectorized functions for unlabeled arrays to xarray objects
user: 1217238
state: closed
locked: 0
assignee: (empty)
milestone: (empty)
comments: 33
created_at: 2016-08-12T01:00:06Z
updated_at: 2018-12-13T16:45:07Z
closed_at: 2017-01-06T00:36:05Z
author_association: MEMBER
active_lock_reason: (empty)
draft: 0
pull_request: pydata/xarray/pulls/964
body:

This PR adds a new public-facing function, xarray.apply_ufunc, which handles all the logic of applying NumPy generalized universal functions to xarray's labelled arrays, including automatic alignment, merging coordinates, broadcasting, and reapplying labels to the result.

Note that although we use the gufunc interface here, this works for far more than gufuncs. Any function that handles broadcasting in the usual numpy way will do. See below for examples.
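
As a minimal sketch of what "handles broadcasting in the usual numpy way" means, the plain function below is not a gufunc, yet it combines inputs of different shapes elementwise. The function name vector_norm and the array shapes are made up for illustration; only NumPy is used, not apply_ufunc itself.

```python
import numpy as np

def vector_norm(x, y):
    # Ordinary NumPy arithmetic; broadcasting is handled by NumPy itself.
    return np.sqrt(x ** 2 + y ** 2)

x = np.array([[3.0], [6.0]])    # shape (2, 1)
y = np.array([4.0, 8.0, 0.0])   # shape (3,)
result = vector_norm(x, y)      # broadcast to shape (2, 3)
```

A function of this kind is what apply_ufunc is meant to accept when no signature is given.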

Now that this logic lives in one place, a follow-up PR can add hooks for setting output array names and attributes based on the inputs (e.g., to let third-party libraries add unit support, #525).

Xref #770

Examples

Calculate the vector magnitude of two arguments:

    import numpy as np
    import xarray as xr

    def magnitude(a, b):
        func = lambda x, y: np.sqrt(x ** 2 + y ** 2)
        return xr.apply_ufunc(func, a, b)

Compute the mean (.mean)::

    def mean(obj, dim):
        # note: apply_ufunc always moves core dimensions to the end
        sig = ([(dim,)], [()])
        kwargs = {'axis': -1}
        return xr.apply_ufunc(np.mean, obj, signature=sig, kwargs=kwargs)
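
The comment in the mean example explains why axis=-1 is sufficient: once the core dimension is the last axis, reducing over the final axis removes exactly that dimension. A rough sketch of the reordering, written on dimension names only; move_core_dims_last is a hypothetical helper for illustration, not part of xarray or of this PR:

```python
def move_core_dims_last(dims, core_dims):
    """Return dims reordered so that core_dims come last, in the given order."""
    missing = [d for d in core_dims if d not in dims]
    if missing:
        raise ValueError('core dimensions not found: %r' % missing)
    # Dimensions that are not core dimensions keep their relative order.
    broadcast_dims = tuple(d for d in dims if d not in core_dims)
    return broadcast_dims + tuple(core_dims)

# 'time' becomes the last axis, so a reduction with axis=-1 acts on it.
new_order = move_core_dims_last(('time', 'x', 'y'), ('time',))
```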

Inner product over a specific dimension::

```
def gufunc_inner(x, y):
    result = np.matmul(x[..., np.newaxis, :], y[..., :, np.newaxis])
    return result[..., 0, 0]

def inner_product(a, b, dim):
    sig = ([(dim,), (dim,)], [()])
    return xr.apply_ufunc(gufunc_inner, a, b, signature=sig)
```
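
To see that gufunc_inner computes an inner product over the last axis while broadcasting over the remaining axes, it can be compared against np.einsum on plain arrays. This is only a sanity check on NumPy inputs with made-up shapes; it does not exercise apply_ufunc.

```python
import numpy as np

def gufunc_inner(x, y):
    # Treat x as a stack of row vectors and y as a stack of column vectors,
    # so matmul contracts over the last axis of each input.
    result = np.matmul(x[..., np.newaxis, :], y[..., :, np.newaxis])
    return result[..., 0, 0]

rng = np.random.RandomState(0)
x = rng.rand(2, 3, 4)
y = rng.rand(3, 4)   # broadcasts against the leading axis of x

expected = np.einsum('...i,...i->...', x, y)
actual = gufunc_inner(x, y)
```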

Stack objects along a new dimension (like xr.concat)::

    def stack(objects, dim, new_coord):
        sig = ([()] * len(objects), [(dim,)])
        new_coords = [{dim: new_coord}]
        func = lambda *x: np.stack(x, axis=-1)
        return xr.apply_ufunc(func, *objects, signature=sig,
                              new_coords=new_coords,
                              dataset_fill_value=np.nan)

Singular value decomposition:

```
def dim_shape(obj, dim):
    # TODO: make this unnecessary, see #921
    try:
        # DataArray/Variable: dims is a tuple, so look up the axis size
        return obj.shape[obj.dims.index(dim)]
    except AttributeError:
        # Dataset: dims maps dimension names to sizes
        return obj.dims[dim]

def svd(obj, dim0, dim1, new_dim='singular_values'):
    sig = ([(dim0, dim1)], [(dim0, new_dim), (new_dim,), (new_dim, dim1)])
    K = min(dim_shape(obj, dim0), dim_shape(obj, dim1))
    new_coords = [{new_dim: np.arange(K)}] * 3
    return xr.apply_ufunc(np.linalg.svd, obj, signature=sig,
                          new_coords=new_coords,
                          kwargs={'full_matrices': False})
```
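
The output signature in the svd example relies on the shapes that np.linalg.svd returns with full_matrices=False. A quick check on a plain NumPy array (the 5 x 3 shape is arbitrary, chosen only for illustration):

```python
import numpy as np

a = np.arange(15.0).reshape(5, 3)   # plays the role of (dim0, dim1)
u, s, vh = np.linalg.svd(a, full_matrices=False)
k = min(a.shape)                    # length of the new dimension

# u: (dim0, new_dim), s: (new_dim,), vh: (new_dim, dim1)
shapes = (u.shape, s.shape, vh.shape)

# Multiplying the factors back together recovers the input: u @ diag(s) @ vh
reconstructed = (u * s) @ vh
```

The three shapes line up with the three output entries of the signature, with K = min(size of dim0, size of dim1).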

Signature/Docstring

```
apply_ufunc(func, *args, signature=None, join='inner', new_coords=None,
            exclude_dims=frozenset(), dataset_fill_value=None, kwargs=None,
            dask_array='forbidden')

Apply a vectorized function for unlabeled arrays to xarray objects.

The input arguments will be handled using xarray's standard rules for
labeled computation, including alignment, broadcasting, looping over
GroupBy/Dataset variables, and merging of coordinates.

Parameters
----------
func : callable
    Function to call like func(*args, **kwargs) on unlabeled arrays
    (.data). If multiple arguments with non-matching dimensions are
    supplied, this function is expected to vectorize (broadcast) over
    axes of positional arguments in the style of NumPy universal
    functions [1]_.
*args : Dataset, DataArray, GroupBy, Variable, numpy/dask arrays or scalars
    Mix of labeled and/or unlabeled arrays to which to apply the function.
signature : string or triply nested sequence, optional
    Object indicating core dimensions that should not be broadcast on
    the input and output arguments. If omitted, inputs will be broadcast
    to share all dimensions in common before calling func on their
    values, and the output of func will be assumed to be a single array
    with the same shape as the inputs.

    Two forms of signatures are accepted:
    (a) A signature string of the form used by NumPy's generalized
        universal functions [2]_, e.g., '(),(time)->()' indicating a
        function that accepts two arguments and returns a single output,
        on which all dimensions should be broadcast except 'time' on the
        second argument.
    (b) A triply nested sequence providing lists of core dimensions for
        each variable, for both input and output, e.g.,
        ``([(), ('time',)], [()])``.

    Core dimensions are automatically moved to the last axes of any input
    variables, which facilitates using NumPy style generalized ufuncs (see
    the examples below).

    Unlike the NumPy gufunc signature spec, the names of all dimensions
    provided in signatures must be the names of actual dimensions on the
    xarray objects.
join : {'outer', 'inner', 'left', 'right'}, optional
    Method for joining the indexes of the passed objects along each
    dimension, and the variables of Dataset objects with mismatched
    data variables:
    - 'outer': use the union of object indexes
    - 'inner': use the intersection of object indexes
    - 'left': use indexes from the first object with each dimension
    - 'right': use indexes from the last object with each dimension
new_coords : list of dict-like, optional
    New coordinates to include on each output variable. Any core
    dimensions on outputs not found on the inputs must be provided here.
exclude_dims : set, optional
    Dimensions to exclude from alignment and broadcasting. Any input
    coordinates along these dimensions will be dropped. If you include
    these dimensions on any outputs, you must explicitly set them in
    new_coords. Each excluded dimension must be a core dimension in the
    function signature.
dataset_fill_value : optional
    Value used in place of missing variables on Dataset inputs when the
    datasets do not share the exact same data_vars. Only relevant if
    join != 'inner'.
kwargs : dict, optional
    Optional keyword arguments passed directly on to call func.
dask_array : 'forbidden' or 'allowed', optional
    Whether or not to allow applying the ufunc to objects containing lazy
    data in the form of dask arrays. By default, this is forbidden, to
    avoid implicitly converting lazy data.

Returns
-------
Single value or tuple of Dataset, DataArray, Variable, dask.array.Array or
numpy.ndarray, the first type on that list to appear on an input.
```
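
The two signature forms described in the docstring carry the same information. As a rough sketch of the correspondence, the hypothetical helper below (parse_signature is a name invented here, not part of this PR or of NumPy) converts a gufunc-style string into the triply nested form. It assumes well-formed input and performs no validation beyond splitting on '->'.

```python
import re

def parse_signature(signature):
    """Convert e.g. '(),(time)->()' into ([(), ('time',)], [()])."""
    inputs, outputs = signature.replace(' ', '').split('->')

    def parse_side(side):
        # Each parenthesized group lists the core dimensions of one argument.
        groups = re.findall(r'\(([^()]*)\)', side)
        return [tuple(name for name in group.split(',') if name)
                for group in groups]

    return parse_side(inputs), parse_side(outputs)

nested = parse_signature('(),(time)->()')
```

Applied to the string from the docstring, this yields the nested example given there: two inputs, the second with core dimension 'time', and one output with no core dimensions.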

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/964/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
performed_via_github_app: (empty)
state_reason: (empty)
repo: 13221727
type: pull
