home / github / issues

Menu
  • Search all tables
  • GraphQL API

issues: 2276408691

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2276408691 I_kwDOAMm_X86Hrz1z 8995 Why does xr.apply_ufunc support numpy/dask.arrays? 35968931 open 0     0 2024-05-02T20:18:41Z 2024-05-03T22:03:43Z   MEMBER      

What is your issue?

@keewis pointed out that it's weird that xarray.apply_ufunc supports passing numpy/dask arrays directly, and I'm inclined to agree. I don't understand why we do, and think we should consider removing that feature.

Two arguments in favour of removing it:

1) It exposes users to transposition errors

Consider this example:

```python In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: arr = np.arange(12).reshape(3, 4)

In [4]: def mean(obj, dim): ...: # note: apply always moves core dimensions to the end ...: return xr.apply_ufunc( ...: np.mean, obj, input_core_dims=[[dim]], kwargs={"axis": -1} ...: ) ...:

In [5]: mean(arr, dim='time') Out[5]: array([1.5, 5.5, 9.5])

In [6]: mean(arr.T, dim='time') Out[6]: array([4., 5., 6., 7.]) ```

Transposing the input leads to a different result, with the value of the dim kwarg effectively ignored. This kind of error is what xarray code is supposed to prevent by design.

2) There is an alternative input pattern that doesn't require accepting bare arrays

Instead, any numpy/dask array can just be wrapped up into an xarray Variable/NamedArray before passing it to apply_ufunc.

```python In [7]: from xarray.core.variable import Variable

In [8]: var = Variable(data=arr, dims=['time', 'space'])

In [9]: mean(var, dim='time') Out[9]: <xarray.Variable (space: 4)> Size: 32B array([4., 5., 6., 7.])

In [10]: mean(var.T, dim='time') Out[10]: <xarray.Variable (space: 4)> Size: 32B array([4., 5., 6., 7.]) ```

This now guards against the transposition error, and puts the onus on the user to be clear about which axes of their array correspond to which dimension.

With Variable/NamedArray as public API, this latter pattern can handle every case that passing bare arrays in could.

I suggest we deprecate accepting bare arrays in favour of having users wrap them in Variable/NamedArray/DataArray objects instead.

(Note 1: We also accept raw scalars, but this doesn't expose anyone to transposition errors.)

(Note 2: In a quick scan of the apply_ufunc docstring, the docs on it in computation.rst, and the extensive guide that @dcherian wrote in the xarray tutorial repository, I can't see any examples that actually pass bare arrays to apply_ufunc.)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8995/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    13221727 issue

Links from other tables

  • 2 rows from issues_id in issues_labels
  • 0 rows from issue in issue_comments
Powered by Datasette · Queries took 234.927ms · About: xarray-datasette