issues: 37845163


id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
37845163 MDExOlB1bGxSZXF1ZXN0MTgzODUxODA= 184 WIP: Automatic label alignment for mathematical operations 1217238 closed 0     1 2014-07-15T01:57:18Z 2014-08-21T05:44:30Z 2014-08-21T05:44:30Z MEMBER   0 pydata/xarray/pulls/184

This still needs a bit of cleanup (note the failing test), but an interesting design decision came up: how should we handle alignment for in-place operations when the operation would result in missing values that cannot be represented by the existing data type?

For example, what should x be after the following?

```python
x = DataArray([1, 2], coordinates=[['a', 'b']], dimensions=['foo'])
y = DataArray([3], coordinates=[['b']], dimensions=['foo'])
x += y
```

If we do automatic alignment like pandas, in-place operations should not change the coordinates of the object to which the operation is being applied. Thus, y should be equivalent to:

```python
y_prime = DataArray([np.nan, 3], coordinates=[['a', 'b']], dimensions=['foo'])
```
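For comparison, the same alignment step can be sketched with pandas, whose behavior is the model here. This is only an illustration: `pd.Series` stands in for `DataArray`, and the variable names mirror the example above.

```python
import numpy as np
import pandas as pd

# Stand-ins for x and y, using pandas Series instead of DataArray.
x = pd.Series([1, 2], index=["a", "b"])
y = pd.Series([3], index=["b"])

# Align y to x's labels. The label "a" is missing from y, so it is
# filled with NaN, which forces an upcast from int64 to float64.
y_prime = y.reindex(x.index)

print(y_prime.dtype)  # float64
```

The upcast to float is what makes the aligned right-hand side unrepresentable in an integer `x`.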

Here arises the problem: x has dtype=int, so it cannot represent NaN. If I run this example using the current version of this patch, I end up with:

```
In [5]: x
Out[5]:
<xray.DataArray (foo: 2)>
array([-9223372036854775808, 5])
Coordinates:
    foo: Index([u'a', u'b'], dtype='object')
Attributes:
    Empty
```

There are several options here:

1. Don't actually do in-place operations on the underlying ndarrays: x += y should translate under the hood to x = x + y, which sidesteps the issue, because x + y results in a new floating point array. This is what pandas does.
2. Do the operation in-place on the ndarray like numpy -- it's the user's problem if they try to add np.nan in-place to an integer.
3. Do the operation in-place, but raise a warning or error if the right hand side expression ends up including any missing values. Interestingly, this is what numpy does, but only for 0-dimensional arrays:

```
In [3]: x = np.array(0)

In [4]: x += np.nan
/Users/shoyer/miniconda/envs/tcc-climatology/bin/ipython:1: RuntimeWarning: invalid value encountered in add
  #!/Users/shoyer/miniconda/envs/tcc-climatology/python.app/Contents/MacOS/python
```

Option 1 has negative performance implications for all in-place array operations (they would be no faster than the non-in-place versions), and might also complicate the hypothetical future feature of datasets linked on disk (but we might also just disallow in-place operations for such arrays).
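As a rough illustration of option 1, here is how the pandas equivalent of the example behaves, assuming a reasonably recent pandas release. `pd.Series` again stands in for `DataArray`.

```python
import numpy as np
import pandas as pd

x = pd.Series([1, 2], index=["a", "b"])
y = pd.Series([3], index=["b"])

# pandas evaluates x + y as a new, aligned array and then stores the
# result back into x, so the original integer buffer is not reused.
x += y

print(x.dtype)  # float64
```

The labels of `x` are preserved, but its dtype changes from integer to float, and the value at `'a'` becomes NaN.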

Option 2 is one principled choice, but the outcome with missing values would be pretty surprising (note that in this scenario, both x and y were integer arrays).
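One caveat for anyone trying option 2 today: the sketch below assumes a recent NumPy release, where in-place ufuncs use the `'same_kind'` casting rule by default and refuse the float-to-int cast outright. The NumPy version used for the output shown earlier evidently still allowed it.

```python
import numpy as np

x = np.array([1, 2])
y_prime = np.array([np.nan, 3.0])  # y after alignment to x's labels

try:
    # The float64 result cannot be written into an integer array
    # under the default 'same_kind' casting rule.
    x += y_prime
    raised = False
except TypeError:
    raised = True

print(raised)
```

Forcing the cast with `np.add(x, y_prime, out=x, casting="unsafe")` is still possible, but the integer stored in place of NaN is not well defined.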

I like option 3 (with the warning), but unfortunately it has most of the negative performance implications of option 1, because we might need to make a copy of y to check for missing values. This could be partially alleviated by using something like bottleneck.anynan instead, and by the fact that we would only need to do this check if the in-place operation is adding a float to an int.
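A minimal sketch of the check in option 3, written against plain NumPy arrays. The helper name `iadd_checked` is hypothetical and is not part of this patch. It uses the error variant rather than the warning so that the integer array is never left holding a garbage value.

```python
import numpy as np


def iadd_checked(target, other):
    """Hypothetical helper: in-place add that refuses to cast NaN to an integer."""
    other = np.asarray(other)
    # Only pay for the NaN scan when an integer target receives floats.
    if target.dtype.kind in "iu" and other.dtype.kind == "f":
        # np.isnan allocates a temporary boolean array; something like
        # bottleneck.anynan(other) could avoid that allocation.
        if np.isnan(other).any():
            raise ValueError("in-place operation would cast NaN to an integer")
    # 'unsafe' casting is required to write float results into an int array;
    # non-integral results are truncated by the cast.
    np.add(target, other, out=target, casting="unsafe")
    return target


x = np.array([1, 2])
iadd_checked(x, np.array([0.0, 3.0]))
print(x)  # [1 5]
```

Swapping the `raise` for `warnings.warn(...)` would give the warning variant, at the cost of still writing an undefined integer where the NaN was.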

Any thoughts?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/184/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    13221727 pull
