issues: 37845163


id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at	closed_at	author_association	active_lock_reason	draft	pull_request	body	reactions	performed_via_github_app	state_reason	repo	type
37845163	MDExOlB1bGxSZXF1ZXN0MTgzODUxODA=	184	WIP: Automatic label alignment for mathematical operations	1217238	closed	0			1	2014-07-15T01:57:18Z	2014-08-21T05:44:30Z	2014-08-21T05:44:30Z	MEMBER		0	pydata/xarray/pulls/184	This still needs a bit of cleanup (note the failing test), but there is an interesting design decision that came up: How should we handle alignment for in-place operations when the operation would result in missing values that cannot be represented by the existing data type? For example, what should `x` be after the following? `python x = DataArray([1, 2], coordinates=[['a', 'b']], dimensions=['foo']) y = DataArray([3], coordinates=[['b']], dimensions=['foo']) x += y` If we do automatic alignment like pandas, in-place operations should not change the coordinates of the object to which the operation is being applied. Thus, `y` should be equivalent to: `python y_prime = DataArray([np.nan, 3], coordinates=[['a', 'b']], dimensions=['foo'])` Here arises the problem: `x` has `dtype=int`, so it cannot represent `NaN`. If I run this example using the current version of this patch, I end up with: `In [5]: x Out[5]: <xray.DataArray (foo: 2)> array([-9223372036854775808, 5]) Coordinates: foo: Index([u'a', u'b'], dtype='object') Attributes: Empty` There are several options here: 1. Don't actually do in-place operations on the underlying ndarrays: `x += y` should translate under the hood to `x = x + y`, which sidesteps the issue, because `x + y` results in a new floating point array. This is what pandas does. 2. Do the operation in-place on the ndarray like numpy -- it's the user's problem if they try to add `np.nan` in-place to an integer. 3. Do the operation in-place, but raise a warning or error if the right hand side expression ends up including any missing values. 
Interestingly, this is what numpy does, but only for 0-dimensional arrays: ``` In [3]: x = np.array(0) In [4]: x += np.nan /Users/shoyer/miniconda/envs/tcc-climatology/bin/ipython:1: RuntimeWarning: invalid value encountered in add #!/Users/shoyer/miniconda/envs/tcc-climatology/python.app/Contents/MacOS/python ``` Option 1 has negative performance implications for all in-place array operations (they would be no faster than the non-in-place versions), and might also complicate the hypothetical future feature of datasets linked on disk (but we might also just disallow in-place operations for such arrays). Option 2 is one principled choice, but the outcome with missing values would be pretty surprising (note that in this scenario, both `x` and `y` were integer arrays). I like option 3 (with the warning), but unfortunately it has most of the negative performance implications of option 1, because we could need to make a copy of `y` to check for missing values. This could be partially alleviated by using something like `bottleneck.anynan` instead, and by the fact that we would only need to do this check if the in-place operation is adding a float to an int. Any thoughts?	{ "url": "https://api.github.com/repos/pydata/xarray/issues/184/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }			13221727	pull
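A minimal NumPy-only sketch of options 1 and 3 from the description above, not code from this patch. The helper names `add_out_of_place` and `add_in_place_checked` are hypothetical, and `y_aligned` stands for the right-hand side after alignment, with NaN wherever a label is missing. `casting="unsafe"` is passed explicitly on the assumption that the installed NumPy refuses, by default, to add a float array into an integer array in place.

```python
import numpy as np


def add_out_of_place(x, y_aligned):
    """Option 1 (hypothetical helper): treat `x += y` as `x = x + y`.

    The result is a new array, so an int array plus a float array
    with NaN is promoted to float and the NaN is preserved.
    """
    return x + y_aligned


def add_in_place_checked(x, y_aligned):
    """Option 3 (hypothetical helper): add in place, but refuse NaN.

    The check only runs when a float array is added to an integer array,
    which is the one case where NaN cannot be represented in the result.
    """
    int_target = np.issubdtype(x.dtype, np.integer)
    float_source = np.issubdtype(y_aligned.dtype, np.floating)
    if int_target and float_source:
        # np.isnan allocates a temporary boolean array; the description
        # suggests something like bottleneck.anynan to reduce this cost.
        if np.isnan(y_aligned).any():
            raise ValueError(
                "in-place operation would introduce missing values "
                "into an integer array"
            )
    # Assumption: explicit unsafe casting is needed to write float
    # results back into an integer output array.
    np.add(x, y_aligned, out=x, casting="unsafe")
    return x


x = np.array([1, 2])
y_aligned = np.array([np.nan, 3.0])  # `y` after alignment to x's labels

promoted = add_out_of_place(x, y_aligned)  # new float array, x untouched

try:
    add_in_place_checked(x, y_aligned)
except ValueError:
    pass  # x is left unmodified because the check runs before the add
```

The check in option 3 happens before any write, so a rejected operation leaves `x` as it was. Whether to raise an error or only emit a warning is the open design question in the description; this sketch picks the error for simplicity.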
