html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/7045#issuecomment-1249910951,https://api.github.com/repos/pydata/xarray/issues/7045,1249910951,IC_kwDOAMm_X85KgCCn,1217238,2022-09-16T22:26:36Z,2022-09-16T22:26:36Z,MEMBER,"As a concrete example, suppose we have two datasets:
1. Hourly predictions for 10 days
2. Daily observations for a month.
```python
import numpy as np
import pandas as pd
import xarray
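# Hourly predictions for 10 days (the end point 2022-01-11 is excluded).
# Note: inclusive='left' replaces the older closed='left' argument of date_range.
predictions = xarray.DataArray(
    np.random.RandomState(0).randn(24*10),
    {'time': pd.date_range('2022-01-01', '2022-01-11', freq='1h', inclusive='left')},
)
# Daily observations for the 31 days of January.
observations = xarray.DataArray(
    np.random.RandomState(1).randn(31),
    {'time': pd.date_range('2022-01-01', '2022-01-31', freq='24h')},
)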
```
Today, if you compare these datasets, they automatically align:
```
>>> predictions - observations
array([ 0.13970698, 2.88151104, -1.0857261 , 2.21236931, -0.85490761,
2.67796423, 0.63833301, 1.94923669, -0.35832191, 0.23234996])
Coordinates:
* time (time) datetime64[ns] 2022-01-01 2022-01-02 ... 2022-01-10
```
With this proposed change, you would get an error, e.g., something like:
```
>>> predictions - observations
ValueError: xarray objects are not aligned along dimension 'time':
array(['2022-01-01T00:00:00.000000000', '2022-01-02T00:00:00.000000000',
'2022-01-03T00:00:00.000000000', '2022-01-04T00:00:00.000000000',
'2022-01-05T00:00:00.000000000', '2022-01-06T00:00:00.000000000',
'2022-01-07T00:00:00.000000000', '2022-01-08T00:00:00.000000000',
'2022-01-09T00:00:00.000000000', '2022-01-10T00:00:00.000000000',
'2022-01-11T00:00:00.000000000', '2022-01-12T00:00:00.000000000',
'2022-01-13T00:00:00.000000000', '2022-01-14T00:00:00.000000000',
'2022-01-15T00:00:00.000000000', '2022-01-16T00:00:00.000000000',
'2022-01-17T00:00:00.000000000', '2022-01-18T00:00:00.000000000',
'2022-01-19T00:00:00.000000000', '2022-01-20T00:00:00.000000000',
'2022-01-21T00:00:00.000000000', '2022-01-22T00:00:00.000000000',
'2022-01-23T00:00:00.000000000', '2022-01-24T00:00:00.000000000',
'2022-01-25T00:00:00.000000000', '2022-01-26T00:00:00.000000000',
'2022-01-27T00:00:00.000000000', '2022-01-28T00:00:00.000000000',
'2022-01-29T00:00:00.000000000', '2022-01-30T00:00:00.000000000',
'2022-01-31T00:00:00.000000000'], dtype='datetime64[ns]')
vs
array(['2022-01-01T00:00:00.000000000', '2022-01-01T01:00:00.000000000',
'2022-01-01T02:00:00.000000000', ..., '2022-01-10T21:00:00.000000000',
'2022-01-10T22:00:00.000000000', '2022-01-10T23:00:00.000000000'],
dtype='datetime64[ns]')
```
Instead, you would need to align these objects manually, e.g., with `xarray.align`, `reindex_like()` or `interp_like()`:
```
>>> predictions, observations = xarray.align(predictions, observations)
```
or
```
>>> observations = observations.reindex_like(predictions)
```
or
```
>>> predictions = predictions.interp_like(observations)
```
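As a rough, self-contained sketch of how these options differ (not a definitive recipe), the snippet below rebuilds the two arrays and compares `xarray.align` with `reindex_like()`. It passes `periods=` and `dims=` explicitly, which is my own choice for portability across pandas and xarray versions. `interp_like()` is left out because, as far as I know, it needs SciPy installed.

```python
import numpy as np
import pandas as pd
import xarray

# Same shapes as the example above. periods= fixes the number of hourly
# steps directly, so no end point has to be excluded.
predictions = xarray.DataArray(
    np.random.RandomState(0).randn(24 * 10),
    coords={'time': pd.date_range('2022-01-01', periods=24 * 10, freq='1h')},
    dims=['time'],
)
observations = xarray.DataArray(
    np.random.RandomState(1).randn(31),
    coords={'time': pd.date_range('2022-01-01', periods=31, freq='24h')},
    dims=['time'],
)

# Option 1: xarray.align defaults to an inner join, so both objects are
# trimmed to the 10 timestamps they share.
aligned_pred, aligned_obs = xarray.align(predictions, observations)

# Option 2: reindex_like keeps all 240 hourly timestamps and fills the
# hours without an observation with NaN.
reindexed_obs = observations.reindex_like(predictions)

# Under the strict 'exact' join, arithmetic only succeeds because the
# operands now share identical indexes.
with xarray.set_options(arithmetic_join='exact'):
    diff_aligned = aligned_pred - aligned_obs
    diff_reindexed = predictions - reindexed_obs

print(diff_aligned.sizes['time'])           # 10
print(diff_reindexed.sizes['time'])         # 240
print(int(diff_reindexed.notnull().sum()))  # 10
```

The inner join drops data silently, while `reindex_like()` keeps the hourly grid and makes the missing observations visible as NaN. Which one is right depends on the analysis, which is the argument for making the choice explicit.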
To (partially) simulate the effect of this change on a codebase today, you could write `xarray.set_options(arithmetic_join='exact')` -- but presumably it would also make sense to change Xarray's other alignment code (e.g., in `concat` and `merge`).","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1376109308
https://github.com/pydata/xarray/issues/7045#issuecomment-1249601076,https://api.github.com/repos/pydata/xarray/issues/7045,1249601076,IC_kwDOAMm_X85Ke2Y0,1217238,2022-09-16T17:16:52Z,2022-09-16T17:18:38Z,MEMBER,"> IMO we could first align (hah) these choices to be the same:
>
> > the exact mode of automatic alignment (outer vs inner vs left join) depends on the specific operation.
The problem is that user expectations are actually rather different for different operations:
- With data movement operations like `xarray.merge`, you expect to keep around all existing data -- so you want an outer join.
- With in-place operations that modify an existing Dataset, e.g., by adding new variables, you don't expect the existing coordinates to change -- so you want a left join.
- With computation-based operations (like arithmetic), you don't expect all existing data to remain unmodified, and keeping around a bunch of NaN values felt very wasteful -- hence the inner join.
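A small sketch of the three behaviours listed above, using two toy arrays (`a` and `b` are made-up names) whose `x` labels only partly overlap. `join='outer'` is passed to `xarray.merge` explicitly so the snippet does not depend on that function's default:

```python
import numpy as np
import xarray

a = xarray.DataArray([1.0, 2.0, 3.0], coords={'x': [0, 1, 2]}, dims=['x'], name='a')
b = xarray.DataArray([10.0, 20.0, 30.0], coords={'x': [1, 2, 3]}, dims=['x'], name='b')

# Data movement: an outer join keeps every label from both inputs and
# pads the gaps with NaN.
merged = xarray.merge([a, b], join='outer')
print(merged['x'].values)   # [0 1 2 3]

# In-place modification: assigning a new variable keeps the Dataset's
# existing coordinates, so b is reindexed onto x = [0, 1, 2] (left join).
ds = a.to_dataset()
ds['b'] = b
print(ds['x'].values)       # [0 1 2]
print(ds['b'].values)       # [nan 10. 20.]

# Computation: arithmetic keeps only the shared labels (inner join).
total = a + b
print(total['x'].values)    # [1 2]
print(total.values)         # [12. 23.]
```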
> What do you think of making the default FloatIndex use a reasonable (hard to define!) `rtol` for comparisons?
This would definitely be a step forward! However, it's a tricky nut to crack. We would need both a heuristic for defining `rtol` (some fraction of coordinate spacing?) and a method for deciding what the resulting coordinates should be (use values from the first object?).
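To make the two open questions concrete, here is a purely illustrative NumPy sketch. `approx_align_1d` and `spacing_fraction` are hypothetical names, not xarray API, and the choices it makes (tolerance as a fraction of the smallest spacing, result taken from the first object) are just the guesses floated above:

```python
import numpy as np

def approx_align_1d(first, second, spacing_fraction=1e-3):
    # Hypothetical helper: return the coordinate values to use if the two
    # 1-D float coordinates match within tolerance, otherwise None.
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    if first.shape != second.shape:
        return None
    # Heuristic tolerance: a fraction of the smallest spacing in the
    # first coordinate (zero for a single-element coordinate).
    spacing = np.min(np.abs(np.diff(first))) if first.size > 1 else 0.0
    tol = spacing_fraction * spacing
    if np.allclose(first, second, rtol=0.0, atol=tol):
        # Resolve the ambiguity by keeping the first object's values.
        return first
    return None

x = np.linspace(0.0, 1.0, 11)
print(approx_align_1d(x, x + 1e-9) is not None)   # True: round-off sized noise
print(approx_align_1d(x, x + 0.05) is not None)   # False: half a grid step off
```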
Even then, automatic alignment is often problematic, e.g., imagine cases where a coordinate is defined in different units.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1376109308