html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/issues/7045#issuecomment-1249910951,https://api.github.com/repos/pydata/xarray/issues/7045,1249910951,IC_kwDOAMm_X85KgCCn,1217238,2022-09-16T22:26:36Z,2022-09-16T22:26:36Z,MEMBER,"As a concrete example, suppose we have two datasets: 1. Hourly predictions for 10 days 2. Daily observations for a month. ```python import numpy as np import pandas as pd import xarray predictions = xarray.DataArray( np.random.RandomState(0).randn(24*10), {'time': pd.date_range('2022-01-01', '2022-01-11', freq='1h', inclusive='left')}, ) observations = xarray.DataArray( np.random.RandomState(1).randn(31), {'time': pd.date_range('2022-01-01', '2022-01-31', freq='24h')}, ) ``` Today, if you compare these datasets, they automatically align: ``` >>> predictions - observations array([ 0.13970698, 2.88151104, -1.0857261 , 2.21236931, -0.85490761, 2.67796423, 0.63833301, 1.94923669, -0.35832191, 0.23234996]) Coordinates: * time (time) datetime64[ns] 2022-01-01 2022-01-02 ... 
2022-01-10 ``` With this proposed change, you would get an error, e.g., something like: ``` >>> predictions - observations ValueError: xarray objects are not aligned along dimension 'time': array(['2022-01-01T00:00:00.000000000', '2022-01-02T00:00:00.000000000', '2022-01-03T00:00:00.000000000', '2022-01-04T00:00:00.000000000', '2022-01-05T00:00:00.000000000', '2022-01-06T00:00:00.000000000', '2022-01-07T00:00:00.000000000', '2022-01-08T00:00:00.000000000', '2022-01-09T00:00:00.000000000', '2022-01-10T00:00:00.000000000', '2022-01-11T00:00:00.000000000', '2022-01-12T00:00:00.000000000', '2022-01-13T00:00:00.000000000', '2022-01-14T00:00:00.000000000', '2022-01-15T00:00:00.000000000', '2022-01-16T00:00:00.000000000', '2022-01-17T00:00:00.000000000', '2022-01-18T00:00:00.000000000', '2022-01-19T00:00:00.000000000', '2022-01-20T00:00:00.000000000', '2022-01-21T00:00:00.000000000', '2022-01-22T00:00:00.000000000', '2022-01-23T00:00:00.000000000', '2022-01-24T00:00:00.000000000', '2022-01-25T00:00:00.000000000', '2022-01-26T00:00:00.000000000', '2022-01-27T00:00:00.000000000', '2022-01-28T00:00:00.000000000', '2022-01-29T00:00:00.000000000', '2022-01-30T00:00:00.000000000', '2022-01-31T00:00:00.000000000'], dtype='datetime64[ns]') vs array(['2022-01-01T00:00:00.000000000', '2022-01-01T01:00:00.000000000', '2022-01-01T02:00:00.000000000', ..., '2022-01-10T21:00:00.000000000', '2022-01-10T22:00:00.000000000', '2022-01-10T23:00:00.000000000'], dtype='datetime64[ns]') ``` Instead, you would need to manually align these objects, e.g., with `xarray.align`, `reindex_like()` or `interp_like()`: ``` >>> predictions, observations = xarray.align(predictions, observations) ``` or ``` >>> observations = observations.reindex_like(predictions) ``` or ``` >>> predictions = predictions.interp_like(observations) ``` To (partially) simulate the effect of this change on a codebase today, you could write `xarray.set_options(arithmetic_join='exact')` -- but presumably it would also make 
sense to change Xarray's other alignment code (e.g., in `concat` and `merge`).","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1376109308 https://github.com/pydata/xarray/issues/7045#issuecomment-1249601076,https://api.github.com/repos/pydata/xarray/issues/7045,1249601076,IC_kwDOAMm_X85Ke2Y0,1217238,2022-09-16T17:16:52Z,2022-09-16T17:18:38Z,MEMBER,"> IMO we could first align (hah) these choices to be the same: > > > the exact mode of automatic alignment (outer vs inner vs left join) depends on the specific operation. The problem is that user expectations are actually rather different for different operations: - With data movement operations like `xarray.merge`, you expect to keep around all existing data -- so you want an outer join. - With in-place operations that modify an existing Dataset, e.g., by adding new variables, you don't expect the existing coordinates to change -- so you want a left join. - With computation-based operations (like arithmetic), you don't have an expectation that all existing data is unmodified, so keeping around a bunch of NaN values felt very wasteful -- hence the inner join. > What do you think of making the default FloatIndex use a reasonable (hard to define!) `rtol` for comparisons? This would definitely be a step forward! However, it's a tricky nut to crack. We would both need a heuristic for defining `rtol` (some fraction of coordinate spacing?) and a method for deciding what the resulting coordinates should be (use values from the first object?). Even then, automatic alignment is often problematic, e.g., imagine cases where a coordinate is defined in separate units.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1376109308
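The three join behaviours described in the comment above (outer for `merge`, left for assignment into an existing Dataset, inner for arithmetic) can be illustrated with a small sketch. The arrays `a` and `b` and their `x` coordinates are made up for illustration and do not come from the thread. `join="outer"` is passed to `xarray.merge` explicitly so the example does not depend on which default a given xarray version uses.

```python
import numpy as np
import xarray as xr

# Two hypothetical arrays whose "x" coordinates only partially overlap.
a = xr.DataArray(np.arange(3.0), coords={"x": [0, 1, 2]}, dims="x", name="a")
b = xr.DataArray(np.arange(3.0) * 10, coords={"x": [1, 2, 3]}, dims="x", name="b")

# Data movement: an outer join keeps every label from both inputs.
merged = xr.merge([a, b], join="outer")
print(merged.x.values)

# Adding a variable to an existing Dataset: the Dataset's own coordinates
# are kept (left join), and labels missing from b are filled with NaN.
ds = a.to_dataset()
ds["b"] = b
print(ds.x.values, ds["b"].values)

# Arithmetic: by default only the labels present in both operands survive.
total = a + b
print(total.x.values, total.values)

# Approximating the proposed behaviour: with arithmetic_join="exact",
# mismatched indexes raise instead of being silently aligned.
raised = False
with xr.set_options(arithmetic_join="exact"):
    try:
        a + b
    except ValueError:
        raised = True
print("exact join raised:", raised)
```

As the comment notes, `arithmetic_join='exact'` only covers arithmetic, so this approximates the proposal rather than reproducing it fully.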