html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2217#issuecomment-733254878,https://api.github.com/repos/pydata/xarray/issues/2217,733254878,MDEyOklzc3VlQ29tbWVudDczMzI1NDg3OA==,2448579,2020-11-24T21:54:14Z,2020-11-24T21:54:14Z,MEMBER,Reopening since we have a PR to fix this properly.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,329575874
https://github.com/pydata/xarray/issues/2217#issuecomment-540636805,https://api.github.com/repos/pydata/xarray/issues/2217,540636805,MDEyOklzc3VlQ29tbWVudDU0MDYzNjgwNQ==,2448579,2019-10-10T15:18:28Z,2019-10-10T15:18:28Z,MEMBER,"Yes: on xarray>=0.13.0, use `xr.open_mfdataset(..., join=""override"")`, assuming that all files are on the same coordinate system.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,329575874
https://github.com/pydata/xarray/issues/2217#issuecomment-400080478,https://api.github.com/repos/pydata/xarray/issues/2217,400080478,MDEyOklzc3VlQ29tbWVudDQwMDA4MDQ3OA==,1217238,2018-06-25T20:14:00Z,2018-06-25T20:14:00Z,MEMBER,"Both of these sound reasonable to me, but APIs for pandas are really best discussed in a pandas issue. I'm happy to chime in over there, but I haven't been an active pandas dev recently.

On Mon, Jun 25, 2018 at 2:07 PM Benjamin Root wrote:

> Do we want to dive straight to that? Or, would it make more sense to first submit some PRs piping support for a tolerance kwarg through more of the API? Or perhaps we should propose that a ""tolerance"" attribute should be an optional attribute that methods like get_indexer() and such could always check for? Not being a pandas dev, I am not sure how piecemeal we should approach this.
>
> In addition, we are likely going to have to implement a decent chunk of code ourselves for compatibility's sake, I think.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,329575874
https://github.com/pydata/xarray/issues/2217#issuecomment-399615463,https://api.github.com/repos/pydata/xarray/issues/2217,399615463,MDEyOklzc3VlQ29tbWVudDM5OTYxNTQ2Mw==,1217238,2018-06-23T00:26:19Z,2018-06-23T00:26:19Z,MEMBER,"OK, I think I'm convinced. Now it's probably a good time to go back to the pandas issues (or open a new one) with a proposal to add tolerance to Float64Index.

On Fri, Jun 22, 2018 at 4:56 PM Benjamin Root wrote:

> I am not concerned about the non-commutativity of the indexer itself. There is no way around that. At some point, you have to choose values, whether it is done by an indexer or done by some particular set operation.
>
> As for the different sizes, that happens when the tolerance is greater than half the smallest delta. I figure a final implementation would enforce such a constraint on the tolerance.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,329575874
https://github.com/pydata/xarray/issues/2217#issuecomment-399593224,https://api.github.com/repos/pydata/xarray/issues/2217,399593224,MDEyOklzc3VlQ29tbWVudDM5OTU5MzIyNA==,1217238,2018-06-22T21:56:17Z,2018-06-22T21:56:17Z,MEMBER,"@WeatherGod One problem with your definition of tolerance is that it isn't commutative, even if both indexes have the same tolerance:

```python
a = ImpreciseIndex([0.1, 0.2, 0.3, 0.4])
a.tolerance = 0.1
b = ImpreciseIndex([0.301, 0.401, 0.501, 0.601])
b.tolerance = 0.1
print(a.union(b))  # ImpreciseIndex([0.1, 0.2, 0.3, 0.4, 0.501, 0.601], dtype='float64')
print(b.union(a))  # ImpreciseIndex([0.1, 0.2, 0.301, 0.401, 0.501, 0.601], dtype='float64')
```

If you try a little harder, you could even have cases where the result has a different size, e.g.,

```python
a = ImpreciseIndex([1, 2, 3])
a.tolerance = 0.5
b = ImpreciseIndex([1, 1.9, 2.1, 3])
b.tolerance = 0.5
print(a.union(b))  # ImpreciseIndex([1.0, 2.0, 3.0], dtype='float64')
print(b.union(a))  # ImpreciseIndex([1.0, 1.9, 2.1, 3.0], dtype='float64')
```

Maybe these aren't really problems in practice, but it's at least a little strange/surprising.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,329575874
https://github.com/pydata/xarray/issues/2217#issuecomment-399540641,https://api.github.com/repos/pydata/xarray/issues/2217,399540641,MDEyOklzc3VlQ29tbWVudDM5OTU0MDY0MQ==,1217238,2018-06-22T18:39:28Z,2018-06-22T18:39:28Z,MEMBER,"Again, I think the first big challenge here is writing fast approximate union/intersection algorithms. Then we can figure out how to wire them into the pandas/xarray API :).

On Fri, Jun 22, 2018 at 10:42 AM Benjamin Root wrote:

> Ok, I see how you implemented it for pandas's reindex. You essentially inserted an inexact filter within .get_indexer(). And the intersection() and union() use these methods, so, in theory, one could pipe a tolerance argument through them (as well as for the other set operations). The work needs to be expanded a bit, though, as get_indexer_non_unique() needs the tolerance parameter, too, I think.
>
> For xarray, though, I think we can work around backwards compatibility by having Dataset hold specialized subclasses of Index for floating-point data types that would have the needed changes to the Index class. We can have this specialized class have some default tolerance (say 100*finfo(dtype).resolution?), and it would have its methods use the stored tolerance by default, so it should be completely transparent to the end-user (hopefully). This way, xr.open_mfdataset() would ""just work"".","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,329575874
https://github.com/pydata/xarray/issues/2217#issuecomment-399317060,https://api.github.com/repos/pydata/xarray/issues/2217,399317060,MDEyOklzc3VlQ29tbWVudDM5OTMxNzA2MA==,1217238,2018-06-22T04:27:30Z,2018-06-22T04:27:30Z,MEMBER,See https://github.com/pandas-dev/pandas/issues/9817 and https://github.com/pandas-dev/pandas/issues/9530 for the relevant pandas issues.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,329575874
https://github.com/pydata/xarray/issues/2217#issuecomment-399293141,https://api.github.com/repos/pydata/xarray/issues/2217,399293141,MDEyOklzc3VlQ29tbWVudDM5OTI5MzE0MQ==,1217238,2018-06-22T01:32:56Z,2018-06-22T01:32:56Z,MEMBER,"I think a tolerance argument for set methods like `Index.union` would be an easier sell than an epsilon argument for Index construction. You'd still need to figure out the right algorithmic approach, though.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,329575874
https://github.com/pydata/xarray/issues/2217#issuecomment-399258602,https://api.github.com/repos/pydata/xarray/issues/2217,399258602,MDEyOklzc3VlQ29tbWVudDM5OTI1ODYwMg==,1217238,2018-06-21T22:07:14Z,2018-06-21T22:07:14Z,MEMBER,"> To be clear, my use-case would not be solved by join='override' (isn't that just join='left'?). I have moving nests of coordinates that can have some floating-point noise in them, but are otherwise identical.

`join='left'` will *reindex* all arguments to match the coordinates of the first object. In practice, that means that if coordinates differ by floating-point noise, the second object would end up converted to all NaNs. `join='override'` would just *relabel* coordinates, assuming that the shapes match.
The data wouldn't change at all.

I guess another way to do this would be to include `method` and `tolerance` arguments from `reindex` on align, and only allow them when `join='left'` or `join='right'`. But this would be a little trickier to pass through other functions like `open_mfdataset()`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,329575874
https://github.com/pydata/xarray/issues/2217#issuecomment-395117968,https://api.github.com/repos/pydata/xarray/issues/2217,395117968,MDEyOklzc3VlQ29tbWVudDM5NTExNzk2OA==,1217238,2018-06-06T15:49:09Z,2018-06-06T15:49:09Z,MEMBER,"> For example `xr.align(da1, da2, join='override')`. This would just check that the shapes of the different coordinates match and then replace da2's coordinates with those from da1.

I like this idea! This would certainly be much easier to implement than general-purpose approximate alignment.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,329575874
https://github.com/pydata/xarray/issues/2217#issuecomment-395065697,https://api.github.com/repos/pydata/xarray/issues/2217,395065697,MDEyOklzc3VlQ29tbWVudDM5NTA2NTY5Nw==,1197350,2018-06-06T13:20:20Z,2018-06-06T13:20:34Z,MEMBER,"An alternative approach to fixing this issue would be the long-discussed idea of a ""fast path"" for open_mfdataset (#1823). In this case, @naomi-henderson knows a priori that the coordinates for these files should be the same, numerical noise notwithstanding. There should be a way to just skip the alignment check completely and override the coordinates with the values from the first file. For example:

```python
xr.align(da1, da2, join='override')
```

This would just check that the shapes of the different coordinates match and then replace `da2`'s coordinates with those from `da1`.
","{""total_count"": 8, ""+1"": 7, ""-1"": 0, ""laugh"": 0, ""hooray"": 1, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,329575874
https://github.com/pydata/xarray/issues/2217#issuecomment-394912948,https://api.github.com/repos/pydata/xarray/issues/2217,394912948,MDEyOklzc3VlQ29tbWVudDM5NDkxMjk0OA==,1217238,2018-06-06T01:43:33Z,2018-06-06T01:46:59Z,MEMBER,"I agree that this would be useful. One option that works currently would be to determine the proper grid (e.g., from one file) and then use the `preprocess` argument of `open_mfdataset` to `reindex()` each dataset to the desired grid.

To do this systematically in xarray, we would want to update `xarray.align` to be capable of approximate alignment. This would in turn require approximate versions of `pandas.Index.union` (for `join='outer'`) and `pandas.Index.intersection` (for `join='inner'`). Ideally, we would do this work upstream in pandas, and utilize it downstream in xarray.

Either way, someone will need to figure out and implement the appropriate algorithm to take an approximate union of two sets of points. This could be somewhat tricky when you start to consider sets where some but not all points are within `tolerance` of each other (e.g., `{0, 1, 2, 3, 4, 5}` with `tolerance=1.5`).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,329575874
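The thread ends on the open algorithmic question: how to take an approximate union when some but not all points fall within `tolerance` of each other. A minimal sketch of one greedy approach follows; `approx_union` is a hypothetical helper, not a pandas or xarray API, and it illustrates how the `{0, 1, 2, 3, 4, 5}` with `tolerance=1.5` example from the last comment collapses points.

```python
import numpy as np

def approx_union(a, b, tol):
    """Hypothetical sketch of an approximate set union with a tolerance.

    Merge the two coordinate arrays, sort, then greedily keep a value
    only if it lies more than `tol` beyond the last value kept. This is
    not how pandas or xarray implement anything; it only illustrates the
    algorithmic question raised in the thread.
    """
    merged = np.sort(np.concatenate([np.asarray(a, float), np.asarray(b, float)]))
    if merged.size == 0:
        return merged
    kept = [merged[0]]
    for x in merged[1:]:
        if x - kept[-1] > tol:
            kept.append(x)
    return np.array(kept)

# The tricky case from the comment above: points chain within tolerance.
print(approx_union([0, 1, 2, 3, 4, 5], [], tol=1.5))  # [0. 2. 4.]
```

Note that under this greedy rule even `approx_union(x, x, tol)` fails to round-trip `x` when `tol` exceeds half the smallest point spacing, which is one reason the thread suggests a final implementation would enforce a constraint of that form on the tolerance.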