issue_comments: 586597512

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/pull/3764#issuecomment-586597512	https://api.github.com/repos/pydata/xarray/issues/3764	586597512	MDEyOklzc3VlQ29tbWVudDU4NjU5NzUxMg==	6628425	2020-02-15T14:51:05Z	2020-02-15T14:51:05Z	MEMBER	This is a little bit trickier than I originally anticipated. For indexing with dates very distant from the dates in the index, I'm still running into an issue at this step in `pandas.Index._get_nearest_indexer`: `left_distances = np.abs(self[left_indexer] - target)` Consider the following example: ``` In [1]: import numpy as np; import pandas as pd; import xarray as xr In [2]: import cftime In [3]: times = xr.cftime_range('0001', periods=4) In [4]: da = xr.DataArray(range(4), coords=[('time', times)]) In [5]: da.sel(time=cftime.DatetimeGregorian(2000, 1, 1), method='nearest') ``` In this example `self[left_indexer]` is `CFTimeIndex([0001-01-04 00:00:00], dtype='object', name='time')`, while `target` is `Index([2000-01-01 00:00:00], dtype='object')`. The distance between these dates is greater than 292 years, so it cannot be represented in a `TimedeltaIndex`. One could argue that we could fall back on using a generic `Index` of dtype object to store the `datetime.timedelta` object produced in this case: ``` In [6]: left_index = xr.CFTimeIndex([cftime.DatetimeGregorian(1, 1, 4)]) In [7]: target = pd.Index([cftime.DatetimeGregorian(2000, 1, 1)]) In [8]: difference = left_index - target In [9]: difference Out[9]: Index([-730118 days, 0:00:00], dtype='object') ``` A problem occurs though when we try to take the absolute value of this index. Pandas (I think reasonably so) tries to detect the datatype of the result and construct a new index. 
In doing so it tries to create a `TimedeltaIndex`, but cannot, because the `timedelta` inside is too large: Result of np.abs(difference) ``` In [10]: np.abs(difference) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) ~/Software/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.array_to_timedelta64() TypeError: Expected unicode, got datetime.timedelta During handling of the above exception, another exception occurred: OverflowError Traceback (most recent call last) <ipython-input-17-95776624315c> in <module> ----> 1 np.abs(difference) ~/Software/pandas/pandas/core/indexes/base.py in __array_wrap__(self, result, context) 628 629 attrs = self._get_attributes_dict() --> 630 return Index(result, attrs) 631 632 @cache_readonly ~/Software/pandas/pandas/core/indexes/base.py in __new__(cls, data, dtype, copy, name, tupleize_cols, kwargs) 412 413 if dtype is None: --> 414 new_data, new_dtype = _maybe_cast_data_without_dtype(subarr) 415 if new_dtype is not None: 416 return cls( ~/Software/pandas/pandas/core/indexes/base.py in _maybe_cast_data_without_dtype(subarr) 5711 5712 elif inferred.startswith("timedelta"): -> 5713 data = TimedeltaArray._from_sequence(subarr, copy=False) 5714 return data, data.dtype 5715 elif inferred == "period": ~/Software/pandas/pandas/core/arrays/timedeltas.py in _from_sequence(cls, data, dtype, copy, freq, unit) 212 freq, freq_infer = dtl.maybe_infer_freq(freq) 213 --> 214 data, inferred_freq = sequence_to_td64ns(data, copy=copy, unit=unit) 215 freq, freq_infer = dtl.validate_inferred_freq(freq, inferred_freq, freq_infer) 216 ~/Software/pandas/pandas/core/arrays/timedeltas.py in sequence_to_td64ns(data, copy, unit, errors) 938 if is_object_dtype(data.dtype) or is_string_dtype(data.dtype): 939 # no need to make a copy, need to convert if string-dtyped --> 940 data = objects_to_td64ns(data, unit=unit, errors=errors) 941 copy = False 942 
~/Software/pandas/pandas/core/arrays/timedeltas.py in objects_to_td64ns(data, unit, errors) 1047 values = np.array(data, dtype=np.object_, copy=False) 1048 -> 1049 result = array_to_timedelta64(values, unit=unit, errors=errors) 1050 return result.view("timedelta64[ns]") 1051 ~/Software/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.array_to_timedelta64() ~/Software/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.convert_to_timedelta64() ~/Software/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.delta_to_nanoseconds() OverflowError: Python int too large to convert to C long ``` So I'm a little bit back to the drawing board regarding a solution for this. Part of me is tempted to write an overriding version of `_get_nearest_indexer` for `CFTimeIndex` that basically restores the old behavior; I'm not sure how dangerous that would be to rely on though.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		563202971
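
The 292-year limit mentioned in the comment follows from `timedelta64[ns]` storing a signed 64-bit count of nanoseconds. The sketch below is only an illustration of that arithmetic, not part of the proposed fix: it takes the magnitude of the difference reported in `Out[9]` (730118 days) and checks it against the int64 range. The variable names are made up for this example, and the 365.25-day year is an approximation.

```python
import datetime

import numpy as np

# timedelta64[ns] is backed by a signed 64-bit integer of nanoseconds.
max_ns = np.iinfo(np.int64).max
ns_per_day = 24 * 60 * 60 * 10**9

# Largest whole number of days that fits, and a rough conversion to years
# (assuming 365.25 days per year).
max_days = max_ns // ns_per_day
max_years = max_days / 365.25

# Magnitude of the difference shown in Out[9] of the comment.
delta = datetime.timedelta(days=730118)
delta_ns = delta.days * ns_per_day

print(f"representable span: {max_days} days (~{max_years:.1f} years)")
print(f"difference fits in int64 nanoseconds: {delta_ns <= max_ns}")
```

Because the difference is several times larger than the representable span, any code path that coerces the object-dtype result to `timedelta64[ns]` would be expected to overflow, which is consistent with the `OverflowError` in the traceback.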