issue_comments: 586597512

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/pull/3764#issuecomment-586597512 https://api.github.com/repos/pydata/xarray/issues/3764 586597512 MDEyOklzc3VlQ29tbWVudDU4NjU5NzUxMg== 6628425 2020-02-15T14:51:05Z 2020-02-15T14:51:05Z MEMBER

This is a little bit trickier than I originally anticipated. For indexing with dates very distant from the dates in the index, I'm still running into an issue at this step in `pandas.Index._get_nearest_indexer`:

```
left_distances = np.abs(self[left_indexer] - target)
```

Consider the following example:

```
In [1]: import numpy as np; import pandas as pd; import xarray as xr

In [2]: import cftime

In [3]: times = xr.cftime_range('0001', periods=4)

In [4]: da = xr.DataArray(range(4), coords=[('time', times)])

In [5]: da.sel(time=cftime.DatetimeGregorian(2000, 1, 1), method='nearest')
```

In this example `self[left_indexer]` is `CFTimeIndex([0001-01-04 00:00:00], dtype='object', name='time')`, while `target` is `Index([2000-01-01 00:00:00], dtype='object')`. The distance between these dates is greater than 292 years, so it cannot be represented in a `TimedeltaIndex`, which stores durations as 64-bit integer nanoseconds.
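To make the 292-year figure concrete, here is a small back-of-the-envelope check. It is only a sketch: it assumes the limit comes from storing nanoseconds in a signed 64-bit integer, and it reuses the 730118-day gap from this example.

```python
import datetime

import numpy as np

# Largest nanosecond count a signed 64-bit integer can hold.
max_ns = np.iinfo(np.int64).max

# Convert that to (approximate) years: ns -> s -> days -> years.
max_years = max_ns / 1e9 / 86400 / 365.25

# The gap between 0001-01-04 and 2000-01-01 reported in this example.
gap = datetime.timedelta(days=730118)
gap_ns = gap.days * 86400 * 10**9  # exact Python integer arithmetic

# False: the gap needs more nanoseconds than an int64 can store.
fits = gap_ns <= max_ns

print(f"int64 nanoseconds cover about {max_years:.1f} years; gap fits: {fits}")
```

The gap here is roughly 2000 years, so it exceeds the representable range by a wide margin.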

One could argue that we could fall back on using a generic `Index` of dtype object to store the `datetime.timedelta` object produced in this case:

```
In [6]: left_index = xr.CFTimeIndex([cftime.DatetimeGregorian(1, 1, 4)])

In [7]: target = pd.Index([cftime.DatetimeGregorian(2000, 1, 1)])

In [8]: difference = left_index - target

In [9]: difference
Out[9]: Index([-730118 days, 0:00:00], dtype='object')
```

A problem occurs though when we try to take the absolute value of this index. Pandas (I think reasonably so) tries to detect the datatype of the result and construct a new index. In doing so it tries to create a `TimedeltaIndex`, but cannot, because the `timedelta` inside is too large:

Result of np.abs(difference)

```
In [10]: np.abs(difference)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/Software/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.array_to_timedelta64()

TypeError: Expected unicode, got datetime.timedelta

During handling of the above exception, another exception occurred:

OverflowError                             Traceback (most recent call last)
<ipython-input-17-95776624315c> in <module>
----> 1 np.abs(difference)

~/Software/pandas/pandas/core/indexes/base.py in __array_wrap__(self, result, context)
    628
    629         attrs = self._get_attributes_dict()
--> 630         return Index(result, **attrs)
    631
    632     @cache_readonly

~/Software/pandas/pandas/core/indexes/base.py in __new__(cls, data, dtype, copy, name, tupleize_cols, **kwargs)
    412
    413             if dtype is None:
--> 414                 new_data, new_dtype = _maybe_cast_data_without_dtype(subarr)
    415                 if new_dtype is not None:
    416                     return cls(

~/Software/pandas/pandas/core/indexes/base.py in _maybe_cast_data_without_dtype(subarr)
   5711
   5712     elif inferred.startswith("timedelta"):
-> 5713         data = TimedeltaArray._from_sequence(subarr, copy=False)
   5714         return data, data.dtype
   5715     elif inferred == "period":

~/Software/pandas/pandas/core/arrays/timedeltas.py in _from_sequence(cls, data, dtype, copy, freq, unit)
    212         freq, freq_infer = dtl.maybe_infer_freq(freq)
    213
--> 214         data, inferred_freq = sequence_to_td64ns(data, copy=copy, unit=unit)
    215         freq, freq_infer = dtl.validate_inferred_freq(freq, inferred_freq, freq_infer)
    216

~/Software/pandas/pandas/core/arrays/timedeltas.py in sequence_to_td64ns(data, copy, unit, errors)
    938     if is_object_dtype(data.dtype) or is_string_dtype(data.dtype):
    939         # no need to make a copy, need to convert if string-dtyped
--> 940         data = objects_to_td64ns(data, unit=unit, errors=errors)
    941         copy = False
    942

~/Software/pandas/pandas/core/arrays/timedeltas.py in objects_to_td64ns(data, unit, errors)
   1047     values = np.array(data, dtype=np.object_, copy=False)
   1048
-> 1049     result = array_to_timedelta64(values, unit=unit, errors=errors)
   1050     return result.view("timedelta64[ns]")
   1051

~/Software/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.array_to_timedelta64()

~/Software/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.convert_to_timedelta64()

~/Software/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.delta_to_nanoseconds()

OverflowError: Python int too large to convert to C long
```

So I'm a little bit back to the drawing board regarding a solution for this. Part of me is tempted to write an overriding version of `_get_nearest_indexer` for `CFTimeIndex` that basically restores the old behavior; I'm not sure how dangerous that would be to rely on though.
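For illustration only, the core idea of a nearest lookup that never builds a `TimedeltaIndex` could look something like the sketch below. This is not the old pandas behavior or an xarray implementation: `nearest_positions` is a hypothetical helper, it uses standard-library `datetime.datetime` values as a stand-in for `cftime` dates, and it does a simple brute-force scan rather than the left/right indexer comparison pandas performs.

```python
import datetime

import numpy as np


def nearest_positions(index_values, targets):
    """Return, for each target, the position of the nearest index value.

    Hypothetical helper: distances stay as plain datetime.timedelta
    objects, so no conversion to 64-bit nanoseconds ever happens.
    """
    index_values = list(index_values)
    positions = []
    for target in targets:
        # abs() on a datetime.timedelta is exact for arbitrarily large gaps.
        distances = [abs(value - target) for value in index_values]
        # On ties, list.index returns the first (leftmost) match.
        positions.append(distances.index(min(distances)))
    return np.array(positions, dtype=np.intp)


# Four daily dates starting at year 1, mirroring the example above.
index_values = [datetime.datetime(1, 1, day) for day in range(1, 5)]
targets = [datetime.datetime(2000, 1, 1)]

result = nearest_positions(index_values, targets)
print(result)
```

The scan is O(n) per target, so a real override would presumably keep the sorted-index search and only change how the distances are compared.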

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  563202971