{"database": "github", "table": "issue_comments", "is_view": false, "human_description_en": "where author_association = \"MEMBER\", issue = 563202971 and user = 6628425 sorted by updated_at descending", "rows": [["https://github.com/pydata/xarray/pull/3764#issuecomment-589963580", "https://api.github.com/repos/pydata/xarray/issues/3764", 589963580, "MDEyOklzc3VlQ29tbWVudDU4OTk2MzU4MA==", 6628425, "2020-02-22T14:55:00Z", "2020-02-22T14:55:00Z", "MEMBER", "I added some updates to this PR this morning that in principle would solve the indexing with `method=\"nearest\"` from within xarray.  Unfortunately though, due to the issue I described in https://github.com/pydata/xarray/pull/3764#issuecomment-586597512, I was not able to come up with a solution that did not require overriding the pandas implementation of `_get_nearest_indexer`.  If this seems unacceptable, maybe we can think harder about how we might address this upstream instead (e.g. I think special-casing the change made in https://github.com/pandas-dev/pandas/pull/31511 to just DatetimeIndexes, and preserving the old behavior for everything else, could be sufficient).\r\n\r\nIn addition, for the time being I xfailed `test_indexing_in_series_getitem`, as I think there is agreement that that would be best addressed upstream, https://github.com/pydata/xarray/issues/3751#issuecomment-587572443.\r\n\r\nFinally in the process of doing this, I cleaned up the implementation of `CFTimeIndex.__sub__`, and added some more test coverage; hopefully now it's a little clearer what cases it's supposed to work for and what cases it is not.", "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", null, 563202971], ["https://github.com/pydata/xarray/pull/3764#issuecomment-586597512", "https://api.github.com/repos/pydata/xarray/issues/3764", 586597512, "MDEyOklzc3VlQ29tbWVudDU4NjU5NzUxMg==", 6628425, "2020-02-15T14:51:05Z", "2020-02-15T14:51:05Z", 
"MEMBER", "This is a little bit trickier than I originally anticipated.  For indexing with dates very distant from the dates in the index, I'm still running into an issue at this step in `pandas.Index._get_nearest_indexer`:\r\n```\r\nleft_distances = np.abs(self[left_indexer] - target)\r\n```\r\nConsider the following example:\r\n```\r\nIn [1]: import numpy as np; import pandas as pd; import xarray as xr\r\n\r\nIn [2]: import cftime\r\n\r\nIn [3]: times = xr.cftime_range('0001', periods=4)\r\n\r\nIn [4]: da = xr.DataArray(range(4), coords=[('time', times)])\r\n\r\nIn [5]: da.sel(time=cftime.DatetimeGregorian(2000, 1, 1), method='nearest')\r\n```\r\n\r\nIn this example `self[left_indexer]` is `CFTimeIndex([0001-01-04 00:00:00], dtype='object', name='time')`, while `target` is `Index([2000-01-01 00:00:00], dtype='object')`.  The distance between these dates is greater than 292 years, so it cannot be represented in a `TimedeltaIndex`.  \r\n\r\nOne could argue that we could fall back on using a generic `Index` of dtype object to store the `datetime.timedelta` object produced in this case:\r\n```\r\nIn [6]: left_index = xr.CFTimeIndex([cftime.DatetimeGregorian(1, 1, 4)])\r\n\r\nIn [7]: target = pd.Index([cftime.DatetimeGregorian(2000, 1, 1)])\r\n\r\nIn [8]: difference = left_index - target\r\n\r\nIn [9]: difference\r\nOut[9]: Index([-730118 days, 0:00:00], dtype='object')\r\n```\r\nA problem occurs though when we try to take the absolute value of this index.  Pandas (I think reasonably so) tries to detect the datatype of the result and construct a new index.  
In doing so it tries to create a `TimedeltaIndex`, but cannot, because the `timedelta` inside is too large:\r\n\r\n<details><summary>Result of np.abs(difference)</summary>\r\n<p>\r\n\r\n```\r\nIn [10]: np.abs(difference)\r\n---------------------------------------------------------------------------\r\nTypeError                                 Traceback (most recent call last)\r\n~/Software/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.array_to_timedelta64()\r\n\r\nTypeError: Expected unicode, got datetime.timedelta\r\n\r\nDuring handling of the above exception, another exception occurred:\r\n\r\nOverflowError                             Traceback (most recent call last)\r\n<ipython-input-17-95776624315c> in <module>\r\n----> 1 np.abs(difference)\r\n\r\n~/Software/pandas/pandas/core/indexes/base.py in __array_wrap__(self, result, context)\r\n    628\r\n    629         attrs = self._get_attributes_dict()\r\n--> 630         return Index(result, **attrs)\r\n    631\r\n    632     @cache_readonly\r\n\r\n~/Software/pandas/pandas/core/indexes/base.py in __new__(cls, data, dtype, copy, name, tupleize_cols, **kwargs)\r\n    412\r\n    413             if dtype is None:\r\n--> 414                 new_data, new_dtype = _maybe_cast_data_without_dtype(subarr)\r\n    415                 if new_dtype is not None:\r\n    416                     return cls(\r\n\r\n~/Software/pandas/pandas/core/indexes/base.py in _maybe_cast_data_without_dtype(subarr)\r\n   5711\r\n   5712         elif inferred.startswith(\"timedelta\"):\r\n-> 5713             data = TimedeltaArray._from_sequence(subarr, copy=False)\r\n   5714             return data, data.dtype\r\n   5715         elif inferred == \"period\":\r\n\r\n~/Software/pandas/pandas/core/arrays/timedeltas.py in _from_sequence(cls, data, dtype, copy, freq, unit)\r\n    212         freq, freq_infer = dtl.maybe_infer_freq(freq)\r\n    213\r\n--> 214         data, inferred_freq = sequence_to_td64ns(data, copy=copy, 
unit=unit)\r\n    215         freq, freq_infer = dtl.validate_inferred_freq(freq, inferred_freq, freq_infer)\r\n    216\r\n\r\n~/Software/pandas/pandas/core/arrays/timedeltas.py in sequence_to_td64ns(data, copy, unit, errors)\r\n    938     if is_object_dtype(data.dtype) or is_string_dtype(data.dtype):\r\n    939         # no need to make a copy, need to convert if string-dtyped\r\n--> 940         data = objects_to_td64ns(data, unit=unit, errors=errors)\r\n    941         copy = False\r\n    942\r\n\r\n~/Software/pandas/pandas/core/arrays/timedeltas.py in objects_to_td64ns(data, unit, errors)\r\n   1047     values = np.array(data, dtype=np.object_, copy=False)\r\n   1048\r\n-> 1049     result = array_to_timedelta64(values, unit=unit, errors=errors)\r\n   1050     return result.view(\"timedelta64[ns]\")\r\n   1051\r\n\r\n~/Software/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.array_to_timedelta64()\r\n\r\n~/Software/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.convert_to_timedelta64()\r\n\r\n~/Software/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.delta_to_nanoseconds()\r\n\r\nOverflowError: Python int too large to convert to C long\r\n```\r\n\r\n</p>\r\n</details>\r\n\r\n\r\nSo I'm a little bit back to the drawing board regarding a solution for this.  
Part of me is tempted to write an overriding version of `_get_nearest_indexer` for `CFTimeIndex` that basically restores the old behavior; I'm not sure how dangerous that would be to rely on though.", "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", null, 563202971]], "truncated": false, "filtered_table_rows_count": 2, "expanded_columns": [], "expandable_columns": [[{"column": "issue", "other_table": "issues", "other_column": "id"}, "title"], [{"column": "user", "other_table": "users", "other_column": "id"}, "login"]], "columns": ["html_url", "issue_url", "id", "node_id", "user", "created_at", "updated_at", "author_association", "body", "reactions", "performed_via_github_app", "issue"], "primary_keys": ["id"], "units": {}, "query": {"sql": "select html_url, issue_url, id, node_id, user, created_at, updated_at, author_association, body, reactions, performed_via_github_app, issue from issue_comments where \"author_association\" = :p0 and \"issue\" = :p1 and \"user\" = :p2 order by updated_at desc limit 101", "params": {"p0": "MEMBER", "p1": "563202971", "p2": "6628425"}}, "facet_results": {"author_association": {"name": "author_association", "type": "column", "hideable": false, "toggle_url": "/github/issue_comments.json?author_association=MEMBER&issue=563202971&user=6628425", "results": [{"value": "MEMBER", "label": "MEMBER", "count": 2, "toggle_url": "http://xarray-datasette.fly.dev/github/issue_comments.json?issue=563202971&user=6628425", "selected": true}], "truncated": false}, "user": {"name": "user", "type": "column", "hideable": false, "toggle_url": "/github/issue_comments.json?author_association=MEMBER&issue=563202971&user=6628425", "results": [{"value": 6628425, "label": "spencerkclark", "count": 2, "toggle_url": "http://xarray-datasette.fly.dev/github/issue_comments.json?author_association=MEMBER&issue=563202971", "selected": true}], "truncated": false}, "issue": {"name": 
"issue", "type": "column", "hideable": false, "toggle_url": "/github/issue_comments.json?author_association=MEMBER&issue=563202971&user=6628425", "results": [{"value": 563202971, "label": "Fix CFTimeIndex-related errors stemming from updates in pandas", "count": 2, "toggle_url": "http://xarray-datasette.fly.dev/github/issue_comments.json?author_association=MEMBER&user=6628425", "selected": true}], "truncated": false}}, "suggested_facets": [{"name": "created_at", "type": "date", "toggle_url": "http://xarray-datasette.fly.dev/github/issue_comments.json?author_association=MEMBER&issue=563202971&user=6628425&_facet_date=created_at"}, {"name": "updated_at", "type": "date", "toggle_url": "http://xarray-datasette.fly.dev/github/issue_comments.json?author_association=MEMBER&issue=563202971&user=6628425&_facet_date=updated_at"}], "next": null, "next_url": null, "private": false, "allow_execute_sql": true, "query_ms": 2638.64786492195}