issue_comments

2 rows where author_association = "MEMBER" and issue = 523037716 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
554648042 https://github.com/pydata/xarray/issues/3535#issuecomment-554648042 https://api.github.com/repos/pydata/xarray/issues/3535 MDEyOklzc3VlQ29tbWVudDU1NDY0ODA0Mg== spencerkclark 6628425 2019-11-16T15:39:23Z 2019-11-16T15:39:23Z MEMBER

Thanks for raising this issue @mathause. In hindsight this does not surprise me. Pandas's strict use of nanosecond-resolution datetimes and timedeltas was part of the motivation for the CFTimeIndex. While convenient, because it allows us to re-use code already written in pandas, holding the result of the difference between two CFTimeIndexes in a TimedeltaIndex clearly prevents us from taking the difference between distant dates.
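To make the limit concrete, here is a rough stdlib-only calculation (an illustration, not code from xarray or pandas). It assumes timedeltas are stored as a signed 64-bit count of nanoseconds, which is what nanosecond resolution implies; the helper name `fits_in_int64_ns` is hypothetical.

```python
from datetime import timedelta

INT64_MAX = 2**63 - 1              # largest signed 64-bit integer
NS_PER_DAY = 24 * 60 * 60 * 10**9  # nanoseconds in one day


def fits_in_int64_ns(delta: timedelta) -> bool:
    """Return True if `delta` can be stored as an int64 count of nanoseconds."""
    total_ns = (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1_000
    return abs(total_ns) <= INT64_MAX


# Whole days that fit into the int64 nanosecond range (roughly 292 years).
max_days = INT64_MAX // NS_PER_DAY

century = timedelta(days=365 * 100)
six_centuries = timedelta(days=365 * 600)
print(max_days, fits_in_int64_ns(century), fits_in_int64_ns(six_centuries))
```

A span of about 600 years, like the gap between the years 4500 and 5100 in this issue, is roughly twice what the nanosecond representation can hold.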

Perhaps a more robust (yet more complex) solution for https://github.com/pydata/xarray/issues/2484 would be to write a version of a TimedeltaIndex that does not internally cast the timedeltas to type np.timedelta64[ns], and rather leaves them as datetime.timedelta objects, which are the actual result of subtracting two sequences of cftime.datetime objects.
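A minimal sketch of that idea, using only the standard library: the differences are kept as plain datetime.timedelta objects and never cast to nanoseconds. Here `datetime.datetime` stands in for cftime.datetime (an assumption made so the example runs without cftime), and `subtract_elementwise` is a hypothetical helper, not an existing xarray or pandas function.

```python
from datetime import datetime, timedelta


def subtract_elementwise(left, right):
    """Subtract two equal-length sequences of datetimes, keeping timedelta objects."""
    if len(left) != len(right):
        raise ValueError("sequences must have the same length")
    return [a - b for a, b in zip(left, right)]


later = [datetime(5100, 12, 31), datetime(4600, 12, 31)]
earlier = [datetime(4500, 12, 31), datetime(4500, 12, 31)]

# Each element is a datetime.timedelta; no nanosecond cast takes place.
deltas = subtract_elementwise(later, earlier)
print(deltas)
```

Because datetime.timedelta stores days, seconds, and microseconds separately, the 600-year difference is representable even though it exceeds the int64 nanosecond range.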

Regarding the combine_by_coords issue, though, there might be an easier fix. Is there a reason that first_items is an Index of length-one Indexes? It's not clear to me why that needs to be the case.

https://github.com/pydata/xarray/blob/56c16e4bf45a3771fd9acba76d802c0199c14519/xarray/core/combine.py#L91

It appears if we just select the first value of each index (i.e. a cftime.datetime object in this example), e.g.

``` python
first_items = pd.Index([index[0] for index in indexes])
```

pandas's rank method works properly and combine_by_coords produces the correct result:

```
xr.combine_by_coords([d1, d2, d3]).time
<xarray.DataArray 'time' (time: 3)>
array([cftime.DatetimeGregorian(4500, 12, 31, 0, 0, 0, 0, 4, 365),
       cftime.DatetimeGregorian(4600, 12, 31, 0, 0, 0, 0, 2, 365),
       cftime.DatetimeGregorian(5100, 12, 31, 0, 0, 0, 0, 0, 365)], dtype=object)
Coordinates:
  * time     (time) object 4500-12-31 00:00:00 ... 5100-12-31 00:00:00
```
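The effect of ranking scalar first values can be sketched without pandas. This is a pure-Python stand-in for a dense rank, not the pandas implementation; `dense_rank_order` is a hypothetical helper, and `datetime.datetime` again stands in for cftime.datetime.

```python
from datetime import datetime


def dense_rank_order(first_items):
    """Zero-based dense rank of each item; equal items share a position."""
    position = {value: i for i, value in enumerate(sorted(set(first_items)))}
    return [position[value] for value in first_items]


# Three length-one "indexes", mirroring i1, i2, i3 from the issue.
indexes = [
    [datetime(4500, 12, 31)],
    [datetime(4600, 12, 31)],
    [datetime(5100, 12, 31)],
]

# Take the first scalar of each index rather than a length-one index.
first_items = [index[0] for index in indexes]
order = dense_rank_order(first_items)
print(order)  # [0, 1, 2]
```

Comparing the scalars directly only needs ordering, not subtraction, so no timedelta is ever formed and each dataset gets its own position.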

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  subtracting CFTimeIndex can cause pd.TimedeltaIndex to overflow 523037716
554317768 https://github.com/pydata/xarray/issues/3535#issuecomment-554317768 https://api.github.com/repos/pydata/xarray/issues/3535 MDEyOklzc3VlQ29tbWVudDU1NDMxNzc2OA== mathause 10194086 2019-11-15T11:05:44Z 2019-11-15T11:08:29Z MEMBER

This happens in xr.combine_by_coords. Note that the OverflowError is "ignored in: 'pandas._libs.algos.are_diff'". So xr.combine_by_coords can return a wrong dataset (although this does not happen silently):

``` python
import xarray as xr

i1 = xr.cftime_range("4500-12-31", periods=1)
i2 = xr.cftime_range("4600-12-31", periods=1)
i3 = xr.cftime_range("5100-12-31", periods=1)

d1 = xr.DataArray([0], dims=("time", ), coords={"time": ("time", i1)}).to_dataset(name="a")
d2 = xr.DataArray([1], dims=("time", ), coords={"time": ("time", i2)}).to_dataset(name="a")
d3 = xr.DataArray([2], dims=("time", ), coords={"time": ("time", i3)}).to_dataset(name="a")

xr.combine_by_coords([d1, d2, d3]).time
```

returns:

``` python
<xarray.DataArray 'time' (time: 2)>
array([cftime.DatetimeGregorian(4500-12-31 00:00:00),
       cftime.DatetimeGregorian(5100-12-31 00:00:00)], dtype=object)
Coordinates:
  * time     (time) object 4500-12-31 00:00:00 5100-12-31 00:00:00
```

Note how d2 is missing.


Within xr.combine_by_coords the error happens here:

https://github.com/pydata/xarray/blob/7b4a286f59bc7d60d4e4d03be65562ff63f9b111/xarray/core/combine.py#L98

``` python
import pandas as pd

indexes = [i1, i2, i3]

# the code from _infer_concat_order_from_coords
# (`ascending` is not defined in the original snippet; True is assumed here)
ascending = True
first_items = pd.Index([index.take([0]) for index in indexes])

series = first_items.to_series()
rank = series.rank(method="dense", ascending=ascending)
order = rank.astype(int).values - 1

order
# array([0, 1, 1])
```

This causes the second item to be dropped.
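One way to picture why a repeated position loses a dataset is the sketch below. It assumes that the inferred positions end up as dictionary keys when the datasets are arranged, which is an assumption about the internals of combine.py rather than something stated in this comment; the names `datasets` and `combined` are illustrative only.

```python
# Stand-ins for the three datasets and the order computed above.
datasets = ["d1", "d2", "d3"]
order = [0, 1, 1]

# Assumption: each dataset is stored under its position, so a repeated
# position overwrites whatever was stored there before.
combined = {}
for position, dataset in zip(order, datasets):
    combined[(position,)] = dataset

print(combined)  # {(0,): 'd1', (1,): 'd3'}
```

Under that assumption d3 replaces d2 at position 1, which matches the observed result containing only the 4500 and 5100 time steps.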

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  subtracting CFTimeIndex can cause pd.TimedeltaIndex to overflow 523037716

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);