
issue_comments

10 rows where author_association = "MEMBER", issue = 187591179 and user = 1217238 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
276719845 https://github.com/pydata/xarray/issues/1084#issuecomment-276719845 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NjcxOTg0NQ== shoyer 1217238 2017-02-01T17:17:46Z 2017-02-01T17:19:48Z MEMBER

@spencerkclark I still think subclassing pandas.Index is the best path forward.

Subclassing DatetimeIndex is a non-starter because it presumes the use of datetime64 internally.

Subclassing PeriodIndex is also potentially viable, but would require hooking pretty deeply into pandas's internal machinery with an API that isn't really designed to be extended externally. For example, frequency conversion is done in C. From a data model perspective, it might work to use custom frequencies for different calendars, but it's not the cleanest abstraction -- really the frequency and the calendar are two separate notions. From a practical perspective, it's also less promising because it would require writing much of the logic from netcdftime to reimplement arithmetic and so on. And there would still be issues with datetime strings and converting back and forth to datetimes.

The downside of using pandas.Index is that resampling won't work out of the box. But we can work around that in xarray, and potentially even in pandas if we add a simple dispatching mechanism into pandas.TimeGrouper.
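As a rough sketch of that workaround idea (assumptions: plain datetime.datetime objects stand in for netcdftime dates, and resample_monthly_mean is a hypothetical helper, not xarray or pandas API), resampling can be emulated by grouping on coarse bin labels computed from the index values rather than going through pandas.TimeGrouper:

```python
import datetime

import pandas as pd

def resample_monthly_mean(series):
    # Label each timestamp with the first day of its month, then group on
    # those labels instead of relying on pandas.TimeGrouper.
    labels = [type(d)(d.year, d.month, 1) for d in series.index]
    return series.groupby(labels).mean()

times = [datetime.datetime(2000, 1, 1), datetime.datetime(2000, 1, 15),
         datetime.datetime(2000, 2, 1), datetime.datetime(2000, 2, 15)]
series = pd.Series([1.0, 2.0, 3.0, 4.0], index=pd.Index(times, dtype=object))
print(resample_monthly_mean(series))  # one mean per month: 1.5 and 3.5
```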

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
275963433 https://github.com/pydata/xarray/issues/1084#issuecomment-275963433 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NTk2MzQzMw== shoyer 1217238 2017-01-30T01:31:30Z 2017-01-30T01:31:30Z MEMBER

@spencerkclark I'm looking forward to the PR!

I understand that pandas wants a get_value method, but I would make this as minimal as possible, e.g., just:

```python
def get_value(self, series, key):
    return series.iloc[self.get_loc(key)]
```

See here for discussion: https://github.com/pandas-dev/pandas/pull/8707/files#diff-3097b8b1cd77ca106d9f4237db9a516aR386

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
274379833 https://github.com/pydata/xarray/issues/1084#issuecomment-274379833 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NDM3OTgzMw== shoyer 1217238 2017-01-23T01:55:26Z 2017-01-23T01:55:26Z MEMBER

Would it be safer to name the implementation of _maybe_cast_slice_bound in NetCDFTimeIndex something slightly different and override the public get_slice_bound method (the only place _maybe_cast_slice_bound gets called in Index) of Index, instead? Or is that not worth the trouble?

If you want to take this approach, name it something different and then use it to parse inputs to get_slice_bound before calling the superclass method. It would work the same, just with a slightly worse error message. But I'm not sure it's worth the trouble.
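A bare-bones sketch of what that could look like (the class name and the _parse_string_bound helper are hypothetical, and the actual parsing is left out):

```python
import pandas as pd

class NetCDFTimeIndex(pd.Index):
    """Sketch only: override the public get_slice_bound and give the parsing
    helper a name that cannot collide with pandas's private
    _maybe_cast_slice_bound."""

    def _parse_string_bound(self, label):
        # Placeholder for the real string-to-date parsing; a complete version
        # would also need to know which side of the slice the bound is on.
        raise NotImplementedError

    def get_slice_bound(self, label, *args, **kwargs):
        # Cast string labels up front, then defer to the base class.
        if isinstance(label, str):
            label = self._parse_string_bound(label)
        return super().get_slice_bound(label, *args, **kwargs)
```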

One advantage of having our own parser is that it would be pretty easy to extend as necessary, e.g., to handle negative years as specified by Wikipedia:

To represent years before 0000 or after 9999, the standard also permits the expansion of the year representation but only by prior agreement between the sender and the receiver.[11] An expanded year representation [±YYYYY] must have an agreed-upon number of extra year digits beyond the four-digit minimum, and it must be prefixed with a + or − sign[12] instead of the more common AD/BC (or BCE/CE) notation; by convention 1 BC is labelled +0000, 2 BC is labeled −0001, and so on.[13]
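For instance, the year piece of such a pattern could be widened to accept an optional ASCII sign and extra digits (a hypothetical tweak, not part of the gist):

```python
import re

# Expanded-year piece: optional +/- sign, at least four digits.
expanded_year = r'^(?P<year>[+-]?\d{4,})$'

for s in ['1999', '+012017', '-0001']:
    print(re.match(expanded_year, s).groupdict())
```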

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
274374525 https://github.com/pydata/xarray/issues/1084#issuecomment-274374525 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NDM3NDUyNQ== shoyer 1217238 2017-01-23T00:51:39Z 2017-01-23T00:51:39Z MEMBER

As for the implementation in your gist, it looks pretty solid to me. One possible concern is that we are overriding _maybe_cast_slice_bound, which is a private method, and it's not entirely clear which parts of the pandas Index API (if any) are designed for subclassing. But this is certainly unlikely to break anytime soon (prior to pandas 2.0, which we will see coming from a long way off).

It might also be worth testing this index in a pandas DataFrame/Series. I don't expect things to work 100% but it might be enough to be useful.
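A quick sanity check of that kind might look something like this, with plain datetime.datetime objects and a plain object-dtype pandas.Index standing in for the netcdftime dates and the index from the gist:

```python
import datetime

import pandas as pd

times = [datetime.datetime(2000, 1, 1), datetime.datetime(2000, 2, 1),
         datetime.datetime(2000, 3, 1)]
series = pd.Series([10.0, 20.0, 30.0], index=pd.Index(times, dtype=object))

# Exact-label selection and boolean masks both go through generic Index
# machinery, so they have a decent chance of working out of the box.
print(series.loc[datetime.datetime(2000, 2, 1)])
print(series[series.index >= datetime.datetime(2000, 2, 1)])
```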

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
274372547 https://github.com/pydata/xarray/issues/1084#issuecomment-274372547 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NDM3MjU0Nw== shoyer 1217238 2017-01-23T00:21:49Z 2017-01-23T00:22:44Z MEMBER

I'm not particularly tied to one date parsing approach over another (here I was just mimicking pandas). In an ideal world, what would be your preference?

Honestly, it's probably more reliable to just parse ISO-8601 ourselves, which is intentionally pretty simple. That will give us complete control.

Here's something simple I wrote to parse ISO-8601 using a regex:

```python
import re

def named(name, pattern):
    return '(?P<' + name + '>' + pattern + ')'

def optional(x):
    return '(?:' + x + ')?'

def trailing_optional(xs):
    if not xs:
        return ''
    return xs[0] + optional(trailing_optional(xs[1:]))

def build_pattern(date_sep='-', datetime_sep='T', time_sep='\:'):
    pieces = [(None, 'year', '\d{4}'),
              (date_sep, 'month', '\d{2}'),
              (date_sep, 'day', '\d{2}'),
              (datetime_sep, 'hour', '\d{2}'),
              (time_sep, 'minute', '\d{2}'),
              (time_sep, 'second', '\d{2}' + optional('.\d+'))]
    pattern_list = []
    for sep, name, sub_pattern in pieces:
        pattern_list.append((sep if sep else '') + named(name, sub_pattern))
    # TODO: allow timezone offsets?
    return '^' + trailing_optional(pattern_list) + '$'

def parse_iso8601(datetime_string):
    basic_pattern = build_pattern(date_sep='', time_sep='')
    extended_pattern = build_pattern()
    patterns = [basic_pattern, extended_pattern]
    for pattern in patterns:
        match = re.match(pattern, datetime_string)
        if match:
            return match.groupdict()
    raise ValueError('no ISO-8601 match for string: %s' % datetime_string)

# test cases
print(parse_iso8601('1999'))
print(parse_iso8601('19990101'))
print(parse_iso8601('1999-01-01'))
print(parse_iso8601('1999-01-01T12'))
print(parse_iso8601('1999-01-01T12:34'))
print(parse_iso8601('1999-01-01T12:34:56.78'))
print(parse_iso8601('19990101T123456.78'))
```

Output:

```
{'minute': None, 'year': '1999', 'second': None, 'month': None, 'hour': None, 'day': None}
{'minute': None, 'year': '1999', 'second': None, 'month': '01', 'hour': None, 'day': '01'}
{'minute': None, 'year': '1999', 'second': None, 'month': '01', 'hour': None, 'day': '01'}
{'minute': None, 'year': '1999', 'second': None, 'month': '01', 'hour': '12', 'day': '01'}
{'minute': '34', 'year': '1999', 'second': None, 'month': '01', 'hour': '12', 'day': '01'}
{'minute': '34', 'year': '1999', 'second': '56.78', 'month': '01', 'hour': '12', 'day': '01'}
{'minute': '34', 'year': '1999', 'second': '56.78', 'month': '01', 'hour': '12', 'day': '01'}
```

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
274364774 https://github.com/pydata/xarray/issues/1084#issuecomment-274364774 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NDM2NDc3NA== shoyer 1217238 2017-01-22T22:22:55Z 2017-01-22T22:22:55Z MEMBER

@spencerkclark It looks like dateutil's _parse method has dayfirst and yearfirst parameters to disambiguate such dates. For netcdftime, I think we always want dayfirst=False, yearfirst=True (basically, use YYYY-MM-DD like ISO 8601).

I'm not super happy with using the private _parse method, but given that pandas uses it I suppose this is pretty safe. In the worst case, it would be pretty easy to fork this logic from dateutil or from a library with ISO-8601 parsing (e.g., Arrow).
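For what it's worth, dateutil's public parser.parse takes the same flags, so the intended behaviour is easy to illustrate:

```python
from dateutil import parser

# With dayfirst=False and yearfirst=True, an ambiguous all-two-digit date
# is read as YY-MM-DD (expected: 2001-02-03 here).
print(parser.parse('01-02-03', dayfirst=False, yearfirst=True))
```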

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
269878256 https://github.com/pydata/xarray/issues/1084#issuecomment-269878256 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI2OTg3ODI1Ng== shoyer 1217238 2016-12-31T19:09:19Z 2017-01-01T00:18:20Z MEMBER

@spencerkclark

partial datetime string indexing would be a must

The good news here is that almost all this logic lives in pandas's Python code and should be straightforward to duplicate. I can point you to the relevant locations if you're having trouble figuring this out.
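As a toy illustration of what that logic amounts to (a hedged sketch, not pandas's implementation, with plain datetime.datetime objects in place of netcdftime dates): resolve the partial string to a [start, end) window and select the labels that fall inside it.

```python
import datetime

import numpy as np

def partial_string_bounds(label):
    # Hypothetical helper: turn 'YYYY' or 'YYYY-MM' into a [start, end) range.
    parts = [int(p) for p in label.split('-')]
    if len(parts) == 1:
        start = datetime.datetime(parts[0], 1, 1)
        end = datetime.datetime(parts[0] + 1, 1, 1)
    elif len(parts) == 2:
        start = datetime.datetime(parts[0], parts[1], 1)
        if parts[1] == 12:
            end = datetime.datetime(parts[0] + 1, 1, 1)
        else:
            end = datetime.datetime(parts[0], parts[1] + 1, 1)
    else:
        raise ValueError('unsupported resolution: %s' % label)
    return start, end

times = np.array([datetime.datetime(1999, month, 15) for month in range(1, 13)])
start, end = partial_string_bounds('1999-02')
print(times[(times >= start) & (times < end)])  # just the February entry
```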

An additional requirement would be serialization to netCDF files.

It should be quite straightforward to integrate this into xarray's existing serialization logic.

From a mechanics perspective, do we want this sort of index to live totally inside xarray or should it be in a separate package?

We could go either way here, but for now I think it makes sense to keep this in xarray proper.

Here's why:

  • We already have a bunch of CF-conventions specific logic related to serialization in xarray. Logically it makes sense to keep this together.
  • This will be a strict improvement over xarray's current handling of netcdf time objects.
  • I suspect most users who are interested in combining netCDF times and pandas (e.g., if they want to use a NetCDFTimeIndex in a DataFrame) are already using xarray, or at least willing to install it.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
265113215 https://github.com/pydata/xarray/issues/1084#issuecomment-265113215 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI2NTExMzIxNQ== shoyer 1217238 2016-12-06T10:20:58Z 2016-12-06T10:20:58Z MEMBER

@spencerahill Those are absolutely not essential -- I think your basic example is probably already enough to be useful.

Perhaps the most complete list of time series functionality in xarray can be found on this doc page: http://xarray.pydata.org/en/stable/time-series.html

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
264884735 https://github.com/pydata/xarray/issues/1084#issuecomment-264884735 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI2NDg4NDczNQ== shoyer 1217238 2016-12-05T15:32:30Z 2016-12-05T15:32:30Z MEMBER

@spencerkclark This looks pretty sane to me, though of course it's still missing a few nice things you can do with datetime64 (e.g., reindex and partial datetime string selection).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
258713425 https://github.com/pydata/xarray/issues/1084#issuecomment-258713425 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI1ODcxMzQyNQ== shoyer 1217238 2016-11-06T21:48:30Z 2016-11-06T21:48:30Z MEMBER

To be clear, the subclassing question is whether we should subclass Index or not. This would entail overriding a lot of methods, and would probably break in the future (with pandas 2.0). I definitely do not recommend subclassing PeriodIndex -- that could be very fragile.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);