html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/issues/1084#issuecomment-276719845,https://api.github.com/repos/pydata/xarray/issues/1084,276719845,MDEyOklzc3VlQ29tbWVudDI3NjcxOTg0NQ==,1217238,2017-02-01T17:17:46Z,2017-02-01T17:19:48Z,MEMBER,"@spencerkclark I still think subclassing pandas.Index is the best path forward. Subclassing DatetimeIndex is a non-starter because it presumes the use of datetime64 internally. Subclassing PeriodIndex is also potentially viable, but would require hooking pretty deeply into pandas's internal machinery with an API that isn't really designed to be extended externally. For example, frequency conversion [is done in C](https://github.com/pandas-dev/pandas/blob/master/pandas/src/period_helper.c). From a data model perspective, it might work to use custom frequencies for different calendars, but it's not the cleanest abstraction -- really the frequency and the calendar are two separate notions. From a practical perspective, it's also less promising because it would require writing much of the logic from netcdftime to reimplement arithmetic and so on. And there would still be issues with datetime strings and converting back and forth to datetimes. The downside of using pandas.Index is that resampling won't work out of the box. But we can work around that in xarray, and potentially even in pandas if we add an a simple dispatching mechanism into `pandas.TimeGrouper`.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187591179 https://github.com/pydata/xarray/issues/1084#issuecomment-275963433,https://api.github.com/repos/pydata/xarray/issues/1084,275963433,MDEyOklzc3VlQ29tbWVudDI3NTk2MzQzMw==,1217238,2017-01-30T01:31:30Z,2017-01-30T01:31:30Z,MEMBER,"@spencerkclark I'm looking forward to the PR! I understand that pandas wants a `get_value` method, but I would make this as minimal as possible, e.g., just: ``` def get_value(self, series, key): return series.iloc[self.get_loc(key)] ``` see here for discussion: https://github.com/pandas-dev/pandas/pull/8707/files#diff-3097b8b1cd77ca106d9f4237db9a516aR386","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187591179 https://github.com/pydata/xarray/issues/1084#issuecomment-274379833,https://api.github.com/repos/pydata/xarray/issues/1084,274379833,MDEyOklzc3VlQ29tbWVudDI3NDM3OTgzMw==,1217238,2017-01-23T01:55:26Z,2017-01-23T01:55:26Z,MEMBER,"> Would it be safer to name the implementation of _maybe_cast_slice_bound in NetCDFTimeIndex something slightly different and override the public get_slice_bound method (the only place _maybe_cast_slice_bound gets called in Index) of Index, instead? Or is that not worth the trouble? If you want to take this approach, name it something different and then use to parse inputs to `get_slice_bound` before calling the super class method. It would work the same, just with a slightly worse error message. But I'm not sure it's worth the trouble. One advantage of having our parser is that it would be pretty easy to extend as necessary, e.g., to handle negative years as specified by Wikipedia: > To represent years before 0000 or after 9999, the standard also permits the expansion of the year representation but only by prior agreement between the sender and the receiver.[11] An expanded year representation [±YYYYY] must have an agreed-upon number of extra year digits beyond the four-digit minimum, and it must be prefixed with a + or − sign[12] instead of the more common AD/BC (or BCE/CE) notation; by convention 1 BC is labelled +0000, 2 BC is labeled −0001, and so on.[13]","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187591179 https://github.com/pydata/xarray/issues/1084#issuecomment-274374525,https://api.github.com/repos/pydata/xarray/issues/1084,274374525,MDEyOklzc3VlQ29tbWVudDI3NDM3NDUyNQ==,1217238,2017-01-23T00:51:39Z,2017-01-23T00:51:39Z,MEMBER,"As for the implementation in your gist, it looks pretty solid to me. One possible concern is that we are overwriting `_maybe_cast_slice_bound`, which is a private method, and it's not entirely clear what parts of pandas Index API (if any) are designed for subclassing. But this is certainly unlikely to break anytime soon (prior to pandas 2.0, which we will see coming a long ways off). It might also be worth testing this index in a pandas DataFrame/Series. I don't expect things to work 100% but it might be enough to be useful.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187591179 https://github.com/pydata/xarray/issues/1084#issuecomment-274372547,https://api.github.com/repos/pydata/xarray/issues/1084,274372547,MDEyOklzc3VlQ29tbWVudDI3NDM3MjU0Nw==,1217238,2017-01-23T00:21:49Z,2017-01-23T00:22:44Z,MEMBER,"> I'm not particularly tied to one date parsing approach over another (here I was just mimicking pandas). In an ideal world, what would be your preference? Honestly, it's probably more reliable to just parse ISO-8601 ourselves, which is intentionally [pretty simple](https://en.wikipedia.org/wiki/ISO_8601#General_principles). That will give us complete control. Here's something simple I wrote to parse ISO-8601 using a regex: ```python import re def named(name, pattern): return '(?P<' + name + '>' + pattern + ')' def optional(x): return '(?:' + x + ')?' def trailing_optional(xs): if not xs: return '' return xs[0] + optional(trailing_optional(xs[1:])) def build_pattern(date_sep='\-', datetime_sep='T', time_sep='\:'): pieces = [(None, 'year', '\d{4}'), (date_sep, 'month', '\d{2}'), (date_sep, 'day', '\d{2}'), (datetime_sep, 'hour', '\d{2}'), (time_sep, 'minute', '\d{2}'), (time_sep, 'second', '\d{2}' + optional('\.\d+'))] pattern_list = [] for sep, name, sub_pattern in pieces: pattern_list.append((sep if sep else '') + named(name, sub_pattern)) # TODO: allow timezone offsets? return '^' + trailing_optional(pattern_list) + '$' def parse_iso8601(datetime_string): basic_pattern = build_pattern(date_sep='', time_sep='') extended_pattern = build_pattern() patterns = [basic_pattern, extended_pattern] for pattern in patterns: match = re.match(pattern, datetime_string) if match: return match.groupdict() raise ValueError('no ISO-8601 match for string: %s' % datetime_string) # test cases print parse_iso8601('1999') print parse_iso8601('19990101') print parse_iso8601('1999-01-01') print parse_iso8601('1999-01-01T12') print parse_iso8601('1999-01-01T12:34') print parse_iso8601('1999-01-01T12:34:56.78') print parse_iso8601('19990101T123456.78') ``` ``` {'minute': None, 'year': '1999', 'second': None, 'month': None, 'hour': None, 'day': None} {'minute': None, 'year': '1999', 'second': None, 'month': '01', 'hour': None, 'day': '01'} {'minute': None, 'year': '1999', 'second': None, 'month': '01', 'hour': None, 'day': '01'} {'minute': None, 'year': '1999', 'second': None, 'month': '01', 'hour': '12', 'day': '01'} {'minute': '34', 'year': '1999', 'second': None, 'month': '01', 'hour': '12', 'day': '01'} {'minute': '34', 'year': '1999', 'second': '56.78', 'month': '01', 'hour': '12', 'day': '01'} {'minute': '34', 'year': '1999', 'second': '56.78', 'month': '01', 'hour': '12', 'day': '01'} ```","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187591179 https://github.com/pydata/xarray/issues/1084#issuecomment-274364774,https://api.github.com/repos/pydata/xarray/issues/1084,274364774,MDEyOklzc3VlQ29tbWVudDI3NDM2NDc3NA==,1217238,2017-01-22T22:22:55Z,2017-01-22T22:22:55Z,MEMBER,"@spencerkclark It looks like `dateutil`'s `_parse` method has `dayfirst` and `yearfirst` parameters to disambiguate such date. For `netcdftime`, I think we always want `dayfirst=False, yearfirst=True` (basically, use `YYYY-MM-DD` like ISO 8601). I'm not super happy with using the private `_parse` method but I given that pandas uses it I suppose this is pretty safe. In the worse case scenario, it would be pretty easy to fork this logic from dateutil or a library with ISO-8601 parsing (e.g., [Arrow](https://github.com/crsmithdev/arrow/blob/master/arrow/parser.py)).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187591179 https://github.com/pydata/xarray/issues/1084#issuecomment-269878256,https://api.github.com/repos/pydata/xarray/issues/1084,269878256,MDEyOklzc3VlQ29tbWVudDI2OTg3ODI1Ng==,1217238,2016-12-31T19:09:19Z,2017-01-01T00:18:20Z,MEMBER,"@spencerkclark > partial datetime string indexing would be a must The good news here is that almost all this logic lives in pandas's Python code and should be straightforward to duplicate. I can point you to the relevant locations if you're having trouble figuring this out. > An additional requirement would be serialization to netCDF files. It should be quite straightforward to integrate this into xarray's existing serialization logic. > From a mechanics perspective, do we want this sort of index to live totally inside xarray or should it be in a separate package? We could go either way here, but for now I think it makes sense to keep this in xarray proper. Here's why: - We already have a bunch of CF-conventions specific logic related to serialization in xarray. Logically it makes sense to keep this together. - This will be a strict improvement over xarray's current handling of netcdf time objects. - I suspect most users who are interested combining netCDF times and pandas (e.g., if they want to a `NetCDFTimeIndex` in a DataFrame) are already using xarray, or at least willing to install it.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187591179 https://github.com/pydata/xarray/issues/1084#issuecomment-265113215,https://api.github.com/repos/pydata/xarray/issues/1084,265113215,MDEyOklzc3VlQ29tbWVudDI2NTExMzIxNQ==,1217238,2016-12-06T10:20:58Z,2016-12-06T10:20:58Z,MEMBER,"@spencerahill Those are absolutely not essential -- I think your basic example is probably already enough to be useful. Perhaps the most complete list of time series functionality in xarray can be found on this doc page: http://xarray.pydata.org/en/stable/time-series.html","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187591179 https://github.com/pydata/xarray/issues/1084#issuecomment-264884735,https://api.github.com/repos/pydata/xarray/issues/1084,264884735,MDEyOklzc3VlQ29tbWVudDI2NDg4NDczNQ==,1217238,2016-12-05T15:32:30Z,2016-12-05T15:32:30Z,MEMBER,"@spencerkclark This looks pretty sane to me, though of course it's still missing a few nice things you can do with `datetime64` (e.g., `reindex` and partial datetime string selection).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187591179 https://github.com/pydata/xarray/issues/1084#issuecomment-258713425,https://api.github.com/repos/pydata/xarray/issues/1084,258713425,MDEyOklzc3VlQ29tbWVudDI1ODcxMzQyNQ==,1217238,2016-11-06T21:48:30Z,2016-11-06T21:48:30Z,MEMBER,"To be clear, the subclassing question is whether we should subclass Index or not. This would entail overriding a lot of methods, and would probably break in the future (with pandas 2.0). I definitely do not recommend subclassing PeriodIndex -- that could be very fragile. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187591179