
issue_comments


25 rows where author_association = "MEMBER" and issue = 187591179 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
276719845 https://github.com/pydata/xarray/issues/1084#issuecomment-276719845 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NjcxOTg0NQ== shoyer 1217238 2017-02-01T17:17:46Z 2017-02-01T17:19:48Z MEMBER

@spencerkclark I still think subclassing pandas.Index is the best path forward.

Subclassing DatetimeIndex is a non-starter because it presumes the use of datetime64 internally.

Subclassing PeriodIndex is also potentially viable, but would require hooking pretty deeply into pandas's internal machinery with an API that isn't really designed to be extended externally. For example, frequency conversion is done in C. From a data model perspective, it might work to use custom frequencies for different calendars, but it's not the cleanest abstraction -- really the frequency and the calendar are two separate notions. From a practical perspective, it's also less promising because it would require writing much of the logic from netcdftime to reimplement arithmetic and so on. And there would still be issues with datetime strings and converting back and forth to datetimes.

The downside of using pandas.Index is that resampling won't work out of the box. But we can work around that in xarray, and potentially even in pandas if we add a simple dispatching mechanism into pandas.TimeGrouper.
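For illustration only, here is a minimal sketch of what the pandas.Index subclassing approach can look like. The class name and details are placeholders rather than the code from the gist discussed in this thread, and a usable index would still need get_loc, slice_locs, field accessors, and so on layered on top:

```python
import numpy as np
import pandas as pd

class CalendarDatetimeIndex(pd.Index):
    """Sketch of an Index that stores arbitrary-calendar datetimes as plain objects."""

    def __new__(cls, data, name=None):
        # Bypass pd.Index.__new__ so the values are kept as an object-dtype
        # array (e.g. of netcdftime datetimes) rather than coerced to datetime64.
        result = object.__new__(cls)
        result._data = np.array(data, dtype='O')
        result.name = name
        return result
```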

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
276719997 https://github.com/pydata/xarray/issues/1084#issuecomment-276719997 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NjcxOTk5Nw== jreback 953992 2017-02-01T17:18:17Z 2017-02-01T17:18:17Z MEMBER

@spencerahill as I said above, you should not need to subclass at all; just define a new frequency, maybe something like Month30 or some such, which would then slot right into PeriodIndex.
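As a rough, hedged illustration of the idea (this uses a stock pandas offset and does not actually register a new frequency with PeriodIndex, which is the part debated later in the thread), a fixed 30-day "month" step can be written as:

```python
import pandas as pd

# A fixed 30-day step, standing in for the hypothetical "Month30" frequency.
thirty_day_month = pd.DateOffset(days=30)

dates = pd.date_range('2000-01-01', periods=4, freq=thirty_day_month)
# DatetimeIndex(['2000-01-01', '2000-01-31', '2000-03-01', '2000-03-31'], ...)
```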

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
276695798 https://github.com/pydata/xarray/issues/1084#issuecomment-276695798 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NjY5NTc5OA== spencerkclark 6628425 2017-02-01T15:57:29Z 2017-02-01T15:57:29Z MEMBER

@shoyer @jhamman I will defer to both of you on this issue. In light of recent discussion, what do you think is the preferred approach? It seems like three avenues have been discussed (here and in pandas-dev/pandas#15258):

  1. Subclass pandas.Index (as I've started on above)
  2. Subclass pandas.DatetimeIndex
  3. Create custom calendars for use in PeriodIndex (do we care about the semantic difference between "Periods" and "Datetimes"?)

It's not immediately obvious to me that avenues 2 and 3 would be clean and straightforward, but if there is a way to easily adapt them for custom calendars — even unusual ones like "all-leap" (29-day Februaries every year) and "360-day" (30-day Februaries every year) calendars, which cannot be fully represented by ordinary datetime.datetime objects — I'd certainly be open to looking into it more (surely the less code we would need to add / test the better).

Are we back to the drawing board, or should we continue along the path of avenue 1?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
276168323 https://github.com/pydata/xarray/issues/1084#issuecomment-276168323 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NjE2ODMyMw== jreback 953992 2017-01-30T19:43:09Z 2017-01-30T19:43:09Z MEMBER

@jhamman you just need a different frequency; in fact, this one is pretty close: https://github.com/pandas-dev/pandas/blob/master/pandas/tseries/offsets.py#L2257

It's just a matter of defining a fixed-day month frequency (numpy has this by default anyhow); PeriodIndex would then happily take this.
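The numpy behaviour alluded to here is just fixed-step datetime64 arithmetic; a small illustration (not a pandas frequency definition):

```python
import numpy as np

# Every "month" advances by exactly 30 days.
steps = np.datetime64('2000-01-01') + np.arange(4) * np.timedelta64(30, 'D')
# array(['2000-01-01', '2000-01-31', '2000-03-01', '2000-03-31'], dtype='datetime64[D]')
```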

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
276166062 https://github.com/pydata/xarray/issues/1084#issuecomment-276166062 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NjE2NjA2Mg== max-sixty 5635139 2017-01-30T19:35:05Z 2017-01-30T19:35:05Z MEMBER

Is there a way to add a new calendar to PeriodIndex without rebuilding all that infrastructure?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
276140088 https://github.com/pydata/xarray/issues/1084#issuecomment-276140088 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NjE0MDA4OA== jhamman 2443309 2017-01-30T18:03:07Z 2017-01-30T18:03:07Z MEMBER

@jreback - Unless I've totally missed something, PeriodIndex does not give us calendar flexibility. CF conventions allow for the use of a no-leap calendar (for example), which we can't get with the pandas/numpy datetime approach.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
275969458 https://github.com/pydata/xarray/issues/1084#issuecomment-275969458 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NTk2OTQ1OA== jreback 953992 2017-01-30T02:42:58Z 2017-01-30T02:42:58Z MEMBER

Just my 2c here: you are going to end up writing a huge amount of code to essentially re-implement PeriodIndex. I'm not really sure why you are going down this path.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
275963433 https://github.com/pydata/xarray/issues/1084#issuecomment-275963433 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NTk2MzQzMw== shoyer 1217238 2017-01-30T01:31:30Z 2017-01-30T01:31:30Z MEMBER

@spencerkclark I'm looking forward to the PR!

I understand that pandas wants a get_value method, but I would make this as minimal as possible, e.g., just:

    def get_value(self, series, key):
        return series.iloc[self.get_loc(key)]

See here for discussion: https://github.com/pandas-dev/pandas/pull/8707/files#diff-3097b8b1cd77ca106d9f4237db9a516aR386

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
275962520 https://github.com/pydata/xarray/issues/1084#issuecomment-275962520 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NTk2MjUyMA== spencerkclark 6628425 2017-01-30T01:20:37Z 2017-01-30T01:20:37Z MEMBER

> If you want to take this approach, name it something different and then use it to parse inputs to get_slice_bound before calling the superclass method. It would work the same, just with a slightly worse error message. But I'm not sure it's worth the trouble.

Sure thing -- for now I've left things as they are, but I'll take this advice if we decide otherwise.

> One advantage of having our parser is that it would be pretty easy to extend as necessary, e.g., to handle negative years as specified by Wikipedia

I hadn't thought about that; I agree it would be nice to have that flexibility if needed.

Just to indicate some more progress here, I've updated the gist with @shoyer's date parser (which eliminates the issues for years less than 100; thanks again!) and with some additional modifications needed to add support for use in Series and DataFrame objects.

I've started to work on writing some unit tests offline; if there is nothing glaring in the updated gist, I may post a work-in-progress PR in the next week or so, where we can discuss the finer details of the implementation of the NetCDFTimeIndex (and its tests), and how we might want to integrate it into xarray. Does that sound like a reasonable idea?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
274379833 https://github.com/pydata/xarray/issues/1084#issuecomment-274379833 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NDM3OTgzMw== shoyer 1217238 2017-01-23T01:55:26Z 2017-01-23T01:55:26Z MEMBER

> Would it be safer to name the implementation of _maybe_cast_slice_bound in NetCDFTimeIndex something slightly different and override the public get_slice_bound method (the only place _maybe_cast_slice_bound gets called in Index) of Index, instead? Or is that not worth the trouble?

If you want to take this approach, name it something different and then use it to parse inputs to get_slice_bound before calling the superclass method. It would work the same, just with a slightly worse error message. But I'm not sure it's worth the trouble.
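A hedged sketch of that alternative, with a hypothetical helper name (_parse_partial_datetime is not from the gist) and extra arguments forwarded because get_slice_bound's signature has changed across pandas versions:

```python
import pandas as pd

class NetCDFTimeIndexSketch(pd.Index):
    def _parse_partial_datetime(self, label):
        """Hypothetical helper: turn an ISO 8601-style string into a comparable date."""
        raise NotImplementedError

    def get_slice_bound(self, label, side, *args, **kwargs):
        # Parse string labels up front, then defer to the superclass method.
        if isinstance(label, str):
            label = self._parse_partial_datetime(label)
        return super().get_slice_bound(label, side, *args, **kwargs)
```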

One advantage of having our parser is that it would be pretty easy to extend as necessary, e.g., to handle negative years as specified by Wikipedia:

> To represent years before 0000 or after 9999, the standard also permits the expansion of the year representation but only by prior agreement between the sender and the receiver.[11] An expanded year representation [±YYYYY] must have an agreed-upon number of extra year digits beyond the four-digit minimum, and it must be prefixed with a + or − sign[12] instead of the more common AD/BC (or BCE/CE) notation; by convention 1 BC is labelled +0000, 2 BC is labeled −0001, and so on.[13]
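A sketch of such an extension (illustrative only, not part of the parser posted elsewhere in this thread): the year group of an ISO 8601 regex can be widened to accept an optional sign and extra digits.

```python
import re

# '[+-]?\d{4,}' accepts '1999' as well as expanded forms like '+0000' and '-0001'.
expanded_year = re.compile(r'^(?P<year>[+-]?\d{4,})$')
assert expanded_year.match('-0001').group('year') == '-0001'
```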

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
274379117 https://github.com/pydata/xarray/issues/1084#issuecomment-274379117 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NDM3OTExNw== spencerkclark 6628425 2017-01-23T01:48:09Z 2017-01-23T01:48:09Z MEMBER

@shoyer many thanks for the ISO-8601 date parser! That should be pretty straightforward to swap with the logic I use for date parsing in the gist, and will be much more robust.

> One possible concern is that we are overwriting _maybe_cast_slice_bound, which is a private method, and it's not entirely clear what parts of the pandas Index API (if any) are designed for subclassing.

Would it be safer to name the implementation of _maybe_cast_slice_bound in NetCDFTimeIndex something slightly different and override the public get_slice_bound method (the only place _maybe_cast_slice_bound gets called in Index) of Index, instead? Or is that not worth the trouble?

> It might also be worth testing this index in a pandas DataFrame/Series. I don't expect things to work 100% but it might be enough to be useful.

Yes, this would be interesting to try. I'll test things out tomorrow and see how it goes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
274374525 https://github.com/pydata/xarray/issues/1084#issuecomment-274374525 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NDM3NDUyNQ== shoyer 1217238 2017-01-23T00:51:39Z 2017-01-23T00:51:39Z MEMBER

As for the implementation in your gist, it looks pretty solid to me. One possible concern is that we are overwriting _maybe_cast_slice_bound, which is a private method, and it's not entirely clear what parts of the pandas Index API (if any) are designed for subclassing. But this is certainly unlikely to break anytime soon (prior to pandas 2.0, which we will see coming a long way off).

It might also be worth testing this index in a pandas DataFrame/Series. I don't expect things to work 100% but it might be enough to be useful.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
274372547 https://github.com/pydata/xarray/issues/1084#issuecomment-274372547 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NDM3MjU0Nw== shoyer 1217238 2017-01-23T00:21:49Z 2017-01-23T00:22:44Z MEMBER

> I'm not particularly tied to one date parsing approach over another (here I was just mimicking pandas). In an ideal world, what would be your preference?

Honestly, it's probably more reliable to just parse ISO-8601 ourselves, which is intentionally pretty simple. That will give us complete control.

Here's something simple I wrote to parse ISO-8601 using a regex:

```python
import re

def named(name, pattern):
    return '(?P<' + name + '>' + pattern + ')'

def optional(x):
    return '(?:' + x + ')?'

def trailing_optional(xs):
    if not xs:
        return ''
    return xs[0] + optional(trailing_optional(xs[1:]))

def build_pattern(date_sep='-', datetime_sep='T', time_sep='\:'):
    pieces = [(None, 'year', '\d{4}'),
              (date_sep, 'month', '\d{2}'),
              (date_sep, 'day', '\d{2}'),
              (datetime_sep, 'hour', '\d{2}'),
              (time_sep, 'minute', '\d{2}'),
              (time_sep, 'second', '\d{2}' + optional('.\d+'))]
    pattern_list = []
    for sep, name, sub_pattern in pieces:
        pattern_list.append((sep if sep else '') + named(name, sub_pattern))
    # TODO: allow timezone offsets?
    return '^' + trailing_optional(pattern_list) + '$'

def parse_iso8601(datetime_string):
    basic_pattern = build_pattern(date_sep='', time_sep='')
    extended_pattern = build_pattern()
    patterns = [basic_pattern, extended_pattern]
    for pattern in patterns:
        match = re.match(pattern, datetime_string)
        if match:
            return match.groupdict()
    raise ValueError('no ISO-8601 match for string: %s' % datetime_string)

# test cases
print parse_iso8601('1999')
print parse_iso8601('19990101')
print parse_iso8601('1999-01-01')
print parse_iso8601('1999-01-01T12')
print parse_iso8601('1999-01-01T12:34')
print parse_iso8601('1999-01-01T12:34:56.78')
print parse_iso8601('19990101T123456.78')

{'minute': None, 'year': '1999', 'second': None, 'month': None, 'hour': None, 'day': None}
{'minute': None, 'year': '1999', 'second': None, 'month': '01', 'hour': None, 'day': '01'}
{'minute': None, 'year': '1999', 'second': None, 'month': '01', 'hour': None, 'day': '01'}
{'minute': None, 'year': '1999', 'second': None, 'month': '01', 'hour': '12', 'day': '01'}
{'minute': '34', 'year': '1999', 'second': None, 'month': '01', 'hour': '12', 'day': '01'}
{'minute': '34', 'year': '1999', 'second': '56.78', 'month': '01', 'hour': '12', 'day': '01'}
{'minute': '34', 'year': '1999', 'second': '56.78', 'month': '01', 'hour': '12', 'day': '01'}
```
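A possible follow-up sketch (not part of the comment above): partial datetime string indexing also needs the finest resolution the string specified, which can be read straight off the groupdict that parse_iso8601 returns.

```python
def parsed_to_parts(parsed):
    """Return the specified fields as numbers plus the finest resolution given."""
    parts, resolution = {}, None
    for field in ['year', 'month', 'day', 'hour', 'minute', 'second']:
        value = parsed.get(field)
        if value is None:
            break
        parts[field] = float(value) if field == 'second' else int(value)
        resolution = field
    return parts, resolution

# parsed_to_parts(parse_iso8601('1999-01-01T12'))
# -> ({'year': 1999, 'month': 1, 'day': 1, 'hour': 12}, 'hour')
```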

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
274367967 https://github.com/pydata/xarray/issues/1084#issuecomment-274367967 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NDM2Nzk2Nw== spencerkclark 6628425 2017-01-22T23:10:49Z 2017-01-22T23:12:14Z MEMBER

Thanks @shoyer. It appears using the yearfirst=True and dayfirst=False arguments fixes my first example:

```python
In [9]: p._parse('0001-01-16', yearfirst=True, dayfirst=False)
Out[9]: (_result(year=1, month=1, day=16), None)

In [10]: p._parse('0016-01-01', yearfirst=True, dayfirst=False)
Out[10]: (_result(year=16, month=1, day=1), None)
```

but not the second:

```python
In [11]: p._parse('0100', yearfirst=True, dayfirst=False)
Out[11]: (_result(year=100), None)

In [12]: p._parse('0001', yearfirst=True, dayfirst=False)
Out[12]: (_result(day=1), None)
```

So we might need to alter the logic a little bit if we continue with the `dateutil` approach.

> I'm not super happy with using the private _parse method, but given that pandas uses it I suppose this is pretty safe. In the worst-case scenario, it would be pretty easy to fork this logic from dateutil or a library with ISO-8601 parsing (e.g., Arrow).

I'm not particularly tied to one date parsing approach over another (here I was just mimicking pandas). In an ideal world, what would be your preference?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
274364774 https://github.com/pydata/xarray/issues/1084#issuecomment-274364774 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NDM2NDc3NA== shoyer 1217238 2017-01-22T22:22:55Z 2017-01-22T22:22:55Z MEMBER

@spencerkclark It looks like dateutil's _parse method has dayfirst and yearfirst parameters to disambiguate such dates. For netcdftime, I think we always want dayfirst=False, yearfirst=True (basically, use YYYY-MM-DD like ISO 8601).

I'm not super happy with using the private _parse method, but given that pandas uses it I suppose this is pretty safe. In the worst-case scenario, it would be pretty easy to fork this logic from dateutil or a library with ISO-8601 parsing (e.g., Arrow).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
274362263 https://github.com/pydata/xarray/issues/1084#issuecomment-274362263 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3NDM2MjI2Mw== spencerkclark 6628425 2017-01-22T21:46:25Z 2017-01-22T21:46:25Z MEMBER

@shoyer @jhamman outside of an upstream issue in dateutil (what pandas uses to parse date strings), I think I have partial datetime string indexing with netcdftime objects functioning in my prototype (updated here). I've tried to simplify the logic that's in pandas wherever possible (ignoring timezones, business day / quarters logic etc.), while also modifying it such that it can handle any kind of datetime from netcdftime. It would be great if you could do another sanity check on it before I look to open up a PR.

If you think this is a path forward here, I suppose the next order of business would be for me to open up an issue in dateutil, and see what the developers there recommend. I flesh out the issue some in my notebook, but the gist of it is that for years less than 100, the default date parser behaves somewhat unpredictably.

For instance, I would expect line 5 to produce a date with year=1, and line 6 to produce a date with year=16:

```python
In [1]: import dateutil

In [2]: dateutil.__version__
Out[2]: '2.5.3'

In [3]: from dateutil import parser

In [4]: p = parser.parser()

In [5]: p._parse('0001-01-16')
Out[5]: (_result(year=16, month=1, day=1), None)

In [6]: p._parse('0016-01-01')
Out[6]: (_result(year=16, month=1, day=1), None)
```

and here I would want line 7 to return a result with year=1 (without a days value), as it does for years greater than or equal to 100 (line 8):

```python
In [7]: p._parse('0001')
Out[7]: (_result(day=1), None)

In [8]: p._parse('0100')
Out[8]: (_result(year=100), None)
```

I recognize that I'm using the private API version of the parse function; however, this is also what pandas does in its core (to enable picking off the resolution of the string specified). It has the additional benefit for me here of allowing me to convert the _result into whatever kind of datetime I'd like.

Thanks again for your help.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
270036251 https://github.com/pydata/xarray/issues/1084#issuecomment-270036251 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI3MDAzNjI1MQ== spencerkclark 6628425 2017-01-03T00:27:49Z 2017-01-03T00:27:49Z MEMBER

@shoyer that all sounds good. I'll get to work over the next few weeks and see what I can do. Thanks for your help.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
269878256 https://github.com/pydata/xarray/issues/1084#issuecomment-269878256 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI2OTg3ODI1Ng== shoyer 1217238 2016-12-31T19:09:19Z 2017-01-01T00:18:20Z MEMBER

@spencerkclark

> partial datetime string indexing would be a must

The good news here is that almost all this logic lives in pandas's Python code and should be straightforward to duplicate. I can point you to the relevant locations if you're having trouble figuring this out.

> An additional requirement would be serialization to netCDF files.

It should be quite straightforward to integrate this into xarray's existing serialization logic.

> From a mechanics perspective, do we want this sort of index to live totally inside xarray or should it be in a separate package?

We could go either way here, but for now I think it makes sense to keep this in xarray proper.

Here's why:

  • We already have a bunch of CF-conventions specific logic related to serialization in xarray. Logically it makes sense to keep this together.
  • This will be a strict improvement over xarray's current handling of netcdf time objects.
  • I suspect most users who are interested in combining netCDF times and pandas (e.g., if they want to use a NetCDFTimeIndex in a DataFrame) are already using xarray, or at least willing to install it.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
269845642 https://github.com/pydata/xarray/issues/1084#issuecomment-269845642 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI2OTg0NTY0Mg== spencerkclark 6628425 2016-12-31T03:01:57Z 2016-12-31T03:01:57Z MEMBER

Thanks @jhamman, that's great to hear. At this moment I don't have any concrete things I'm waiting on from you.

I haven't had the chance to iterate more on this since last month, but as I've thought about it more, for the custom datetime index to really "feel" like the pandas DatetimeIndex, partial datetime string indexing would be a must. An additional requirement would be serialization to netCDF files. So from a features standpoint, I think those are the next two steps, but others are welcome to weigh in.

That being said, before pushing too far ahead, I think it's probably important to put things in a form that's more amenable to code review, testing, and collaboration. From a mechanics perspective, do we want this sort of index to live totally inside xarray or should it be in a separate package? Depending on which, at some point in the next month I could either put together a PR here or set up a new repo that would probably get things started with just the field accessors (to enable groupby operations) and then we could work from there. What are folks' thoughts on this?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
269584644 https://github.com/pydata/xarray/issues/1084#issuecomment-269584644 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI2OTU4NDY0NA== jhamman 2443309 2016-12-29T05:43:19Z 2016-12-29T18:13:35Z MEMBER

@shoyer, @jhamman, @spencerkclark, @spencerahill, and @darothen -

I think I've got netcdftime sufficiently detached from NetCDF4-Python that we can keep moving forward. Along the lines of what your notebook showed last month, you should just be able to swap out `from netCDF4 import netcdftime` with `import netcdftime`. There's plenty to do with netcdftime before we yank it out of NetCDF4-Python, but that shouldn't keep you from progressing in any way. Are you waiting on anything from me? What are the next steps here?

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
265174968 https://github.com/pydata/xarray/issues/1084#issuecomment-265174968 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI2NTE3NDk2OA== spencerkclark 6628425 2016-12-06T15:13:41Z 2016-12-06T15:13:41Z MEMBER

@shoyer brings up a good point regarding partial datetime string indexing. For instance in my basic example, indexing with truncated string dates of the form '2000-01-01' (versus the full specification, '2000-01-01 00:00:00') works because netcdftime._parse_date simply assumes that you meant '2000-01-01 00:00:00' when you wrote '2000-01-01'.

This would mean that the same string specification could have different behavior for DatetimeIndex objects versus NetCDFTimeIndex objects, which is probably not desirable.

For instance, using the current setup in my basic example with sub-daily resolution data, selecting a time using '2000-01-01' would give you just the value associated with '2000-01-01 00:00:00':

```
In [20] dates = [netcdftime.DatetimeAllLeap(2000, 1, 1, 0),
                 netcdftime.DatetimeAllLeap(2000, 1, 1, 3)]
In [21] da = xr.DataArray(np.arange(2), coords=[NetCDFTimeIndex(dates)], dims=['time'])
In [22] da.sel(time='2000-01-01')

Out [22]
<xarray.DataArray ()>
array(0)
Coordinates:
    time     object 2000-01-01 00:00:00
```

but using a DatetimeIndex this would give you both values (because of partial datetime string selection):

```
In [23] from datetime import datetime
In [24] dates = [datetime(2000, 1, 1, 0), datetime(2000, 1, 1, 3)]
In [25] da = xr.DataArray(np.arange(2), coords=[dates], dims=['time'])
In [26] da.sel(time='2000-01-01')

Out [26]
<xarray.DataArray (time: 2)>
array([0, 1])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-01T03:00:00
```

I think if we were to include string-based indexing, it would be best if it were completely consistent with the DatetimeIndex version. I would love to be wrong, but I don't see a clean way of directly using existing code from pandas to enable this. At least in my (possibly naive) reading of the internals of DatetimeIndex, the functions associated with partial datetime string selection are somewhat tied to using datetimes with standard calendars (somewhat in the weeds, but more specifically I'm looking at pandas.tslib.parse_datetime_string_with_reso and pandas.tseries.index.DatetimeIndex._parsed_string_to_bounds), and it could take a fair bit of adapting that code for our purposes to unhitch that dependence. Is that a fair assessment?
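To make the comparison concrete, here is a hedged sketch (neither pandas code nor an actual xarray implementation) of what a calendar-aware analogue of _parsed_string_to_bounds has to do: map the resolution of a partial date string to an inclusive start/end pair using month lengths for the calendar in question. The month-length table and the use of datetime.datetime are illustrative assumptions; real code would construct netcdftime dates for the chosen calendar.

```python
from datetime import datetime

# Month lengths for an 'all_leap' calendar (every February has 29 days).
DAYS_IN_MONTH = {'all_leap': [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]}

def partial_string_bounds(year, month=None, calendar='all_leap'):
    """Inclusive bounds for 'YYYY' or 'YYYY-MM' partial date strings."""
    days = DAYS_IN_MONTH[calendar]
    if month is None:
        return datetime(year, 1, 1), datetime(year, 12, days[-1], 23, 59, 59)
    return datetime(year, month, 1), datetime(year, month, days[month - 1], 23, 59, 59)

# partial_string_bounds(2000, 2)
# -> (datetime(2000, 2, 1, 0, 0), datetime(2000, 2, 29, 23, 59, 59))
```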

So ultimately this raises the question: would we want to add just the field accessors to enable group-by operations for now and add string-based selection (and other features like resample) later, or should we put our heads down and work out a solution for partial datetime string-based selection using netcdftime datetime objects?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
265113215 https://github.com/pydata/xarray/issues/1084#issuecomment-265113215 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI2NTExMzIxNQ== shoyer 1217238 2016-12-06T10:20:58Z 2016-12-06T10:20:58Z MEMBER

@spencerahill Those are absolutely not essential -- I think your basic example is probably already enough to be useful.

Perhaps the most complete list of time series functionality in xarray can be found on this doc page: http://xarray.pydata.org/en/stable/time-series.html

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
264884735 https://github.com/pydata/xarray/issues/1084#issuecomment-264884735 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI2NDg4NDczNQ== shoyer 1217238 2016-12-05T15:32:30Z 2016-12-05T15:32:30Z MEMBER

@spencerkclark This looks pretty sane to me, though of course it's still missing a few nice things you can do with datetime64 (e.g., reindex and partial datetime string selection).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
264742277 https://github.com/pydata/xarray/issues/1084#issuecomment-264742277 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI2NDc0MjI3Nw== spencerkclark 6628425 2016-12-04T23:58:12Z 2016-12-04T23:58:12Z MEMBER

@shoyer @jhamman this is my first time digging into the internals of pandas / xarray, so please forgive me if this is off-track, but I've started experimenting with a subclass of pandas.Index (which I've called NetCDFTimeIndex) that could be a start toward addressing this issue. Here's a link to a Jupyter notebook that sketches it out and includes a few examples of its use.

As a proof of concept, it currently enables groupby operations through a crude version of the _field_accessor / get_date_field approach implemented in pandas.DatetimeIndex (which in pandas seems to leverage some Cython). Any datetime object implemented in netcdftime can be used (so this supports a number of custom calendars out of the box). By overriding get_loc and slice_locs and leveraging an internal function of netcdftime, I've also experimented with enabling ISO 8601 string-based selection of times / time slices.
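A hedged sketch of the _field_accessor idea being described, with illustrative names rather than the gist's actual code: a small factory builds a property that pulls one named field out of every datetime object held by the index (assumed to live in self._data).

```python
import numpy as np

def _field_accessor(name):
    """Return a property that extracts `name` from each datetime in the index."""
    def getter(self):
        return np.array([getattr(date, name) for date in self._data])
    return property(getter)

class DateFieldMixin:
    # Mixed into an Index subclass whose values are stored in self._data.
    year = _field_accessor('year')
    month = _field_accessor('month')
    day = _field_accessor('day')
    hour = _field_accessor('hour')
```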

I have yet to look into serialization or how / if this could be integrated into xarray, but does this seem like an approach worth pursuing to address this issue?

cc @spencerahill

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179
258713425 https://github.com/pydata/xarray/issues/1084#issuecomment-258713425 https://api.github.com/repos/pydata/xarray/issues/1084 MDEyOklzc3VlQ29tbWVudDI1ODcxMzQyNQ== shoyer 1217238 2016-11-06T21:48:30Z 2016-11-06T21:48:30Z MEMBER

To be clear, the subclassing question is whether we should subclass Index or not. This would entail overriding a lot of methods, and would probably break in the future (with pandas 2.0). I definitely do not recommend subclassing PeriodIndex -- that could be very fragile.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Towards a (temporary?) workaround for datetime issues at the xarray-level 187591179


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);