html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1084#issuecomment-276695798,https://api.github.com/repos/pydata/xarray/issues/1084,276695798,MDEyOklzc3VlQ29tbWVudDI3NjY5NTc5OA==,6628425,2017-02-01T15:57:29Z,2017-02-01T15:57:29Z,MEMBER,"@shoyer @jhamman I will defer to both of you on this issue. In light of recent discussion, what do you think is the preferred approach? It seems like three avenues have been discussed (here and in pandas-dev/pandas#15258):
1. Subclass pandas.Index (as I've started on above)
2. Subclass pandas.DatetimeIndex
3. Create custom calendars for use in PeriodIndex (do we care about the semantic difference between ""Periods"" and ""Datetimes""?)
It's not immediately obvious to me that avenues 2 and 3 would be clean and straightforward, but if there is a way to *easily* adapt them for custom calendars — even unusual ones like ""all-leap"" (29-day Februaries every year) and ""360-day"" (30-day Februaries every year) calendars, which cannot be fully represented by ordinary ```datetime.datetime``` objects — I'd certainly be open to looking into it more (surely the less code we would need to add / test the better).
Are we back to the drawing board, or should we continue along the path of avenue 1?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187591179
https://github.com/pydata/xarray/issues/1084#issuecomment-275962520,https://api.github.com/repos/pydata/xarray/issues/1084,275962520,MDEyOklzc3VlQ29tbWVudDI3NTk2MjUyMA==,6628425,2017-01-30T01:20:37Z,2017-01-30T01:20:37Z,MEMBER,"> If you want to take this approach, name it something different and then use to parse inputs to get_slice_bound before calling the super class method. It would work the same, just with a slightly worse error message. But I'm not sure it's worth the trouble.
Sure thing -- for now I've left things as they are, but I'll take this advice if we decide otherwise.
> One advantage of having our parser is that it would be pretty easy to extend as necessary, e.g., to handle negative years as specified by Wikipedia
I hadn't thought about that; I agree it would be nice to have that flexibility if needed.
Just to indicate some more progress here, I've updated [the gist](https://gist.github.com/spencerkclark/c33eaf241e52f2fb6ab877c64961de9d) with @shoyer's date parser, which eliminates issues for years less than 100 (thanks again!), and some additional modifications needed to add support for use in Series and DataFrame objects.
I've started to work on writing some unit tests offline; if there is nothing glaring in the updated gist, I may post a work-in-progress PR in the next week or so, where we can discuss the finer details of the implementation of the NetCDFTimeIndex (and its tests), and how we might want to integrate it into xarray. Does that sound like a reasonable idea?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187591179
https://github.com/pydata/xarray/issues/1084#issuecomment-274379117,https://api.github.com/repos/pydata/xarray/issues/1084,274379117,MDEyOklzc3VlQ29tbWVudDI3NDM3OTExNw==,6628425,2017-01-23T01:48:09Z,2017-01-23T01:48:09Z,MEMBER,"@shoyer many thanks for the ISO-8601 date parser! That should be pretty straightforward to swap with the logic I use for date parsing in the gist, and will be much more robust.
> One possible concern is that we are overwriting _maybe_cast_slice_bound, which is a private method, and it's not entirely clear what parts of pandas Index API (if any) are designed for subclassing.
Would it be safer to name the implementation of ```_maybe_cast_slice_bound``` in ```NetCDFTimeIndex``` something slightly different and override the public ```get_slice_bound``` method (the only place ```_maybe_cast_slice_bound``` gets called in Index) of Index, instead? Or is that not worth the trouble?
> It might also be worth testing this index in a pandas DataFrame/Series. I don't expect things to work 100% but it might be enough to be useful.
Yes, this would be interesting to try. I'll test things out tomorrow and see how it goes.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187591179
https://github.com/pydata/xarray/issues/1084#issuecomment-274367967,https://api.github.com/repos/pydata/xarray/issues/1084,274367967,MDEyOklzc3VlQ29tbWVudDI3NDM2Nzk2Nw==,6628425,2017-01-22T23:10:49Z,2017-01-22T23:12:14Z,MEMBER,"Thanks @shoyer. It appears using the ```yearfirst=True``` and ```dayfirst=False``` arguments fixes my first example:
```python
In [9]: p._parse('0001-01-16', yearfirst=True, dayfirst=False)
Out[9]: (_result(year=1, month=1, day=16), None)
In [10]: p._parse('0016-01-01', yearfirst=True, dayfirst=False)
Out[10]: (_result(year=16, month=1, day=1), None)
```
but not the second:
```python
In [11]: p._parse('0100', yearfirst=True, dayfirst=False)
Out[11]: (_result(year=100), None)
In [12]: p._parse('0001', yearfirst=True, dayfirst=False)
Out[12]: (_result(day=1), None)
```
So we might need to alter the logic a little bit if we continue with the ```dateutil``` approach.
>I'm not super happy with using the private _parse method but I given that pandas uses it I suppose this is pretty safe. In the worse case scenario, it would be pretty easy to fork this logic from dateutil or a library with ISO-8601 parsing (e.g., Arrow).
I'm not particularly tied to one date parsing approach over another (here I was just mimicking pandas). In an ideal world, what would be your preference? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187591179
https://github.com/pydata/xarray/issues/1084#issuecomment-274362263,https://api.github.com/repos/pydata/xarray/issues/1084,274362263,MDEyOklzc3VlQ29tbWVudDI3NDM2MjI2Mw==,6628425,2017-01-22T21:46:25Z,2017-01-22T21:46:25Z,MEMBER,"@shoyer @jhamman outside of an upstream issue in ```dateutil``` (what pandas uses to parse date strings), I think I have partial datetime string indexing with ```netcdftime``` objects functioning in my prototype ([updated here](https://gist.github.com/spencerkclark/c33eaf241e52f2fb6ab877c64961de9d)). I've tried to simplify the logic that's in pandas wherever possible (ignoring timezones, business day / quarters logic etc.), while also modifying it such that it can handle any kind of datetime from ```netcdftime```. It would be great if you could do another sanity check on it before I look to open up a PR.
If you think this is a path forward here, I suppose the next order of business would be for me to open up an issue in ```dateutil```, and see what the developers there recommend. I flesh out the issue some in my notebook, but the gist of it is that for years less than 100, the default date parser behaves somewhat unpredictably.
For instance, I would expect line 5 to produce a date with ```year=1```, and line 6 to produce a date with ```year=16```:
```python
In [1]: import dateutil
In [2]: dateutil.__version__
Out[2]: '2.5.3'
In [3]: from dateutil import parser
In [4]: p = parser.parser()
In [5]: p._parse('0001-01-16')
Out[5]: (_result(year=16, month=1, day=1), None)
In [6]: p._parse('0016-01-01')
Out[6]: (_result(year=16, month=1, day=1), None)
```
and here I would want line 7 to return a result with ```year=1``` (without a days value), as it does for years greater than or equal to 100 (line 8):
```python
In [7]: p._parse('0001')
Out[7]: (_result(day=1), None)
In [8]: p._parse('0100')
Out[8]: (_result(year=100), None)
```
I recognize that I'm using the private API version of the parse function; however this is also [what pandas does in its core](https://github.com/pandas-dev/pandas/blob/master/pandas/tslib.pyx#L2060) (to enable picking off the resolution of the string specified). It has the additional use for me here of allowing me to convert the ```_result``` into whatever kind of datetime I'd like.
Thanks again for your help.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187591179
https://github.com/pydata/xarray/issues/1084#issuecomment-270036251,https://api.github.com/repos/pydata/xarray/issues/1084,270036251,MDEyOklzc3VlQ29tbWVudDI3MDAzNjI1MQ==,6628425,2017-01-03T00:27:49Z,2017-01-03T00:27:49Z,MEMBER,@shoyer that all sounds good. I'll get to work over the next few weeks and see what I can do. Thanks for your help.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187591179
https://github.com/pydata/xarray/issues/1084#issuecomment-269845642,https://api.github.com/repos/pydata/xarray/issues/1084,269845642,MDEyOklzc3VlQ29tbWVudDI2OTg0NTY0Mg==,6628425,2016-12-31T03:01:57Z,2016-12-31T03:01:57Z,MEMBER,"Thanks @jhamman, that's great to hear. At this moment I don't have any concrete things I'm waiting on from you.
I haven't had the chance to iterate more on this since last month, but as I've thought about it more, for the custom datetime index to really ""feel"" like the pandas ```DatetimeIndex```, partial datetime string indexing would be a must. An additional requirement would be serialization to netCDF files. So from a features standpoint, I think those are the next two steps, but others are welcome to weigh in.
That being said, before pushing too far ahead, I think it's probably important to put things in a form that's more amenable to code review, testing, and collaboration. From a mechanics perspective, do we want this sort of index to live totally inside ```xarray``` or should it be in a separate package? Depending on which, at some point in the next month I could either put together a PR here or set up a new repo that would probably get things started with just the field accessors (to enable groupby operations) and then we could work from there. What are folks' thoughts on this?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187591179
https://github.com/pydata/xarray/issues/1084#issuecomment-265174968,https://api.github.com/repos/pydata/xarray/issues/1084,265174968,MDEyOklzc3VlQ29tbWVudDI2NTE3NDk2OA==,6628425,2016-12-06T15:13:41Z,2016-12-06T15:13:41Z,MEMBER,"@shoyer brings up a good point regarding partial datetime string indexing. For instance in my basic example, indexing with truncated string dates of the form ```'2000-01-01'``` (versus the full specification, ```2000-01-01 00:00:00'```) works because [```netcdftime._parse_date```](https://github.com/Unidata/netcdf4-python/blob/master/netcdftime/_netcdftime.pyx#L927) simply assumes that you meant ```'2000-01-01 00:00:00'``` when you wrote ```'2000-01-01'```.
This would mean that the same string specification could have different behavior for ```DatetimeIndex``` objects versus ```NetCDFTimeIndex``` objects, which is probably not desirable.
For instance, using the current setup in my basic example with sub-daily resolution data, selecting a time using ```'2000-01-01'``` would give you just the value associated with ```'2000-01-01 00:00:00'```:
```
In [20] dates = [netcdftime.DatetimeAllLeap(2000, 1, 1, 0), netcdftime.DatetimeAllLeap(2000, 1, 1, 3)]
In [21] da = xr.DataArray(np.arange(2), coords=[NetCDFTimeIndex(dates)], dims=['time'])
In [22] da.sel(time='2000-01-01')
Out [22]
array(0)
Coordinates:
time object 2000-01-01 00:00:00
```
but using a ```DatetimeIndex``` this would give you both values (because of partial datetime string selection):
```
In [23] from datetime import datetime
In [24] dates = [datetime(2000, 1, 1, 0), datetime(2000, 1, 1, 3)]
In [25] da = xr.DataArray(np.arange(2), coords=[dates], dims=['time'])
In [26] da.sel(time='2000-01-01')
Out [26]
array([0, 1])
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T03:00:00
```
I think if we were to include string-based indexing, it would be best if it were completely consistent with the ```DatetimeIndex``` version. I would love to be wrong, but I don't see a clean way of *directly* using existing code from pandas to enable this. At least in my (possibly naive) reading of the internals of ```DatetimeIndex```, the functions associated with partial datetime string selection are somewhat tied to using datetimes with standard calendars (somewhat in the weeds, but more specifically I'm looking at [```pandas.tslib.parse_datetime_string_with_reso```](https://github.com/pandas-dev/pandas/blob/master/pandas/tslib.pyx#L1984) and [```pandas.tseries.index.DatetimeIndex._parsed_string_to_bounds```](https://github.com/pandas-dev/pandas/blob/master/pandas/tseries/index.py#L1229)), and it could take a fair bit of adapting that code for our purposes to unhitch that dependence. Is that a fair assessment?
So ultimately this raises the question, would we want to add just the field accessors to enable group-by operations for now and add string-based selection (and other features like ```resample```) later, or should we put our heads down and work out a solution for partial datetime string based using netcdftime datetime objects?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187591179
https://github.com/pydata/xarray/issues/1084#issuecomment-264742277,https://api.github.com/repos/pydata/xarray/issues/1084,264742277,MDEyOklzc3VlQ29tbWVudDI2NDc0MjI3Nw==,6628425,2016-12-04T23:58:12Z,2016-12-04T23:58:12Z,MEMBER,"@shoyer @jhamman this is my first time digging in to the internals of pandas / xarray, so please forgive me if this is off-track, but I've started experimenting with a subclass of ```pandas.Index``` (which I've called ```NetCDFTimeIndex```) that could be a start toward addressing this issue. Here's a [link to a Jupyter notebook that sketches it out](https://nbviewer.jupyter.org/gist/spencerkclark/158ddbd0b0df7de1332020429e4c97c2), and includes a few examples of its use.
As a proof of concept, it currently enables groupby operations through a crude version of the [```_field_accessor```](https://github.com/pandas-dev/pandas/blob/master/pandas/tseries/index.py#L63) / [```get_date_field```](https://github.com/pandas-dev/pandas/blob/master/pandas/tslib.pyx#L4484) approach implemented in ```pandas.DatetimeIndex``` (which in pandas seems to leverage some cython). Any datetime object implemented in ```netcdftime``` can be used (so this supports a number of custom calendars out of the box). By overriding ```get_loc``` and ```slice_locs``` and leveraging an [internal function](https://github.com/Unidata/netcdf4-python/blob/master/netcdftime/_netcdftime.pyx#L927) of ```netcdftime```, I've also experimented with enabling ISO 8601 string-based selection of times / time slices.
I have yet to look into serialization or how / if this could be integrated into xarray, but does this seem like an approach worth pursuing to address this issue?
cc @spencerahill","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187591179