
issue_comments


39 rows where issue = 387924616 sorted by updated_at descending




user 5

  • jwenfai 18
  • spencerkclark 15
  • shoyer 4
  • fmaussion 1
  • pep8speaks 1

author_association 3

  • MEMBER 20
  • CONTRIBUTOR 18
  • NONE 1

issue 1

  • CFTimeIndex Resampling · 39
Columns: id · html_url · issue_url · node_id · user · created_at · updated_at ▲ · author_association · body · reactions · performed_via_github_app · issue
460019486 https://github.com/pydata/xarray/pull/2593#issuecomment-460019486 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ2MDAxOTQ4Ng== shoyer 1217238 2019-02-03T03:21:29Z 2019-02-03T03:21:29Z MEMBER

thanks @jwenfai and @spencerkclark !

{
    "total_count": 2,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 1,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
460009606 https://github.com/pydata/xarray/pull/2593#issuecomment-460009606 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ2MDAwOTYwNg== jwenfai 8708062 2019-02-02T23:48:09Z 2019-02-02T23:48:09Z CONTRIBUTOR

All tests passed. Thanks, @spencerkclark and @shoyer, for all the help!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
459998025 https://github.com/pydata/xarray/pull/2593#issuecomment-459998025 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1OTk5ODAyNQ== spencerkclark 6628425 2019-02-02T20:49:30Z 2019-02-02T20:49:30Z MEMBER

Sounds good @shoyer, thanks for bringing this to the finish line.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
459997853 https://github.com/pydata/xarray/pull/2593#issuecomment-459997853 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1OTk5Nzg1Mw== shoyer 1217238 2019-02-02T20:46:45Z 2019-02-02T20:46:45Z MEMBER

I'm going to merge this when tests pass

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
444627776 https://github.com/pydata/xarray/pull/2593#issuecomment-444627776 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ0NDYyNzc3Ng== pep8speaks 24736507 2018-12-05T20:08:13Z 2019-02-02T20:46:26Z NONE

Hello @jwenfai! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. :beers:

Comment last updated on February 02, 2019 at 20:46 Hours UTC
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
459859650 https://github.com/pydata/xarray/pull/2593#issuecomment-459859650 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1OTg1OTY1MA== shoyer 1217238 2019-02-01T20:36:53Z 2019-02-01T20:36:53Z MEMBER

It looks like tests are passing now. I'm going to give this another look over and then (probably) merge

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
458608075 https://github.com/pydata/xarray/pull/2593#issuecomment-458608075 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1ODYwODA3NQ== fmaussion 10050469 2019-01-29T16:30:03Z 2019-01-29T16:30:03Z MEMBER

I can't diagnose what's wrong from the error message (something to do with conda it seems)

Some connection error. I restarted Travis, let's see if this happens again.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
457963032 https://github.com/pydata/xarray/pull/2593#issuecomment-457963032 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1Nzk2MzAzMg== jwenfai 8708062 2019-01-27T23:08:06Z 2019-01-27T23:08:06Z CONTRIBUTOR

I just merged a fix for the test failures with pandas 0.24. If you merge in master that should fix your issues here too.

Fix works, all tests are passing now. Thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
457954188 https://github.com/pydata/xarray/pull/2593#issuecomment-457954188 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1Nzk1NDE4OA== shoyer 1217238 2019-01-27T21:05:40Z 2019-01-27T21:05:40Z MEMBER

I just merged a fix for the test failures with pandas 0.24. If you merge in master that should fix your issues here too.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
456581862 https://github.com/pydata/xarray/pull/2593#issuecomment-456581862 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1NjU4MTg2Mg== jwenfai 8708062 2019-01-22T22:07:24Z 2019-01-22T22:07:24Z CONTRIBUTOR

The pandas-dev build job is still failing but everything else is passing.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
456513616 https://github.com/pydata/xarray/pull/2593#issuecomment-456513616 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1NjUxMzYxNg== spencerkclark 6628425 2019-01-22T18:39:37Z 2019-01-22T18:39:37Z MEMBER

One more optional thing -- support for the loffset keyword was recently added for DatetimeIndex resampling (https://github.com/pydata/xarray/pull/2608). This just involves a simple adjustment to the bin labels (the index of the first_items Series). You could implement that here, or for now just raise a NotImplementedError for resampling with a CFTimeIndex if loffset is not None.
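A minimal sketch of what that label adjustment could look like. The helper name `apply_loffset` and the plain-`datetime` labels are illustrative only, not xarray's actual implementation:

```python
from datetime import datetime, timedelta


def apply_loffset(labels, loffset):
    """Shift resample bin labels by loffset (hypothetical helper that
    mirrors the simple label adjustment described above)."""
    if loffset is None:
        return labels
    return [label + loffset for label in labels]


# Month-start bin labels shifted 12 hours into each day.
labels = [datetime(2000, 1, 1), datetime(2000, 2, 1)]
shifted = apply_loffset(labels, timedelta(hours=12))
```

The values of the `first_items` Series would be untouched; only its index (the labels) moves.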

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
456176524 https://github.com/pydata/xarray/pull/2593#issuecomment-456176524 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1NjE3NjUyNA== jwenfai 8708062 2019-01-21T19:22:38Z 2019-01-21T19:22:38Z CONTRIBUTOR

I made the changes. Did not preview the .rst files but they should render correctly.

For the non-standard calendars, I can't think of any way to test them except to compare them against output from pandas. I think it's fine as long as the original and resampled indices do not go over 28 days.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
455907774 https://github.com/pydata/xarray/pull/2593#issuecomment-455907774 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1NTkwNzc3NA== jwenfai 8708062 2019-01-20T22:17:51Z 2019-01-20T22:17:51Z CONTRIBUTOR

Thanks for showing me how the code could be improved @spencerkclark. I've pushed the changes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
455816654 https://github.com/pydata/xarray/pull/2593#issuecomment-455816654 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1NTgxNjY1NA== jwenfai 8708062 2019-01-19T21:28:10Z 2019-01-19T21:28:10Z CONTRIBUTOR

I've looked at the failed tests on CONDA_ENV=py36-pandas-dev and made necessary fixes on my end so that all tests pass. I am not sure if that conda env is using cftime 1.0.3.4 but it needs to be if the tests are to pass. Also, needless print statements from my last commit have been removed.

  1. Failures seem to be due to cftime 1.0.0; tests pass on cftime 1.0.3.4:

     xarray/tests/test_cftime_offsets.py::test_dayofweek_after_cftime_range[A] FAILED [ 33%]
     xarray/tests/test_cftime_offsets.py::test_dayofweek_after_cftime_range[M] FAILED [ 33%]
     xarray/tests/test_cftime_offsets.py::test_dayofweek_after_cftime_range[D] FAILED [ 33%]
     xarray/tests/test_cftime_offsets.py::test_dayofyear_after_cftime_range[A] FAILED [ 33%]
     xarray/tests/test_cftime_offsets.py::test_dayofyear_after_cftime_range[M] FAILED [ 33%]
     xarray/tests/test_cftime_offsets.py::test_dayofyear_after_cftime_range[D] FAILED [ 33%]

  2. Failure due to 3 reasons: (1) I did not set defaults for closed and label; (2) year values used in the test are too low, making it possible for year 0 to appear when resampling, which is invalid for the julian, gregorian, and proleptic_gregorian calendars; and (3) NotImplementedError being raised due to the assumption that cftime resampling hasn't been implemented:

     xarray/tests/test_cftimeindex.py::test_resample_error[365_day] FAILED [ 37%]
     xarray/tests/test_cftimeindex.py::test_resample_error[360_day] FAILED [ 37%]
     xarray/tests/test_cftimeindex.py::test_resample_error[julian] FAILED [ 37%]
     xarray/tests/test_cftimeindex.py::test_resample_error[all_leap] FAILED [ 37%]
     xarray/tests/test_cftimeindex.py::test_resample_error[366_day] FAILED [ 37%]
     xarray/tests/test_cftimeindex.py::test_resample_error[gregorian] FAILED [ 37%]
     xarray/tests/test_cftimeindex.py::test_resample_error[proleptic_gregorian] FAILED [ 37%]

  3. Failure seems to be due to cftime 1.0.0; the test passes on cftime 1.0.3.4:

     xarray/tests/test_coding_times.py::test_cf_timedelta[timedeltas7-days-nan] FAILED [ 53%]

  4. Since cftime resampling is implemented, there's no need to raise NotImplementedError:

     xarray/tests/test_dataarray.py::TestDataArray::test_resample_cftimeindex FAILED [ 58%]
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
455807650 https://github.com/pydata/xarray/pull/2593#issuecomment-455807650 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1NTgwNzY1MA== jwenfai 8708062 2019-01-19T19:20:28Z 2019-01-19T19:20:28Z CONTRIBUTOR

Just saw your comment. I've actually resolved the merge conflicts on my end and haven't pushed them yet. Checks for index[0] < datetime_bins[0] and index[lenidx - 1] > datetime_bins[lenbin - 1] have also been added.

The test_resampler runtime has been reduced to 1 minute, and it seems like it has decent coverage since ValueErrors are still being raised. I wrote another test for _get_range_edges that might help cut down the number of cases that need to be tested with the slower test_resampler function.

Going to push my changes and see what CI says.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
455790039 https://github.com/pydata/xarray/pull/2593#issuecomment-455790039 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1NTc5MDAzOQ== spencerkclark 6628425 2019-01-19T15:33:43Z 2019-01-19T15:38:08Z MEMBER

5808 of 5920 tests now pass and the remaining 112 are ignored due to ValueError: "value falls before first bin".

That's great news!

I think writing targeted unit tests is the last thing on the agenda, so I'll get right on that.

Before spending too much time on that just yet, see if you can resolve the merge conflicts, and if you can think about a way to reduce the length of the existing tests. It would be helpful to see a coverage report generated by coveralls for the new logic you've added (if you resolve the merge conflicts our CI here will run and we'll be able to see that automatically). Maybe start by commenting out a bunch of the really long tests and see where things stand?

Then we can think about how to add coverage back in as needed.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
455788936 https://github.com/pydata/xarray/pull/2593#issuecomment-455788936 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1NTc4ODkzNg== jwenfai 8708062 2019-01-19T15:19:01Z 2019-01-19T15:19:01Z CONTRIBUTOR

Ignore my last comment, I made a really silly mistake. I assumed it was a - b instead of b - a for the exact cftime function. 5808 of 5920 tests now pass and the remaining 112 are ignored due to ValueError: "value falls before first bin".

I think writing targeted unit tests is the last thing on the agenda, so I'll get right on that.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
455734854 https://github.com/pydata/xarray/pull/2593#issuecomment-455734854 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1NTczNDg1NA== jwenfai 8708062 2019-01-19T01:11:59Z 2019-01-19T01:12:08Z CONTRIBUTOR

Thanks for raising the issue with the cftime devs. I've tested the function and it produces exact datetime values but there's still a problem with extra bins and nan location mismatch. I'll report back once I have a clearer idea of what's going on.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
455662721 https://github.com/pydata/xarray/pull/2593#issuecomment-455662721 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1NTY2MjcyMQ== spencerkclark 6628425 2019-01-18T19:35:14Z 2019-01-18T19:35:14Z MEMBER

@jwenfai I created an issue in cftime regarding this question: Unidata/cftime#109.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
455619034 https://github.com/pydata/xarray/pull/2593#issuecomment-455619034 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1NTYxOTAzNA== spencerkclark 6628425 2019-01-18T17:09:26Z 2019-01-18T18:23:19Z MEMBER

The first and last values are returned by _adjust_bin_anchored when isinstance(offset, CFTIME_TICKS). Since date subtraction happens within _adjust_bin_anchored, some test cases have imprecise first and last values.

Ah indeed; this makes sense now. Maybe we should bring this up in cftime to see what their recommendation might be? I could imagine writing a function like this that would correct for this imprecision when taking the difference between two dates:

```python
from datetime import timedelta


def exact_cftime_datetime_difference(a, b):
    """Exact computation of b - a"""
    seconds = b.replace(microsecond=0) - a.replace(microsecond=0)
    seconds = int(round(seconds.total_seconds()))
    microseconds = b.microsecond - a.microsecond
    return timedelta(seconds=seconds, microseconds=microseconds)
```

Here are a couple test cases:

```python
import cftime
from datetime import datetime

# Testing with cftime version 1.0.0, where I can
# reproduce precision issues.
test_cases = [
    [(2000, 1, 1, 0, 4, 0, 956321), (1892, 1, 3, 12, 0, 0, 112123)],
    [(2000, 1, 1, 0, 4, 0, 1), (1892, 1, 3, 12, 0, 0, 503432)],
    [(2000, 1, 1, 0, 4, 0, 999999), (1892, 1, 3, 12, 0, 0, 112123)],
    [(2000, 1, 1, 0, 4, 0, 11213), (1892, 1, 3, 12, 0, 0, 77777)],
]
for a_args, b_args in test_cases:
    a_cftime = cftime.DatetimeGregorian(*a_args)
    b_cftime = cftime.DatetimeGregorian(*b_args)
    a = datetime(*a_args)
    b = datetime(*b_args)

    expected = b - a
    result = exact_cftime_datetime_difference(a_cftime, b_cftime)
    assert result == expected

    inexact = b_cftime - a_cftime
    assert inexact != expected

    # Test other direction
    expected = a - b
    result = exact_cftime_datetime_difference(b_cftime, a_cftime)
    assert result == expected
```

But maybe I'm missing something important.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
455584540 https://github.com/pydata/xarray/pull/2593#issuecomment-455584540 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1NTU4NDU0MA== jwenfai 8708062 2019-01-18T15:28:36Z 2019-01-18T15:28:36Z CONTRIBUTOR

Your impressions are correct; sorry if my earlier comments confused you. I was just guessing that imprecise first and last values might be the cause of the extra bins; I hadn't actually verified that.

The first and last values are returned by _adjust_bin_anchored when isinstance(offset, CFTIME_TICKS). Since date subtraction happens within _adjust_bin_anchored, some test cases have imprecise first and last values.

I'll provide examples of extra bins later in the day.

For now, here's an example of the datetime imprecision in _adjust_bin_anchored:

```python
import xarray as xr
import numpy as np
from xarray.coding.cftime_offsets import normalize_date, to_offset
from xarray.core.utils import safe_cast_to_index

freq = '600003T'
closed = 'right'
label = 'left'
base = 12
time_range_kwargs = dict(start='2004-01-01T12:07:01', periods=37, freq='A')
cftime_index = xr.cftime_range(**time_range_kwargs)
da_cftime = xr.DataArray(np.arange(100., 100. + cftime_index.size),
                         [('time', cftime_index)])
group = da_cftime['time']
index = safe_cast_to_index(group)
offset = to_offset(freq)
first = index.min()
last = index.max()
base = base % offset.n
start_day = normalize_date(first)
base_td = type(offset)(n=base).as_timedelta()
start_day += base_td
foffset = (first - start_day) % offset.as_timedelta()
loffset = (last - start_day) % offset.as_timedelta()
print(first, '\n', start_day, '\n', base_td, '\n', foffset, '\n',
      loffset, '\n', first - start_day, '\n', last - start_day)
```

which gave me

```
2004-12-31 12:07:01
2004-12-31 00:12:00
0:12:00
11:55:01.000008
232 days, 18:22:01.000008
11:55:01.000008
13149 days, 11:55:01.000008
```

The extra 8 microseconds shouldn't be there.
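With a microsecond-exact subtraction, the residue disappears. A stdlib-only sketch using the values printed above; `exact_difference` is illustrative, modeled on the exact_cftime_datetime_difference helper discussed in this thread, and plain datetimes stand in for cftime dates:

```python
from datetime import datetime, timedelta


def exact_difference(a, b):
    """Exact b - a, splitting whole seconds from microseconds (sketch
    of the proposed exact_cftime_datetime_difference helper)."""
    seconds = b.replace(microsecond=0) - a.replace(microsecond=0)
    seconds = int(round(seconds.total_seconds()))
    return timedelta(seconds=seconds,
                     microseconds=b.microsecond - a.microsecond)


start_day = datetime(2004, 12, 31, 0, 12, 0)
first = datetime(2004, 12, 31, 12, 7, 1)
# foffset comes out as exactly 11:55:01, with no stray microseconds.
foffset = exact_difference(start_day, first) % timedelta(minutes=600003)
```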

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
455540080 https://github.com/pydata/xarray/pull/2593#issuecomment-455540080 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1NTU0MDA4MA== spencerkclark 6628425 2019-01-18T13:07:59Z 2019-01-18T13:07:59Z MEMBER

@jwenfai to provide some more detail for my confusion here -- my impression is that adding or subtracting a datetime.timedelta object from a cftime.datetime object to produce another cftime.datetime object should always be microsecond-exact. See comments in cftime here. This is how a CFTimeIndex is constructed through cftime_range, so at least naively I would not anticipate any precision issues in constructing the bins.

Taking the difference between two dates to produce a timedelta object takes a different code path in cftime, which is not microsecond-precise. This, as we've seen, can induce some small errors in interpolation (because in the process we need to determine the amount of time between each date in the time coordinate and a reference date).
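A stdlib analogue of the distinction between the two code paths (plain `datetime` standing in for `cftime.datetime`; the float round trip is only an illustration of where a non-exact subtraction path could lose precision):

```python
from datetime import datetime, timedelta

# Adding or subtracting a timedelta is carried out in integer days,
# seconds, and microseconds, so it round-trips exactly.
base = datetime(1892, 1, 3, 12, 0, 0, 112123)
step = timedelta(days=39445, seconds=240, microseconds=844198)
assert (base + step) - step == base  # exact round trip

# A difference routed through a float (seconds since a reference date)
# is limited by float64 resolution, unlike direct timedelta arithmetic.
seconds_as_float = step.total_seconds()
reconstructed = timedelta(seconds=seconds_as_float)
```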

Of the ones I've inspected, the resampled cftime array always has 1 more bin than pandas

Could you provide more details about this example? What were the resample parameters used (e.g. freq, closed, label, base)? Is the extra bin added to the beginning or end of the time range? If you convert to a DatetimeIndex, do all the other bins match exactly, or is there some error?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
455390725 https://github.com/pydata/xarray/pull/2593#issuecomment-455390725 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1NTM5MDcyNQ== spencerkclark 6628425 2019-01-18T01:09:59Z 2019-01-18T01:34:59Z MEMBER

So I keep pandas' logic (the first bin has 1 day and 1 microsecond added to it) and raise a ValueError when either index[0] < datetime_bins[0] or index[lenidx - 1] > datetime_bins[lenbin - 1].

Exactly 👍

I'm testing against the dev version, 11 commits behind. Could the errors for XT that I get but you don't be due to a cftime/linear algebra library issue? There may be enough error accumulated for hourly frequencies over 140 years that cftime_range generates an extra bin compared to pandas date_range (I haven't checked all manually, but I believe the majority of the non-"values falls before first bin" errors are due to extra bin(s)). 6AS_JUN only has 8 failed tests, all due to "x and y nan location mismatch".

~~I think the "values falls before first bin" errors are all from pandas, where datetime arithmetic is exact, so they could not be due to cftime, right?~~ I'll take a look at the 6AS-JUN tests.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
455393260 https://github.com/pydata/xarray/pull/2593#issuecomment-455393260 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1NTM5MzI2MA== spencerkclark 6628425 2019-01-18T01:23:20Z 2019-01-18T01:23:20Z MEMBER

Oh no, I meant that except for all the "values falls before first bin" errors, most (if not all) of the errors are due to shape mismatch between the resampled cftime and pandas arrays. Of the ones I've inspected, the resampled cftime array always has 1 more bin than pandas

Oops, my bad, I should have read more carefully! I'll think about this more.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
455392167 https://github.com/pydata/xarray/pull/2593#issuecomment-455392167 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1NTM5MjE2Nw== jwenfai 8708062 2019-01-18T01:17:38Z 2019-01-18T01:17:38Z CONTRIBUTOR

I think the "values falls before first bin" errors are all from pandas, where datetime arithmetic is exact, so they could not be due to cftime, right? I'll take a look at the 6AS-JUN tests.

Oh no, I meant that except for all the "values falls before first bin" errors, most (if not all) of the errors are due to shape mismatch between the resampled cftime and pandas arrays. Of the ones I've inspected, the resampled cftime array always has 1 more bin than pandas, e.g.:

```
E   (shapes (175204,), (175205,) mismatch)
E   x: array([100., nan, nan, ..., nan, nan, 114.])
E   y: array([100., nan, nan, ..., nan, nan, 114.])
```

I was thinking that imprecise datetime arithmetic by cftime might cause an extra bin to be generated.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
455391923 https://github.com/pydata/xarray/pull/2593#issuecomment-455391923 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1NTM5MTkyMw== spencerkclark 6628425 2019-01-18T01:16:21Z 2019-01-18T01:17:10Z MEMBER

We probably don't. I forget the reason, but early on in development, resampling tests failed for some time ranges when using purely odd frequencies while others failed with purely even ones. Resampling tests for 12H/24H frequencies might not be needed now that `_adjust_bin_edges` is being used.

Yeah this has for sure been helpful for development -- I did not expect to encounter the "values falls before first bin" error, but clearly it seems we do need to worry about it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
455386887 https://github.com/pydata/xarray/pull/2593#issuecomment-455386887 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1NTM4Njg4Nw== jwenfai 8708062 2019-01-18T00:50:29Z 2019-01-18T00:50:29Z CONTRIBUTOR

Interesting, I'm not sure what's leading to the difference in behavior between platforms (I'm on a Mac). If you can distill the precision discrepancy to a minimal example, it might be worth reporting upstream to cftime. Though I agree, from xarray's perspective this is probably not a major concern.

Could this issue be related to the linear algebra library? A minimal example would be the earlier example you gave. For both cftime 1.0.0 and 1.0.3.4, I get "mismatch 60.0%".

The logic I wrote in CFTimeGrouper.first_items was in fact based in part on lib.generate_bins_dt64. Indeed I omitted these checks related to whether the time series values were all within the bin edges or not. I think it would actually be straightforward to add these in just before the call to np.searchsorted (in my code datetime_bins corresponds to binner in pandas and index corresponds to values). I would be inclined to do this rather than try to deviate from pandas' behavior here for now.

So I keep pandas' logic (the first bin has 1 day and 1 microsecond added to it) and raise a ValueError when either index[0] < datetime_bins[0] or index[lenidx - 1] > datetime_bins[lenbin - 1].
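As a rough sketch of that guard (assuming index and datetime_bins are sorted sequences of comparable datetimes; `check_bins` is an illustrative name, not the actual pandas or xarray function):

```python
from datetime import datetime


def check_bins(index, datetime_bins):
    """Raise, as pandas' generate_bins_dt64 does, when values fall
    outside the outermost bin edges (illustrative sketch)."""
    if index[0] < datetime_bins[0]:
        raise ValueError("value falls before first bin")
    if index[-1] > datetime_bins[-1]:
        raise ValueError("value falls after last bin")


index = [datetime(2000, 1, 1), datetime(2000, 1, 5)]
good_bins = [datetime(2000, 1, 1), datetime(2000, 1, 10)]
bad_bins = [datetime(2000, 1, 2), datetime(2000, 1, 10)]
check_bins(index, good_bins)  # all values inside the edges: no error
```

The checks would sit just before the np.searchsorted call, matching where pandas performs them.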

Which version of pandas are you testing against? I'm testing against the dev version and with your branch I can't seem to reproduce any failures that are not related to the "values falls before first bin" error in the XT case. I have not tried any of the 6AS_JUN cases yet.

I'm testing against the dev version, 11 commits behind. Could the errors for XT that I get but you don't be due to a cftime/linear algebra library issue? There may be enough error accumulated for hourly frequencies over 140 years that cftime_range generates an extra bin compared to pandas date_range (I haven't checked all manually, but I believe the majority of the non-"values falls before first bin" errors are due to extra bin(s)). 6AS_JUN only has 8 failed tests, all due to "x and y nan location mismatch".

Yeah that's too long :). I agree we'll eventually need to pare down the tests. A big reason for this is the length of the resampled time series in some of the cases (e.g. hourly frequencies over a 140 year period produce very long arrays, e.g. in the XT case). There's also probably a lot of overlap between the existing test cases as far as the logic they exercise (e.g. do we need to test against frequencies '2H', '5H', '7H', '12H', and '8001H' or would just '7H' suffice etc.?).

We probably don't. I forget the reason, but early on in development, resampling tests failed for some time ranges when using purely odd frequencies while others failed with purely even ones. Resampling tests for 12H/24H frequencies might not be needed now that `_adjust_bin_edges` is being used.

While it's definitely good to test directly against pandas for some examples, in place of some of those tests, we might want to consider writing some targeted unit tests for the methods in resample_cftime.py, like _get_time_bins, CFTimeGrouper.first_items, etc. With those it would probably be easier to write some tests that use minimal data but exercise all the conditional logic.

I'll look into writing targeted unit tests.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
454940009 https://github.com/pydata/xarray/pull/2593#issuecomment-454940009 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1NDk0MDAwOQ== jwenfai 8708062 2019-01-16T21:02:18Z 2019-01-16T21:02:18Z CONTRIBUTOR

Hi @spencerkclark, sorry it took so long to get back to you. I've implemented your simplified resampling logic. Some of the logic had to be altered since pandas has made updates.

It's great not having to delineate between upsampling/downsampling cases! I ran into some issues though and I thought maybe an extra pair of eyes could help me diagnose them:

  1. cftime: Not really important, but I cannot reproduce the results you obtained for cftime 1.0.3.4. I've tried Python 2.7 and 3.6, conda packages and also building from source, a Windows machine and the Windows Ubuntu shell --- the datetime arithmetic precision problem persists. To work around this issue, I'm using assert_allclose with default tolerances on the tests as suggested.

  2. pandas: The pandas library refuses to resample certain indices and throws a "values falls before first bin" error. The error comes from bins=lib.generate_bins_dt64(...) around line 1400 of pandas/core/resample.py and is a direct consequence of the _adjust_bin_edges operation adding 1 extra day minus 1 nanosecond, causing the first value of sorted bin_edges to be larger than the first sorted ax_values. My current workaround is to use pytest.mark.xfail(raises=ValueError).

     CFTimeIndex resampling does not encounter the same error. Nevertheless, I've changed the CFTimeIndex resampling logic so that the first bin value does not have 1 day minus 1 microsecond added to it, to (hopefully) rectify the error. Testing against pandas resampling results does not show any difference between the corrected and uncorrected CFTimeIndex resampling code.

  3. xarray: Ignoring the aforementioned issue with pandas, xarray resampling results for certain time ranges do not match pandas', specifically these two: dict(start='1892-01-01T12:00:00', periods=15, freq='5256113T'), labeled XT, and dict(start='1892', periods=10, freq='6AS-JUN'), labeled 6AS_JUN. XT seems to be causing the most problems, which might be due to its rather challenging freq specification.

Since I've rewritten test_cftimeindex_resample.py based on your gists, a lot more test cases are being generated. Without XT and 6AS_JUN, the tests take about 40 minutes to run on my machine; including them bumps that time up to 3 hours. The number of tests should be pared down prior to merging but I think they're helpful right now for identifying problems. I've included test results in XML for you and other collaborators to compare against. One file contains the results with the 1 day minus 1 microsecond fix applied and the other is without the fix. They can be imported into PyCharm, but I'm not sure if they can be read any other way. Test Results - pytest_in_test_cftimeindex_resample_py.zip

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
450740815 https://github.com/pydata/xarray/pull/2593#issuecomment-450740815 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1MDc0MDgxNQ== spencerkclark 6628425 2019-01-01T16:18:40Z 2019-01-01T16:18:40Z MEMBER

I'm on a break right now and I'll look more closely at the alternative solution when I'm back, but from what I've read in your comment the solution makes sense.

No worries! I just wanted to get these thoughts down when I had some time. No rush to make any updates.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
450544589 https://github.com/pydata/xarray/pull/2593#issuecomment-450544589 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1MDU0NDU4OQ== jwenfai 8708062 2018-12-30T07:45:24Z 2018-12-30T07:45:24Z CONTRIBUTOR

I'm on a break right now and I'll look more closely at the alternative solution when I'm back, but from what I've read in your comment the solution makes sense. Also, nice job catching why _adjust_bin_edges was needed and writing a comprehensive explanation in the code.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
450533035 https://github.com/pydata/xarray/pull/2593#issuecomment-450533035 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ1MDUzMzAzNQ== spencerkclark 6628425 2018-12-30T01:31:12Z 2018-12-30T01:31:12Z MEMBER

So the crux of the problem now seems to be in generating first_items; this is the Series that is used for both upsampling and downsampling a DataArray in xarray. For data indexed by a DatetimeIndex, it is straightforward to generate this Series (it just takes the construction of a simple Series with np.arange and the reference index, the construction of a pandas.Grouper object, and a call to groupby with the method first). In xarray, downsampling uses both the values (to define the groups) and index (to define the labels) of this Series, while upsampling only uses the index.
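The construction described above can be sketched as follows for a DatetimeIndex (variable names here are illustrative, not the exact xarray internals):

```python
import numpy as np
import pandas as pd

# Reference index and an integer "position" series over it.
index = pd.date_range('2000-01-01', periods=10, freq='D')
series = pd.Series(np.arange(len(index)), index)

# Group by the resample frequency and take the first position in each
# group: the values define the groups, the index defines the labels.
grouper = pd.Grouper(freq='3D')
first_items = series.groupby(grouper).first()
print(first_items)
```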

For data indexed by a CFTimeIndex, we do not have the luxury of a formal Grouper object; however, if we can create this first_items Series accurately, I think all other results of resample in xarray should follow.

I've put together a gist which compares the first_items Series generated with pandas with that generated by the cftime logic (the output of running the tests is also included). I've tried to use a fairly challenging set of initial time indexes as well as resample frequencies (different than what are currently used in the tests); there appear to be many mismatches under the "upsampling" case, but also a few errors show up in the "downsampling" case (to some extent I think these are related to the omission of the _adjust_bin_edges method, which it turns out I do think we may need). In theory though, because of how this first_items Series is created in the DatetimeIndex case, I don't think the way we create it in the CFTimeIndex case should depend on whether the length of the reference index is greater than or less than the length of the resample labels (upsampling or downsampling is determined instead by the resampling method used).

This inspired the alternative solution proposed in the second part of the gist (I've also added back in a call to a cftime version of the _adjust_bin_edges method); in this case there is no dependence on the relative lengths of the reference index and resample labels, and all of the test cases I've tried so far pass.

Let me know if this alternative solution makes sense. Digging in to the guts of the resample code in pandas/xarray is still fairly new for me too, so I could be missing something. In the gist I'm using this branch of xarray, the development version of pandas, and the latest version of cftime. Thanks again for your hard work on this!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
449635856 https://github.com/pydata/xarray/pull/2593#issuecomment-449635856 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ0OTYzNTg1Ng== spencerkclark 6628425 2018-12-23T13:16:41Z 2018-12-23T13:16:41Z MEMBER

If pandas master with the altered resampling logic will be the definitive version going forward, should development of CFTimeIndex resampling be suspended until this version of pandas master is released and xarray uses it as a dependency?

I think it is OK to continue working here, with the aim of implementing resample for cftime that is closest to the most up-to-date pandas version. We can add code to skip tests that depend on this behavior if the version of pandas is not recent enough (see an example of doing this here). As we're working before version 0.24 of pandas is released we can keep an eye on our "py36-pandas-dev" CI build listed under the "Allowed Failures" section in Travis.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
449555459 https://github.com/pydata/xarray/pull/2593#issuecomment-449555459 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ0OTU1NTQ1OQ== jwenfai 8708062 2018-12-22T08:39:59Z 2018-12-22T08:39:59Z CONTRIBUTOR

Glad to see that I'm not the only one getting different results. And I agree (biased as I am) that the additional bin at the end of the resampled time series is superfluous.

If pandas master with the altered resampling logic will be the definitive version going forward, should development of CFTimeIndex resampling be suspended until this version of pandas master is released and xarray uses it as a dependency?

On a somewhat related note, looking over the latest pandas master resample.py made me realize that https://github.com/pydata/xarray/blob/4317c697900c80604dee793ffc1186e5c57a03fd/xarray/core/resample_cftime.py#L114-L118 is now wrong due to changes made 16 days ago (https://github.com/pandas-dev/pandas/issues/24127). Since non-integer offset frequencies are not supported by BaseCFTimeOffset, `is_day and offset.n == 1` should just be `is_day`, which can be further simplified to:

```python
if isinstance(offset, CFTIME_TICKS):
    return _adjust_dates_anchored(first, last, offset,
                                  closed=closed, base=base)
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
449534277 https://github.com/pydata/xarray/pull/2593#issuecomment-449534277 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ0OTUzNDI3Nw== spencerkclark 6628425 2018-12-22T01:19:42Z 2018-12-22T01:19:42Z MEMBER

So I think the way you have things written now in _get_time_bins makes sense, though this subtle change in behavior on the pandas side makes testing against different releases of pandas a little trickier.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
449533135 https://github.com/pydata/xarray/pull/2593#issuecomment-449533135 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ0OTUzMzEzNQ== spencerkclark 6628425 2018-12-22T01:04:32Z 2018-12-22T01:04:32Z MEMBER

Yes, I noticed this too. I think it's related to changes made here: https://github.com/pandas-dev/pandas/pull/24347. At least in the test cases that I've run, I've only seen it make a difference in the NaN placement at the end of the time series.

For example with pandas 0.23.4:

```
In [1]: import xarray as xr; import pandas as pd

In [2]: nptimes = pd.date_range('2000', periods=2000)

In [3]: nptime_da = xr.DataArray(range(2000), [('time', nptimes)])

In [4]: nptime_da.resample(time='4AS').mean('time')
Out[4]:
<xarray.DataArray (time: 3)>
array([ 730., 1730.,   nan])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2004-01-01 2008-01-01
```

and with pandas master:

```
In [4]: nptime_da.resample(time='4AS').mean('time')
Out[4]:
<xarray.DataArray (time: 2)>
array([ 730., 1730.])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2004-01-01
```

I feel like the result under pandas master makes more sense, given the input array, whose final date is 2005-06-22. Adding an additional bin labeled 2008-01-01 to the resampled time series seems superfluous given that with the default `label='left'` it's clear no data from the original time series would fall in that bin.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
449529506 https://github.com/pydata/xarray/pull/2593#issuecomment-449529506 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ0OTUyOTUwNg== jwenfai 8708062 2018-12-22T00:23:34Z 2018-12-22T00:23:34Z CONTRIBUTOR

Can anyone confirm that the latest unreleased build of pandas gives resample results that are different from pandas 0.23.4?

I get results that match cftime resampling for downsampling and upsampling without having to rely on .dropna().

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
448448286 https://github.com/pydata/xarray/pull/2593#issuecomment-448448286 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ0ODQ0ODI4Ng== jwenfai 8708062 2018-12-19T02:36:38Z 2018-12-19T02:36:38Z CONTRIBUTOR

@spencerkclark Sorry about that, must be a couple of commits behind. Should be resolved now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
448443826 https://github.com/pydata/xarray/pull/2593#issuecomment-448443826 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ0ODQ0MzgyNg== spencerkclark 6628425 2018-12-19T02:11:03Z 2018-12-19T02:11:03Z MEMBER

@jwenfai thanks for the updates. It looks like there are some merge conflicts that are preventing our CI from running. Could you please resolve those when you get chance, so we can see those results?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
447718468 https://github.com/pydata/xarray/pull/2593#issuecomment-447718468 https://api.github.com/repos/pydata/xarray/issues/2593 MDEyOklzc3VlQ29tbWVudDQ0NzcxODQ2OA== jwenfai 8708062 2018-12-17T04:20:51Z 2018-12-17T04:20:51Z CONTRIBUTOR

@spencerkclark Thanks for the detailed review! I'll fix up my code over the next few days.

I haven't completely solved the upsampling issue yet but I think I might have some clues as to what's happening. Timedelta operations on cftime.datetime do not always return correct values. Sometimes they are a few microseconds or one second off.

The issue can be sidestepped by shifting the bins 1 second forward for closed=='right' and 1 second back for closed=='left' in groupby.py, but this obviously introduces issues for resampling operations at second and microsecond resolution. This workaround doesn't pass all the tests: an extra time bin is still sometimes created. You'll see what I mean when I make a new commit sometime next week.
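The bin-shifting workaround described above could look roughly like this (a hedged sketch with plain datetimes; the real change would live in xarray's groupby/resample internals and operate on cftime objects):

```python
from datetime import datetime, timedelta

def shift_bins(bins, closed):
    """Shift bin edges by one second to sidestep cftime arithmetic drift.

    Illustrative only: shifts forward for closed='right' and backward
    for closed='left', which is why second- and microsecond-resolution
    resampling would break under this workaround.
    """
    one_second = timedelta(seconds=1)
    if closed == 'right':
        return [edge + one_second for edge in bins]
    return [edge - one_second for edge in bins]

bins = [datetime(2000, 1, 1), datetime(2000, 1, 2)]
print(shift_bins(bins, closed='right'))
```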

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFTimeIndex Resampling 387924616
