issue_comments: 454940009

html_url: https://github.com/pydata/xarray/pull/2593#issuecomment-454940009
issue_url: https://api.github.com/repos/pydata/xarray/issues/2593
id: 454940009
node_id: MDEyOklzc3VlQ29tbWVudDQ1NDk0MDAwOQ==
user: 8708062
created_at: 2019-01-16T21:02:18Z
updated_at: 2019-01-16T21:02:18Z
author_association: CONTRIBUTOR
Remaining columns, in order: body, reactions, performed_via_github_app (empty), issue

Hi @spencerkclark, sorry it took so long to get back to you. I've implemented your simplified resampling logic. Some of the logic had to be altered because pandas has been updated in the meantime.

It's great not having to delineate between upsampling/downsampling cases! I ran into some issues though and I thought maybe an extra pair of eyes could help me diagnose them:

  1. cftime : Not really important, but I cannot reproduce the results you obtained for cftime 1.0.3.4. I've tried Python 2.7 and 3.6, conda packages and building from source, and both a Windows machine and the Ubuntu shell on Windows; the datetime arithmetic precision problem persists in every case. To work around this issue, I'm using assert_allclose with default tolerances in the tests, as suggested.

  2. pandas : The pandas library refuses to resample certain indices and raises a "values falls before first bin" error. The error comes from bins=lib.generate_bins_dt64(...) around line 1400 of pandas/core/resample.py. It is a direct consequence of the _adjust_bin_edges operation adding 1 day minus 1 nanosecond to the bin edges, which makes the first value of the sorted bin_edges larger than the first value of the sorted ax_values. My current workaround is to mark the affected tests with pytest.mark.xfail(raises=ValueError).

CFTimeIndex resampling does not encounter the same error. Nevertheless, I've changed the CFTimeIndex resampling logic so that the first bin value does not have 1 day minus 1 microsecond added to it to (hopefully) rectify the error. Testing against pandas resampling results does not show any difference between the corrected and uncorrected CFTimeIndex resampling code.
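The bin-edge problem described in item 2 can be sketched with the standard library alone. This is a toy analogue, not the pandas code: `adjust_bin_edges` and `check_first_bin` are hypothetical helpers, the sample values and edges are made up for illustration, and the shift uses microseconds because `datetime` has no nanosecond resolution.

```python
from datetime import datetime, timedelta


def adjust_bin_edges(edges):
    """Hypothetical helper: push every edge to the last microsecond of its day."""
    shift = timedelta(days=1) - timedelta(microseconds=1)
    return [edge + shift for edge in edges]


def check_first_bin(values, edges):
    """Hypothetical helper: fail if the earliest value precedes the earliest edge."""
    if min(values) < min(edges):
        raise ValueError("Values falls before first bin")


# Made-up data: values start at noon, bin edges start at midnight.
values = [datetime(1892, 1, 1, 12) + timedelta(days=30 * i) for i in range(4)]
edges = [datetime(1892, 1, 1), datetime(1892, 3, 1), datetime(1892, 5, 1)]

check_first_bin(values, edges)  # unadjusted edges: the first value is inside the first bin

adjusted = adjust_bin_edges(edges)
try:
    check_first_bin(values, adjusted)
    raised = False
except ValueError:
    # The first edge is now 23:59:59.999999, which is later than the noon value.
    raised = True

print(adjusted[0].isoformat(), raised)
```

In this sketch, leaving the first edge unshifted (as in the CFTimeIndex change described above) would keep the noon value inside the first bin.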

  3. xarray : Ignoring the aforementioned issue with pandas, xarray resampling results for certain time ranges do not match those from pandas, specifically for these two: dict(start='1892-01-01T12:00:00', periods=15, freq='5256113T'), labeled XT, and dict(start='1892', periods=10, freq='6AS-JUN'), labeled 6AS_JUN. XT seems to cause the most problems, which might be due to its rather challenging freq specification.
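To see why the XT frequency is awkward, here is a minimal standard-library sketch that rebuilds the same time points, assuming 'T' is read as the pandas alias for minutes. The variable names are illustrative only, and the sketch does not involve pandas, xarray, or cftime.

```python
from datetime import datetime, timedelta

# XT: start='1892-01-01T12:00:00', periods=15, freq='5256113T' (5256113 minutes)
start = datetime(1892, 1, 1, 12)
step = timedelta(minutes=5256113)

xt_times = [start + i * step for i in range(15)]

# The step is 3650 days, 1 hour and 53 minutes, so successive points
# drift relative to both day and year boundaries.
print(step)
for t in xt_times[:3]:
    print(t.isoformat())
```

Because the step is roughly ten years but not a whole number of days, no two points share the same time of day, which is the kind of case where an off-by-one-bin or off-by-one-unit difference is likely to surface.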

Since I've rewritten test_cftimeindex_resample.py based on your gists, a lot more test cases are being generated. Without XT and 6AS_JUN, the tests take about 40 minutes to run on my machine; including them bumps that time up to 3 hours. The number of tests should be pared down prior to merging but I think they're helpful right now for identifying problems. I've included test results in XML for you and other collaborators to compare against. One file contains the results with the 1 day minus 1 microsecond fix applied and the other is without the fix. They can be imported into PyCharm, but I'm not sure if they can be read any other way. Test Results - pytest_in_test_cftimeindex_resample_py.zip

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  387924616