issue_comments: 1209567966

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/6191#issuecomment-1209567966 https://api.github.com/repos/pydata/xarray/issues/6191 1209567966 IC_kwDOAMm_X85IGIre 868027 2022-08-09T15:52:31Z 2022-08-09T15:52:31Z CONTRIBUTOR

I got caught by this one yesterday on an M1 machine. I did some digging and found what I think is the underlying issue. The short explanation is that the time conversion functions do an `astype(np.int64)` or equivalent cast on arrays that contain NaNs. This is undefined behavior, and very soon doing this will start to emit RuntimeWarnings.

I knew from my own data files that the value being substituted was not the first element of the array but whatever epoch was given in the units. I started poking at the xarray internals (and the CFtime internals) to get a minimal example working, and eventually found the following:

On an M1:

```python
>>> from xarray.coding.times import _decode_datetime_with_pandas
>>> import numpy as np
>>> _decode_datetime_with_pandas(np.array([20000, float('nan')]), "days since 1950-01-01", "proleptic_gregorian")
array(['2004-10-04T00:00:00.000000000', '1950-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
>>> np.array(np.nan).astype(np.int64)
array(0)
```

On an x86_64:

```python
>>> from xarray.coding.times import _decode_datetime_with_pandas
>>> import numpy as np
>>> _decode_datetime_with_pandas(np.array([20000, float('nan')]), "days since 1950-01-01", "proleptic_gregorian")
array(['2004-10-04T00:00:00.000000000', 'NaT'], dtype='datetime64[ns]')
>>> np.array(np.nan).astype(np.int64)
array(-9223372036854775808)
```

This issue is not specific to Apple/M1/clang. I tested on an AWS Graviton (ARM) instance and got the same results with Ubuntu/GCC:

```python
Python 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from xarray.coding.times import _decode_datetime_with_pandas
>>> import numpy as np
>>> _decode_datetime_with_pandas(np.array([20000, float('nan')]), "days since 1950-01-01", "proleptic_gregorian")
array(['2004-10-04T00:00:00.000000000', '1950-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
>>> np.array(np.nan).astype(np.int64)
array(0)
```

Here is where the cast happens in the internal xarray implementation; CFtime has similar casts in its implementation. https://github.com/pydata/xarray/blob/8417f495e6b81a60833f86a978e5a8080a619aa0/xarray/coding/times.py#L237-L239
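One way to avoid relying on the platform-dependent NaN-to-integer cast is to mask the NaNs before casting and write `NaT` back afterwards. The following is only a sketch under that assumption, not the code xarray or CFtime uses; `days_to_datetime64` is a hypothetical helper name, and it only handles day offsets from a fixed epoch.

```python
import numpy as np

def days_to_datetime64(days, epoch="1950-01-01"):
    """Hypothetical helper: float day offsets -> datetime64[ns], NaN -> NaT."""
    days = np.atleast_1d(np.asarray(days, dtype=np.float64))
    nan_mask = np.isnan(days)
    # Substitute 0 for NaN so the float -> int64 cast is well defined on every platform.
    safe_days = np.where(nan_mask, 0.0, days)
    ns_per_day = 86_400 * 1_000_000_000
    offsets = (safe_days * ns_per_day).round().astype(np.int64)
    result = np.datetime64(epoch, "ns") + offsets.astype("timedelta64[ns]")
    # Restore the missing values explicitly instead of depending on the cast result.
    result[nan_mask] = np.datetime64("NaT")
    return result

decoded = days_to_datetime64([20000, float("nan")])
print(decoded)
```

With this approach the second element should come out as `NaT` on both x86_64 and ARM, because the NaN never reaches the integer cast.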

{
    "total_count": 4,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 2,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  1114351614