html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/6191#issuecomment-1209664972,https://api.github.com/repos/pydata/xarray/issues/6191,1209664972,IC_kwDOAMm_X85IGgXM,868027,2022-08-09T17:30:07Z,2022-08-09T17:30:07Z,CONTRIBUTOR,"Some additional info for figuring out the best way to address this.

For the decode-using-pandas approach, two things I tried worked: using a `pandas.array` with a nullable integer data type, or simulating what happens on x86_64 systems by checking for NaNs in the incoming array and setting those positions to `numpy.iinfo(np.int64).min`.

The pandas nullable integer array:
```python
# Note the capital I: Int64 selects the nullable type.
flat_num_dates_ns_int = pd.array(flat_num_dates * _NS_PER_TIME_DELTA[delta], dtype=""Int64"")
```

Simulate x86:
```python
flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(
    np.int64
)
flat_num_dates_ns_int[np.isnan(flat_num_dates)] = np.iinfo(np.int64).min
```

The pandas solution is explicitly experimental in their docs, and the emulate version just feels ""hacky"" to me. These don't break any existing tests on my local machine.

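
For illustration, here is a minimal NumPy-only sketch of the second idea: mask the NaNs before the cast so the float-to-int conversion never sees them, then write the NaT sentinel into those positions. The helper name `float_days_to_datetime64`, its `epoch` parameter, and the hard-coded nanoseconds-per-day constant are assumptions made for this example, not xarray's actual code.

```python
import numpy as np

# Hypothetical constant for this sketch: nanoseconds in one day.
NS_PER_DAY = 86_400_000_000_000


def float_days_to_datetime64(num_days, epoch='1950-01-01'):
    # Hypothetical helper: decode float day offsets, mapping NaN to NaT.
    num_days = np.asarray(num_days, dtype=np.float64)
    missing = np.isnan(num_days)

    # Swap NaN for 0.0 first, so the float -> int64 cast below only sees
    # finite values and behaves the same on every architecture.
    safe = np.where(missing, 0.0, num_days)
    ns = (safe * NS_PER_DAY).astype(np.int64)

    # The minimum int64 value is NumPy's internal representation of NaT.
    ns = np.where(missing, np.iinfo(np.int64).min, ns)

    # Adding a NaT timedelta to the epoch yields NaT.
    return np.datetime64(epoch, 'ns') + ns.astype('timedelta64[ns]')


print(float_days_to_datetime64([20000, float('nan')]))
```

With `[20000, nan]` as input this should give `2004-10-04` followed by `NaT` regardless of CPU architecture, since the cast only ever sees finite values.
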
cftime itself has no support for nan type missing values and will fail (on x86_64):
```python
>>> import numpy as np
>>> from xarray.coding.times import decode_cf_datetime
>>> decode_cf_datetime(np.array([0, np.nan]), ""days since 1950-01-01"", use_cftime=True)
Traceback (most recent call last):
  File """", line 1, in
  File ""/home/abarna/.pyenv/versions/3.8.5/lib/python3.8/site-packages/xarray/coding/times.py"", line 248, in decode_cf_datetime
    dates = _decode_datetime_with_cftime(flat_num_dates, units, calendar)
  File ""/home/abarna/.pyenv/versions/3.8.5/lib/python3.8/site-packages/xarray/coding/times.py"", line 164, in _decode_datetime_with_cftime
    cftime.num2date(num_dates, units, calendar, only_use_cftime_datetimes=True)
  File ""src/cftime/_cftime.pyx"", line 484, in cftime._cftime.num2date
TypeError: unsupported operand type(s) for +: 'cftime._cftime.DatetimeGregorian' and 'NoneType'
```

cftime is happy with masked arrays:
```python
>>> import cftime
>>> a1 = np.ma.masked_invalid(np.array([0, np.nan]))
>>> cftime.num2date(a1, ""days since 1950-01-01"")
masked_array(data=[cftime.DatetimeGregorian(1950, 1, 1, 0, 0, 0, 0), --],
             mask=[False, True],
             fill_value='?',
             dtype=object)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1114351614
https://github.com/pydata/xarray/issues/6191#issuecomment-1209567966,https://api.github.com/repos/pydata/xarray/issues/6191,1209567966,IC_kwDOAMm_X85IGIre,868027,2022-08-09T15:52:31Z,2022-08-09T15:52:31Z,CONTRIBUTOR,"I got caught by this one yesterday on an M1 machine. I did some digging and found what I think to be the underlying issue. The short explanation is that the time conversion functions do an `astype(np.int64)` or equivalent cast on arrays that contain nans.
This is [undefined behavior](https://github.com/numpy/numpy/issues/13101#issuecomment-740058842) and very soon, doing this will [start to emit RuntimeWarnings](https://github.com/numpy/numpy/pull/21437).

I knew from my own data files that it wasn't the first element of the array being substituted but whatever was in the units as the epoch. I started to poke at the xarray internals (and the CFtime internals) to try to get a minimal example working, and eventually found the following:

On an M1:
```python
>>> from xarray.coding.times import _decode_datetime_with_pandas
>>> import numpy as np
>>> _decode_datetime_with_pandas(np.array([20000, float('nan')]), ""days since 1950-01-01"", ""proleptic_gregorian"")
array(['2004-10-04T00:00:00.000000000', '1950-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
>>> np.array(np.nan).astype(np.int64)
array(0)
```

On an x86_64:
```python
>>> from xarray.coding.times import _decode_datetime_with_pandas
>>> import numpy as np
>>> _decode_datetime_with_pandas(np.array([20000, float('nan')]), ""days since 1950-01-01"", ""proleptic_gregorian"")
array(['2004-10-04T00:00:00.000000000', 'NaT'], dtype='datetime64[ns]')
>>> np.array(np.nan).astype(np.int64)
array(-9223372036854775808)
```

This issue is not Apple/M1/clang specific: I tested on an AWS Graviton (arm) instance and got the same results with ubuntu/gcc:
```python
Python 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0] on linux
Type ""help"", ""copyright"", ""credits"" or ""license"" for more information.
>>> from xarray.coding.times import _decode_datetime_with_pandas
>>> import numpy as np
>>> _decode_datetime_with_pandas(np.array([20000, float('nan')]), ""days since 1950-01-01"", ""proleptic_gregorian"")
array(['2004-10-04T00:00:00.000000000', '1950-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
>>> np.array(np.nan).astype(np.int64)
array(0)
```

Here is where the cast is happening in the internal xarray implementation; CFtime has similar casts in its implementation.
https://github.com/pydata/xarray/blob/8417f495e6b81a60833f86a978e5a8080a619aa0/xarray/coding/times.py#L237-L239","{""total_count"": 4, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 2, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1114351614
https://github.com/pydata/xarray/issues/6191#issuecomment-1134796696,https://api.github.com/repos/pydata/xarray/issues/6191,1134796696,IC_kwDOAMm_X85Do5-Y,387624,2022-05-23T15:08:27Z,2022-05-23T15:09:48Z,NONE,"It is replaced by the first value of the array. If you change to: `time = pd.date_range(start=""2022-01-02"",end=""2022-01-11"").to_pydatetime()` the `NaT` is replaced by `'2022-01-02T00:00:00.000000000'`. Maybe it is stored internally as a `time_origin` and some `time_delta`, and the `NaT` are replaced by 0?","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1114351614
https://github.com/pydata/xarray/issues/6191#issuecomment-1133765720,https://api.github.com/repos/pydata/xarray/issues/6191,1133765720,IC_kwDOAMm_X85Dk-RY,5635139,2022-05-21T20:43:08Z,2022-05-21T20:43:08Z,MEMBER,"I sorted out my M1 python installation and can reproduce:
```
In [21]: ds_r.time
Out[21]:
array(['2022-01-01T00:00:00.000000000', '2022-01-02T00:00:00.000000000',
       '2022-01-03T00:00:00.000000000', '2022-01-04T00:00:00.000000000',
       '2022-01-01T00:00:00.000000000', '2022-01-06T00:00:00.000000000',  # Note the first value on this line!
       '2022-01-07T00:00:00.000000000', '2022-01-08T00:00:00.000000000',
       '2022-01-09T00:00:00.000000000', '2022-01-10T00:00:00.000000000'],
      dtype='datetime64[ns]')
Dimensions without coordinates: nt
```

It's quite surprising we get `'2022-01-01T00:00:00.000000000'` rather than `NaT`. Why the beginning of the year?!

I suspect it's not directly an xarray issue given Xarray is only python code, and python code does not directly branch by CPU.
I've frequently had issues like this where it's difficult to understand which library is responsible; I'd welcome any more investigation here. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1114351614
https://github.com/pydata/xarray/issues/6191#issuecomment-1022224801,https://api.github.com/repos/pydata/xarray/issues/6191,1022224801,IC_kwDOAMm_X8487emh,387624,2022-01-26T14:01:43Z,2022-01-26T14:02:43Z,NONE,"I'm actually using [miniforge](https://github.com/conda-forge/miniforge) which natively supports ARM64. Uninstalling netCDF4 does not fix the issue. And actually, opening the same file as follows:
```
from netCDF4 import Dataset
f = Dataset('test.nc')
f['time'][:]
```
gives the expected results (dates are not recognized but the `nan` is there):
```
masked_array(data=[0.0, 1.0, 2.0, 3.0, --, 5.0, 6.0, 7.0, 8.0, 9.0],
             mask=[False, False, False, False, True, False, False, False, False, False],
             fill_value=nan)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1114351614
https://github.com/pydata/xarray/issues/6191#issuecomment-1021889497,https://api.github.com/repos/pydata/xarray/issues/6191,1021889497,IC_kwDOAMm_X8486MvZ,5635139,2022-01-26T05:53:43Z,2022-01-26T05:53:43Z,MEMBER,"I tried reproducing on an M1 Mac, but my install of python seems to report that it's on an x86_64 (`version='Darwin Kernel Version 21.0.1: Tue Sep 14 20:56:24 PDT 2021 ; root:xnu-8019.30.61~4/RELEASE_ARM64_T6000', machine='x86_64'`). It didn't reproduce, unsurprisingly.

Does uninstalling `netCDF4` help?
That would isolate it to that library and its dependencies.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1114351614
https://github.com/pydata/xarray/issues/6191#issuecomment-1021611725,https://api.github.com/repos/pydata/xarray/issues/6191,1021611725,IC_kwDOAMm_X8485I7N,387624,2022-01-25T21:10:28Z,2022-01-25T21:10:28Z,NONE,So far I haven't spotted any other issues with libnetcdf.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1114351614
https://github.com/pydata/xarray/issues/6191#issuecomment-1021609794,https://api.github.com/repos/pydata/xarray/issues/6191,1021609794,IC_kwDOAMm_X8485IdC,5635139,2022-01-25T21:08:02Z,2022-01-25T21:08:02Z,MEMBER,"Thanks @philippemiron. My guess is that this is an issue with an underlying library, since xarray doesn't generally do these operations in its code. Do you know if there are any similar issues in libnetcdf or netCDF4? (Others know more than me about these libraries, so please feel free to interject)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1114351614