issue_comments

2 rows where author_association = "CONTRIBUTOR", issue = 1114351614 and user = 868027 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1209664972 https://github.com/pydata/xarray/issues/6191#issuecomment-1209664972 https://api.github.com/repos/pydata/xarray/issues/6191 IC_kwDOAMm_X85IGgXM DocOtak 868027 2022-08-09T17:30:07Z 2022-08-09T17:30:07Z CONTRIBUTOR

Some additional info to help figure out the best way to address this.

For the decode-with-pandas approach, two things I tried worked: using a pandas.array with a nullable integer data type, or simulating what happens on x86_64 systems by checking for NaNs in the incoming array and setting those positions to numpy.iinfo(np.int64).min.

The pandas nullable integer array:

```python
# Note the capital "I" in "Int64": that selects the nullable type.
flat_num_dates_ns_int = pd.array(flat_num_dates * _NS_PER_TIME_DELTA[delta], dtype="Int64")
```

Simulate x86:

```python
flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(
    np.int64
)
flat_num_dates_ns_int[np.isnan(flat_num_dates)] = np.iinfo(np.int64).min
```

The pandas nullable integer type is explicitly marked experimental in the pandas docs, and the emulation version just feels "hacky" to me. Neither change breaks any existing tests on my local machine.
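For reference, here is a minimal self-contained sketch of the "simulate x86" idea that never casts a NaN at all. The helper name `nan_safe_to_int64` and the constant `NAT_SENTINEL` are made up for illustration and are not part of xarray. It relies on NumPy representing NaT as the minimum int64 value.

```python
import numpy as np

# NumPy stores NaT as the smallest int64 value.
NAT_SENTINEL = np.iinfo(np.int64).min


def nan_safe_to_int64(values):
    """Cast a 1-D float array to int64, mapping NaN to the NaT sentinel.

    Hypothetical helper, not an xarray function.
    """
    values = np.asarray(values, dtype=np.float64)
    missing = np.isnan(values)
    # Replace NaN with 0.0 first so the float -> int cast is well defined
    # on every platform.
    out = np.where(missing, 0.0, values).astype(np.int64)
    out[missing] = NAT_SENTINEL
    return out


# One day in nanoseconds, followed by a missing value.
ns = nan_safe_to_int64(np.array([86_400e9, np.nan]))
# Reinterpret the int64 buffer as timedeltas; the sentinel reads back as NaT.
deltas = ns.view("timedelta64[ns]")
```

Because the NaN positions are filled before the cast, the result should not depend on what the hardware does with an out-of-range float-to-int conversion.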

cftime itself has no support for NaN-type missing values and will fail:

(on x86_64)

```python
>>> import numpy as np
>>> from xarray.coding.times import decode_cf_datetime
>>> decode_cf_datetime(np.array([0, np.nan]), "days since 1950-01-01", use_cftime=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/abarna/.pyenv/versions/3.8.5/lib/python3.8/site-packages/xarray/coding/times.py", line 248, in decode_cf_datetime
    dates = _decode_datetime_with_cftime(flat_num_dates, units, calendar)
  File "/home/abarna/.pyenv/versions/3.8.5/lib/python3.8/site-packages/xarray/coding/times.py", line 164, in _decode_datetime_with_cftime
    cftime.num2date(num_dates, units, calendar, only_use_cftime_datetimes=True)
  File "src/cftime/_cftime.pyx", line 484, in cftime._cftime.num2date
TypeError: unsupported operand type(s) for +: 'cftime._cftime.DatetimeGregorian' and 'NoneType'
```

cftime is happy with masked arrays:

```python
>>> import cftime
>>> a1 = np.ma.masked_invalid(np.array([0, np.nan]))
>>> cftime.num2date(a1, "days since 1950-01-01")
masked_array(data=[cftime.DatetimeGregorian(1950, 1, 1, 0, 0, 0, 0), --],
             mask=[False, True],
       fill_value='?',
            dtype=object)
```
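To illustrate the masking pattern without depending on cftime, here is a rough sketch that uses Python's standard `datetime` as a stand-in for the cftime conversion. The function `decode_days_with_mask` is hypothetical and only handles 1-D "days since epoch" input. The point is just that NaNs are masked before any arithmetic touches them.

```python
import datetime as dt

import numpy as np


def decode_days_with_mask(num_days, epoch):
    """Decode 1-D "days since epoch" values, keeping NaNs as masked entries.

    Hypothetical stand-in for a cftime-based decoder.
    """
    masked = np.ma.masked_invalid(np.asarray(num_days, dtype=np.float64))
    mask = np.ma.getmaskarray(masked)
    # Fill masked slots with a harmless value; they are never decoded below.
    days = masked.filled(0.0)
    decoded = np.empty(days.shape, dtype=object)
    for i, value in enumerate(days):
        decoded[i] = None if mask[i] else epoch + dt.timedelta(days=float(value))
    return np.ma.masked_array(decoded, mask=mask)


dates = decode_days_with_mask([0, np.nan, 1.5], dt.datetime(1950, 1, 1))
```

The returned object array carries the same mask as the input, so downstream code can decide how to represent the missing entries.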

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [Bug]: reading NaT/NaN on M1 ARM chip 1114351614
1209567966 https://github.com/pydata/xarray/issues/6191#issuecomment-1209567966 https://api.github.com/repos/pydata/xarray/issues/6191 IC_kwDOAMm_X85IGIre DocOtak 868027 2022-08-09T15:52:31Z 2022-08-09T15:52:31Z CONTRIBUTOR

I got caught by this one yesterday on an M1 machine. I did some digging and found what I think is the underlying issue. The short explanation is that the time conversion functions do an astype(np.int64) or equivalent cast on arrays that contain NaNs. This is undefined behavior, and very soon doing this will start to emit RuntimeWarnings.

I knew from my own data files that the value being substituted wasn't the first element of the array but whatever epoch was given in the units. I started to poke at the xarray internals (and the cftime internals) to try to get a minimal example working, and eventually found the following:

On an M1:

```python
>>> from xarray.coding.times import _decode_datetime_with_pandas
>>> import numpy as np
>>> _decode_datetime_with_pandas(np.array([20000, float('nan')]), "days since 1950-01-01", "proleptic_gregorian")
array(['2004-10-04T00:00:00.000000000', '1950-01-01T00:00:00.000000000'],
      dtype='datetime64[ns]')
>>> np.array(np.nan).astype(np.int64)
array(0)
```

On an x86_64:

```python
>>> from xarray.coding.times import _decode_datetime_with_pandas
>>> import numpy as np
>>> _decode_datetime_with_pandas(np.array([20000, float('nan')]), "days since 1950-01-01", "proleptic_gregorian")
array(['2004-10-04T00:00:00.000000000', 'NaT'],
      dtype='datetime64[ns]')
>>> np.array(np.nan).astype(np.int64)
array(-9223372036854775808)
```

This issue is not Apple/M1/clang specific. I tested on an AWS Graviton (ARM) instance and got the same results with Ubuntu/GCC:

```python
Python 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from xarray.coding.times import _decode_datetime_with_pandas
>>> import numpy as np
>>> _decode_datetime_with_pandas(np.array([20000, float('nan')]), "days since 1950-01-01", "proleptic_gregorian")
array(['2004-10-04T00:00:00.000000000', '1950-01-01T00:00:00.000000000'],
      dtype='datetime64[ns]')
>>> np.array(np.nan).astype(np.int64)
array(0)
```

Here is where the cast happens in the internal xarray implementation; cftime has similar casts in its implementation: https://github.com/pydata/xarray/blob/8417f495e6b81a60833f86a978e5a8080a619aa0/xarray/coding/times.py#L237-L239

{
    "total_count": 4,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 2,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [Bug]: reading NaT/NaN on M1 ARM chip 1114351614

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);