issue_comments

4 rows where issue = 268725471 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
340008407 https://github.com/pydata/xarray/issues/1662#issuecomment-340008407 https://api.github.com/repos/pydata/xarray/issues/1662 MDEyOklzc3VlQ29tbWVudDM0MDAwODQwNw== gmaze 1956032 2017-10-27T15:44:11Z 2017-10-27T15:44:11Z CONTRIBUTOR

Note that if xarray's decode_cf is given a NaT in a datetime64, it works:

```python
import numpy as np
import pandas as pd
import xarray as xr

attrs = {'units': 'days since 1950-01-01 00:00:00 UTC'}  # Classic Argo data Julian Day reference
jd = [24658.46875, 24658.46366898, 24658.47256944, np.NaN]  # Sample

def dirtyfixNaNjd(ref, day):
    td = pd.NaT
    if not np.isnan(day):
        td = pd.Timedelta(days=day)
    return pd.Timestamp(ref) + td

jd = [dirtyfixNaNjd('1950-01-01', day) for day in jd]
print(jd)
```

```
[Timestamp('2017-07-06 11:15:00'), Timestamp('2017-07-06 11:07:40.999872'), Timestamp('2017-07-06 11:20:29.999616'), NaT]
```

then:

```python
ds = xr.Dataset({'time': ('time', jd, {'units': 'ns'})})  # Update the units attribute appropriately
ds = xr.decode_cf(ds)
print(ds['time'].values)
```

```
['2017-07-06T11:15:00.000000000' '2017-07-06T11:07:40.999872000'
 '2017-07-06T11:20:29.999616000' 'NaT']
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Decoding time according to CF conventions raises error if a NaN is found 268725471
339991529 https://github.com/pydata/xarray/issues/1662#issuecomment-339991529 https://api.github.com/repos/pydata/xarray/issues/1662 MDEyOklzc3VlQ29tbWVudDMzOTk5MTUyOQ== gmaze 1956032 2017-10-27T14:42:56Z 2017-10-27T14:42:56Z CONTRIBUTOR

Hi Ryan, I've never been far away; I've been following and promoting xarray around here. And congrats on Pangeo!

OK, I get that the datatype is wrong, but regarding the issue with pandas' TimedeltaIndex: does this mean that a quick/dirty fix would be to decode value by value rather than on a vector?
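For concreteness, here is a minimal sketch of what "decode value by value" could look like using plain pandas scalars, assuming the 'days since 1950-01-01' units from the Argo example above (the decode_one helper is illustrative, not xarray code):

```python
import numpy as np
import pandas as pd

def decode_one(value, reference='1950-01-01'):
    """Decode a single 'days since reference' value, mapping NaN to NaT."""
    if np.isnan(value):
        return pd.NaT
    return pd.Timestamp(reference) + pd.Timedelta(days=value)

times = [decode_one(v) for v in [24658.46875, np.nan]]
# -> [Timestamp('2017-07-06 11:15:00'), NaT]
```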

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Decoding time according to CF conventions raises error if a NaN is found 268725471
339730597 https://github.com/pydata/xarray/issues/1662#issuecomment-339730597 https://api.github.com/repos/pydata/xarray/issues/1662 MDEyOklzc3VlQ29tbWVudDMzOTczMDU5Nw== shoyer 1217238 2017-10-26T16:56:03Z 2017-10-26T16:56:03Z MEMBER

I'm pretty sure this used to work in some form. I definitely worked with a dataset in the infancy of xarray that had coordinates with missing times.

The current issue appears to be that pandas represents the NaT values as an integer, and then (predictably) suffers from numeric overflow:

```
In [8]: import pandas as pd

In [9]: pd.to_timedelta(['24658 days 11:15:00', 'NaT']) + pd.Timestamp('1950-01-01')
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-9-cc287bf4c401> in <module>()
----> 1 pd.to_timedelta(['24658 days 11:15:00', 'NaT']) + pd.Timestamp('1950-01-01')

~/conda/envs/xarray-py36/lib/python3.6/site-packages/pandas/core/indexes/datetimelike.py in __add__(self, other)
    658             return self.shift(other)
    659         elif isinstance(other, (Timestamp, datetime)):
--> 660             return self._add_datelike(other)
    661         else:  # pragma: no cover
    662             return NotImplemented

~/conda/envs/xarray-py36/lib/python3.6/site-packages/pandas/core/indexes/timedeltas.py in _add_datelike(self, other)
    354             other = Timestamp(other)
    355             i8 = self.asi8
--> 356             result = checked_add_with_arr(i8, other.value)
    357             result = self._maybe_mask_results(result, fill_value=iNaT)
    358             return DatetimeIndex(result, name=self.name, copy=False)

~/conda/envs/xarray-py36/lib/python3.6/site-packages/pandas/core/algorithms.py in checked_add_with_arr(arr, b, arr_mask, b_mask)
    889
    890     if to_raise:
--> 891         raise OverflowError("Overflow in int64 addition")
    892     return arr + b
    893

OverflowError: Overflow in int64 addition
```

This appears to be specific to our use of a TimedeltaIndex. The overflow doesn't occur if you add the values as scalars:

```
In [11]: pd.NaT + pd.Timestamp('1950-01-01')
Out[11]: NaT

In [12]: pd.Timedelta('24658 days 11:15:00') + pd.Timestamp('1950-01-01')
Out[12]: Timestamp('2017-07-06 11:15:00')
```
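Building on that observation, here is a hedged sketch of one way around the overflow: mask out the NaT entries, add the reference timestamp only to the valid slots, and leave the rest as NaT. This is purely illustrative and not the fix that was actually applied in xarray:

```python
import numpy as np
import pandas as pd

deltas = pd.to_timedelta(['24658 days 11:15:00', 'NaT'])
ref = pd.Timestamp('1950-01-01')

# Start from an all-NaT result and fill only the positions holding real
# timedeltas, so the int64 sentinel used for NaT never enters the addition.
result = np.full(len(deltas), np.datetime64('NaT'), dtype='datetime64[ns]')
valid = ~deltas.isna()
result[valid] = (ref + deltas[valid]).values
# result -> ['2017-07-06T11:15:00.000000000', 'NaT']
```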

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Decoding time according to CF conventions raises error if a NaN is found 268725471
339646702 https://github.com/pydata/xarray/issues/1662#issuecomment-339646702 https://api.github.com/repos/pydata/xarray/issues/1662 MDEyOklzc3VlQ29tbWVudDMzOTY0NjcwMg== rabernat 1197350 2017-10-26T12:16:03Z 2017-10-26T12:16:03Z MEMBER

Hi Guillaume! Nice to see so many old friends showing up on the xarray repo...

The issue you raise is totally reasonable from a user perspective: missing values in datetime data should be permitted. But there are some upstream issues that make it challenging to solve (like most of our headaches related to datetime data).

In numpy (and computer arithmetic in general), NaN only exists in floating point datatypes. It is impossible to have a numpy datetime array with NaN in it:

```python
a = np.array(['2010-01-01', '2010-01-02'], dtype='datetime64[ns]')
a[0] = np.nan
ValueError: Could not convert object to NumPy datetime
```

The same error would be raised if `a` were an integer array; to get around that, xarray automatically casts integers with missing data to floats. But that approach obviously doesn't work with `datetime` dtypes.
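As a small illustration of that point, datetime64 arrays do have their own missing-value marker, NaT, which can be assigned where a float NaN cannot (this is just a demonstration of the numpy behaviour described above, not xarray code):

```python
import numpy as np

a = np.array(['2010-01-01', '2010-01-02'], dtype='datetime64[ns]')
a[0] = np.datetime64('NaT')  # NaT is the missing-value sentinel for datetime64
print(a)                     # ['NaT' '2010-01-02T00:00:00.000000000']
```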

Further downstream, xarray relies on netcdf4-python's num2date function to decode the date. The error is raised by that package.
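A rough sketch of that decoding path, assuming netcdf4-python is installed (the exact call site inside xarray is not shown here):

```python
from netCDF4 import num2date

# Decoding the valid Argo value from above works fine; a NaN in the input is
# what eventually triggers the error reported in this issue.
print(num2date(24658.46875, units='days since 1950-01-01 00:00:00'))
# -> 2017-07-06 11:15:00 (a datetime-like object)
```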

This is my understanding of the problem. Some other folks here like @jhamman and @spencerkclark might have ideas about how to solve it. They are working on a new package called netcdftime which will isolate and hopefully enhance such time encoding / decoding functions.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Decoding time according to CF conventions raises error if a NaN is found 268725471


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);