issue_comments


11 rows where issue = 701062999 sorted by updated_at descending




user 4

  • spencerkclark 5
  • dopplershift 2
  • dcherian 2
  • albertotb 2

author_association 3

  • MEMBER 7
  • CONTRIBUTOR 2
  • NONE 2

issue 1

  • Problem decoding times in data from OpenDAP server · 11
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1011120549 https://github.com/pydata/xarray/issues/4422#issuecomment-1011120549 https://api.github.com/repos/pydata/xarray/issues/4422 IC_kwDOAMm_X848RHml albertotb 6514690 2022-01-12T14:47:46Z 2022-01-12T14:48:49Z NONE

I just want to add here for reference that this issue was posted earlier but closed at the time as stale. I will leave this here just to link them both and to note that this is fixed: https://github.com/pydata/xarray/issues/827

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problem decoding times in data from OpenDAP server 701062999
696954231 https://github.com/pydata/xarray/issues/4422#issuecomment-696954231 https://api.github.com/repos/pydata/xarray/issues/4422 MDEyOklzc3VlQ29tbWVudDY5Njk1NDIzMQ== dopplershift 221526 2020-09-22T20:13:50Z 2020-09-22T20:13:50Z CONTRIBUTOR

I'd say in the case of use_cftime=True that it's a bug that it ever uses pandas for date parsing.

696787447 https://github.com/pydata/xarray/issues/4422#issuecomment-696787447 https://api.github.com/repos/pydata/xarray/issues/4422 MDEyOklzc3VlQ29tbWVudDY5Njc4NzQ0Nw== dcherian 2448579 2020-09-22T15:14:25Z 2020-09-22T15:14:25Z MEMBER

> shouldn't raise an error for 1-1-1 since that's valid according to the Climate and Forecasting netCDF conventions

OK good point. Thanks @dopplershift

One solution would be to extract this bit of the units string and 0-pad as necessary before passing to pandas. We would have to be careful to keep the unmodified units attribute in encoding.
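A minimal sketch of this zero-padding idea (the regex and the function name are illustrative assumptions, not xarray's actual implementation):

```python
# Illustrative sketch: 0-pad the reference date in a CF "units" string
# before handing it to pandas. Not xarray's real code path; the regex and
# function name are assumptions.
import re

def zero_pad_units(units):
    """'days since 1-1-1 00:00:0.0' -> 'days since 0001-01-01 00:00:0.0'"""
    match = re.match(r"(\w+ since )(\d+)-(\d+)-(\d+)(.*)", units)
    if match is None:
        return units  # leave unrecognized strings untouched
    prefix, year, month, day, rest = match.groups()
    return f"{prefix}{int(year):04d}-{int(month):02d}-{int(day):02d}{rest}"

print(zero_pad_units("days since 1-1-1 00:00:0.0"))
```

As the comment notes, the unmodified units string would still need to be preserved in the variable's encoding.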

696678072 https://github.com/pydata/xarray/issues/4422#issuecomment-696678072 https://api.github.com/repos/pydata/xarray/issues/4422 MDEyOklzc3VlQ29tbWVudDY5NjY3ODA3Mg== spencerkclark 6628425 2020-09-22T12:05:58Z 2020-09-22T12:05:58Z MEMBER

It's a little more delicate, but I also think we should be able to fix this in the case of use_cftime=None (the default), so that these dates can still be represented with the np.datetime64 dtype and round-trip in a sensible way.

696674560 https://github.com/pydata/xarray/issues/4422#issuecomment-696674560 https://api.github.com/repos/pydata/xarray/issues/4422 MDEyOklzc3VlQ29tbWVudDY5NjY3NDU2MA== spencerkclark 6628425 2020-09-22T11:58:31Z 2020-09-22T11:58:31Z MEMBER

@dopplershift I agree an error doesn't really make sense, particularly since these units appear in the CF conventions -- thanks for pointing that out.

I hesitate to say things are working perfectly with use_cftime=True. While the decoded time objects seem to be round-tripped exactly -- which is great -- the encoded values are not, since the date in the units encoding changes from "1-1-1 00:00:0.0" to "2001-01-01". We should be able to do better there.

696508013 https://github.com/pydata/xarray/issues/4422#issuecomment-696508013 https://api.github.com/repos/pydata/xarray/issues/4422 MDEyOklzc3VlQ29tbWVudDY5NjUwODAxMw== dopplershift 221526 2020-09-22T04:56:17Z 2020-09-22T04:56:17Z CONTRIBUTOR

Probably shouldn't raise an error for 1-1-1 since that's valid according to the Climate and Forecasting netCDF conventions (see examples 4.5 and 4.6). In fact, it works perfectly when using use_cftime=True both for the original data and reading in the data from disk.

696451333 https://github.com/pydata/xarray/issues/4422#issuecomment-696451333 https://api.github.com/repos/pydata/xarray/issues/4422 MDEyOklzc3VlQ29tbWVudDY5NjQ1MTMzMw== spencerkclark 6628425 2020-09-22T00:29:32Z 2020-09-22T00:32:04Z MEMBER

> Also, would this be solved if using use_cftime=True when reading back the file?

To be honest I'm not sure. You could give it a try -- the only thing there is that you'd need to cast the times back to np.datetime64 if that was ultimately the date type you wanted to go with (instead of a cftime date type).

> If I understood correctly the code you quote, in that case you are overwriting the units attribute. Maybe the same could be done in this case.

In the test case I think we're merely acknowledging that the remaining part of the test would fail with these units. I agree with you and @dcherian though -- it's super weird that pandas behaves the way it does here. Considering this isn't the first time these units have come up, it might be worth special-casing them in some way.

696324438 https://github.com/pydata/xarray/issues/4422#issuecomment-696324438 https://api.github.com/repos/pydata/xarray/issues/4422 MDEyOklzc3VlQ29tbWVudDY5NjMyNDQzOA== dcherian 2448579 2020-09-21T19:30:13Z 2020-09-21T19:30:13Z MEMBER

> This is an unusual format -- ordinarily we'd expect zero-padded year, month, and day values.

Can we raise an error here? Interpreting 1-1-1 as 2001-01-01 is really weird.
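For reference, the surprising parse can be reproduced in two lines (pandas treats the bare "1" as a two-digit-style year):

```python
# Reproducing the parse described above: pandas resolves year "1" to 2001
# rather than year 1.
import pandas as pd

print(pd.Timestamp("1-1-1 00:00:0.0"))
```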

695978867 https://github.com/pydata/xarray/issues/4422#issuecomment-695978867 https://api.github.com/repos/pydata/xarray/issues/4422 MDEyOklzc3VlQ29tbWVudDY5NTk3ODg2Nw== albertotb 6514690 2020-09-21T08:34:23Z 2020-09-21T08:38:07Z NONE

Thank you very much for the detailed explanation and taking the time to look into this. I had a feeling it had to do with time decoding, but did not know exactly what was going on. IMHO the confusing thing is that the file parses without problems, so maybe a warning indicating that the parsing failed with pandas could help.

Also, would this be solved if using use_cftime=True when reading back the file?

If I understood correctly the code you quote, in that case you are overwriting the units attribute. Maybe the same could be done in this case.

695787482 https://github.com/pydata/xarray/issues/4422#issuecomment-695787482 https://api.github.com/repos/pydata/xarray/issues/4422 MDEyOklzc3VlQ29tbWVudDY5NTc4NzQ4Mg== spencerkclark 6628425 2020-09-20T13:29:48Z 2020-09-20T13:29:48Z MEMBER

I don't know if there is anything we can do to make this less opaque. My gut feeling is to label this as a "metadata issue," and recommend addressing it at the file level, but it is awkward that it sort of works as is, but not quite.

This is not the first time an issue has been raised related to units like this: https://github.com/pydata/xarray/blob/13c09dc28ec8ff791c6d87e2d8e80c362c65ffd4/xarray/tests/test_coding_times.py#L112-L116

695786924 https://github.com/pydata/xarray/issues/4422#issuecomment-695786924 https://api.github.com/repos/pydata/xarray/issues/4422 MDEyOklzc3VlQ29tbWVudDY5NTc4NjkyNA== spencerkclark 6628425 2020-09-20T13:23:58Z 2020-09-20T13:23:58Z MEMBER

Description of the problem

I believe the issue here stems from the units attribute in the original dataset:

```
In [1]: import xarray as xr

In [2]: url = "https://nomads.ncep.noaa.gov/dods/gfs_0p25_1hr/gfs20200920/gfs_0p25_1hr_00z"

In [3]: ds = xr.open_dataset(url, decode_times=False)

In [4]: ds.time.attrs["units"]
Out[4]: 'days since 1-1-1 00:00:0.0'
```

This is an unusual format -- ordinarily we'd expect zero-padded year, month, and day values. Pandas misinterprets this and parses the reference date to 2001-01-01:

```
In [5]: import pandas as pd

In [6]: pd.Timestamp("1-1-1 00:00:0.0")
Out[6]: Timestamp('2001-01-01 00:00:00')
```

Of course, with time values on the order of 700000 and units of days, this results in dates outside the nanosecond-precision range of the np.datetime64 dtype and throws an error; xarray catches this error and falls back to cftime to decode the dates. cftime parses the reference date properly, so in the end the dates are decoded correctly (good!).
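A sketch of the overflow that triggers the fallback (700000 days cannot be represented at nanosecond resolution):

```python
# Why decoding falls back to cftime: ~700000 days, converted to the
# nanosecond resolution backing timedelta64[ns], overflows a signed
# 64-bit integer, so pandas raises an out-of-bounds error.
import pandas as pd

try:
    pd.to_timedelta(700000, unit="D")  # ~6.0e19 ns > 64-bit signed max
except Exception as err:
    print(type(err).__name__)
```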

There's a catch though. When saving the dates back out to a file, the odd units remain in the encoding of the time variable. When parsing the reference date, xarray again first tries using pandas. This time, there's nothing that stops xarray from proceeding, because we are no longer bound by integer overflow (taking the difference between a date in 2020 and a date in 2001 is perfectly valid for nanosecond-precision dates). So encoding succeeds, and we no longer need to try with cftime.

```
In [7]: ds = xr.decode_cf(ds)

In [8]: subset = ds["ugrd10m"].isel(time=slice(0, 8))

In [9]: subset.to_netcdf("test.nc")

In [10]: recovered = xr.open_dataset("test.nc", decode_times=False)

In [11]: recovered.time.attrs["units"]
Out[11]: 'days since 2001-01-01'
```

Thus when we read the file the first time, decoding happens with cftime; when we read it the second time, decoding happens with pandas (encoding also differed between the two files). This is the reason for the difference in values.

Workaround

To get an accurate round-trip I would recommend overwriting the units attribute to something that pandas parses correctly:

```
In [12]: ds = xr.open_dataset(url, decode_times=False)

In [13]: ds.time.attrs["units"] = "days since 0001-01-01"

In [14]: ds = xr.decode_cf(ds)

In [15]: subset = ds["ugrd10m"].isel(time=slice(0, 8))

In [16]: subset.to_netcdf("test.nc")

In [17]: recovered = xr.open_dataset("test.nc")

In [18]: recovered.time
Out[18]:
<xarray.DataArray 'time' (time: 8)>
array(['2020-09-20T00:00:00.000000000', '2020-09-20T00:59:59.999997000',
       '2020-09-20T02:00:00.000003000', '2020-09-20T03:00:00.000000000',
       '2020-09-20T03:59:59.999997000', '2020-09-20T05:00:00.000003000',
       '2020-09-20T06:00:00.000000000', '2020-09-20T06:59:59.999997000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2020-09-20 ... 2020-09-20T06:59:59.999997
Attributes:
    grads_dim:      t
    grads_mapping:  linear
    grads_size:     121
    grads_min:      00z20sep2020
    grads_step:     1hr
    long_name:      time
    minimum:        00z20sep2020
    maximum:        00z25sep2020
    resolution:     0.041666668
```



CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
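The schema and index above can be exercised with the standard-library sqlite3 module; this sketch rebuilds the table in memory (foreign keys omitted) and runs the query behind this page, inserting just the first row of the table above:

```python
# Sketch: the query behind this page ("rows where issue = 701062999
# sorted by updated_at descending"), against an in-memory copy of the
# schema. REFERENCES clauses are dropped since users/issues aren't rebuilt.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY,
   [node_id] TEXT, [user] INTEGER, [created_at] TEXT, [updated_at] TEXT,
   [author_association] TEXT, [body] TEXT, [reactions] TEXT,
   [performed_via_github_app] TEXT, [issue] INTEGER
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
""")
# One row taken from the table above (albertotb's 2022 comment).
conn.execute(
    "INSERT INTO issue_comments (id, user, issue, author_association, updated_at)"
    " VALUES (1011120549, 6514690, 701062999, 'NONE', '2022-01-12T14:48:49Z')"
)
rows = conn.execute(
    "SELECT id, author_association FROM issue_comments"
    " WHERE issue = 701062999 ORDER BY updated_at DESC"
).fetchall()
print(rows)  # [(1011120549, 'NONE')]
```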
Powered by Datasette · Queries took 43.122ms · About: xarray-datasette