issue_comments
5 rows where author_association = "MEMBER", issue = 701062999 and user = 6628425 sorted by updated_at descending
id: 696678072
html_url: https://github.com/pydata/xarray/issues/4422#issuecomment-696678072
issue_url: https://api.github.com/repos/pydata/xarray/issues/4422
node_id: MDEyOklzc3VlQ29tbWVudDY5NjY3ODA3Mg==
user: spencerkclark 6628425
created_at: 2020-09-22T12:05:58Z
updated_at: 2020-09-22T12:05:58Z
author_association: MEMBER
body: It's a little more delicate, but I also think we should be able to fix this in the case of
reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
performed_via_github_app:
issue: Problem decoding times in data from OpenDAP server 701062999

id: 696674560
html_url: https://github.com/pydata/xarray/issues/4422#issuecomment-696674560
issue_url: https://api.github.com/repos/pydata/xarray/issues/4422
node_id: MDEyOklzc3VlQ29tbWVudDY5NjY3NDU2MA==
user: spencerkclark 6628425
created_at: 2020-09-22T11:58:31Z
updated_at: 2020-09-22T11:58:31Z
author_association: MEMBER
body: @dopplershift I agree an error doesn't really make sense, particularly since these units appear in the CF conventions -- thanks for pointing that out. I hesitate to say things are working perfectly with
reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
performed_via_github_app:
issue: Problem decoding times in data from OpenDAP server 701062999

id: 696451333
html_url: https://github.com/pydata/xarray/issues/4422#issuecomment-696451333
issue_url: https://api.github.com/repos/pydata/xarray/issues/4422
node_id: MDEyOklzc3VlQ29tbWVudDY5NjQ1MTMzMw==
user: spencerkclark 6628425
created_at: 2020-09-22T00:29:32Z
updated_at: 2020-09-22T00:32:04Z
author_association: MEMBER
body:
To be honest I'm not sure. You could give it a try -- the only thing there is that you'd need to cast the times back to

In the test case I think we're merely acknowledging that the remaining part of the test would fail with these units. I agree with you and @dcherian though -- it's super weird that pandas behaves the way it does here. Considering this isn't the first time these units have come up, it might be worth special-casing them in some way.
reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
performed_via_github_app:
issue: Problem decoding times in data from OpenDAP server 701062999
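
The "way pandas behaves" referred to above is its parsing of the non-zero-padded reference date in the units string. A minimal sketch of that behaviour, for illustration (the parsed value matches the pandas output quoted further down the page):

```
import pandas as pd

# The ambiguous reference date from the units string parses as year 2001,
# not year 1:
print(pd.Timestamp("1-1-1 00:00:0.0"))  # Timestamp('2001-01-01 00:00:00')

# Nanosecond-precision timestamps only cover roughly 1677-2262, so the
# intended year-1 reference date plus offsets of ~700000 days cannot be
# represented, which is why xarray falls back to cftime on the first decode.
print(pd.Timestamp.min, pd.Timestamp.max)
```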

id: 695787482
html_url: https://github.com/pydata/xarray/issues/4422#issuecomment-695787482
issue_url: https://api.github.com/repos/pydata/xarray/issues/4422
node_id: MDEyOklzc3VlQ29tbWVudDY5NTc4NzQ4Mg==
user: spencerkclark 6628425
created_at: 2020-09-20T13:29:48Z
updated_at: 2020-09-20T13:29:48Z
author_association: MEMBER
body:
I don't know if there is anything we can do to make this less opaque. My gut feeling is to label this as a "metadata issue," and recommend addressing it at the file level, but it is awkward that it sort of works as is, but not quite. This is not the first time an issue has been raised related to units like this:
https://github.com/pydata/xarray/blob/13c09dc28ec8ff791c6d87e2d8e80c362c65ffd4/xarray/tests/test_coding_times.py#L112-L116
reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
performed_via_github_app:
issue: Problem decoding times in data from OpenDAP server 701062999

id: 695786924
html_url: https://github.com/pydata/xarray/issues/4422#issuecomment-695786924
issue_url: https://api.github.com/repos/pydata/xarray/issues/4422
node_id: MDEyOklzc3VlQ29tbWVudDY5NTc4NjkyNA==
user: spencerkclark 6628425
created_at: 2020-09-20T13:23:58Z
updated_at: 2020-09-20T13:23:58Z
author_association: MEMBER
body:

**Description of the problem**

I believe the issue here stems from the units attribute in the original dataset:

```
In [1]: import xarray as xr

In [2]: url = "https://nomads.ncep.noaa.gov/dods/gfs_0p25_1hr/gfs20200920/gfs_0p25_1hr_00z"

In [3]: ds = xr.open_dataset(url, decode_times=False)

In [4]: ds.time.attrs["units"]
Out[4]: 'days since 1-1-1 00:00:0.0'
```

This is an unusual format -- ordinarily we'd expect zero-padded year, month, and day values. Pandas misinterprets this and parses the reference date to 2001-01-01:

```
In [5]: import pandas as pd

In [6]: pd.Timestamp("1-1-1 00:00:0.0")
Out[6]: Timestamp('2001-01-01 00:00:00')
```

Of course, with time values on the order of 700000 and units of days, this results in dates outside the nanosecond-precision range of the datetime64[ns] data type, so xarray falls back to decoding the times with cftime.

There's a catch though. When saving the dates back out to a file, the odd units remain in the encoding of the time variable. When parsing the reference date, xarray again first tries using pandas. This time, there's nothing that stops xarray from proceeding, because we are no longer bound by integer overflow (taking the difference between a date in 2020 and a date in 2001 is perfectly valid for nanosecond-precision dates). So encoding succeeds, and we no longer need to try with cftime.

```
In [7]: ds = xr.decode_cf(ds)

In [8]: subset = ds["ugrd10m"].isel(time=slice(0, 8))

In [9]: subset.to_netcdf("test.nc")

In [10]: recovered = xr.open_dataset("test.nc", decode_times=False)

In [11]: recovered.time.attrs["units"]
Out[11]: 'days since 2001-01-01'
```

Thus when we read the data the first time, decoding happens with cftime, and when we read the file the second time, decoding happens with pandas (encoding was also different for the two files). This is the reason for the difference in values.

**Workaround**

To get an accurate round-trip I would recommend overwriting the units attribute to something that pandas parses correctly:

```
In [12]: ds = xr.open_dataset(url, decode_times=False)

In [13]: ds.time.attrs["units"] = "days since 0001-01-01"

In [14]: ds = xr.decode_cf(ds)

In [15]: subset = ds["ugrd10m"].isel(time=slice(0, 8))

In [16]: subset.to_netcdf("test.nc")

In [17]: recovered = xr.open_dataset("test.nc")

In [18]: recovered.time
Out[18]:
<xarray.DataArray 'time' (time: 8)>
array(['2020-09-20T00:00:00.000000000', '2020-09-20T00:59:59.999997000',
       '2020-09-20T02:00:00.000003000', '2020-09-20T03:00:00.000000000',
       '2020-09-20T03:59:59.999997000', '2020-09-20T05:00:00.000003000',
       '2020-09-20T06:00:00.000000000', '2020-09-20T06:59:59.999997000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2020-09-20 ... 2020-09-20T06:59:59.999997
Attributes:
    grads_dim:      t
    grads_mapping:  linear
    grads_size:     121
    grads_min:      00z20sep2020
    grads_step:     1hr
    long_name:      time
    minimum:        00z20sep2020
    maximum:        00z25sep2020
    resolution:     0.041666668
```
reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
performed_via_github_app:
issue: Problem decoding times in data from OpenDAP server 701062999
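
The workaround above can be wrapped into a small pre-processing step applied before decoding. A minimal sketch, assuming the units string always takes the form seen on this server; the helper name `fix_ambiguous_time_units` is made up for illustration:

```
import xarray as xr

def fix_ambiguous_time_units(ds, time_var="time"):
    """Rewrite a non-zero-padded reference date (e.g. 'days since 1-1-1 00:00:0.0')
    so that pandas parses it as year 1 rather than year 2001, then decode."""
    units = ds[time_var].attrs.get("units", "")
    if units.startswith("days since 1-1-1"):
        ds[time_var].attrs["units"] = "days since 0001-01-01"
    return xr.decode_cf(ds)

# Usage (URL taken from the comment above):
# url = "https://nomads.ncep.noaa.gov/dods/gfs_0p25_1hr/gfs20200920/gfs_0p25_1hr_00z"
# ds = fix_ambiguous_time_units(xr.open_dataset(url, decode_times=False))
```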

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
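
The filter described at the top of the page (author_association = "MEMBER", issue = 701062999 and user = 6628425, sorted by updated_at descending) maps directly onto this schema. A minimal sketch of the equivalent query, assuming the data has been exported to a local SQLite file (the filename here is hypothetical):

```
import sqlite3

# Connect to a local copy of the exported database (hypothetical filename).
conn = sqlite3.connect("github_issues.db")

rows = conn.execute(
    """
    SELECT id, created_at, updated_at, body
    FROM issue_comments
    WHERE author_association = 'MEMBER'
      AND issue = 701062999
      AND user = 6628425
    ORDER BY updated_at DESC
    """
).fetchall()

for comment_id, created, updated, body in rows:
    print(comment_id, updated)
```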