issue_comments


7 rows where issue = 1309966595 sorted by updated_at descending


user (2): mankoff (4), dcherian (3)

author_association (2): CONTRIBUTOR (4), MEMBER (3)

issue (1): Improved CF decoding (7)
Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sort key, descending), author_association, body, reactions, performed_via_github_app, issue
1493004758 https://github.com/pydata/xarray/pull/6812#issuecomment-1493004758 https://api.github.com/repos/pydata/xarray/issues/6812 IC_kwDOAMm_X85Y_XHW dcherian 2448579 2023-04-01T15:26:04Z 2023-04-01T15:26:04Z MEMBER

We should figure out how to express some of this understanding as tests (some xfailed). That way it's easy to check when something gets fixed, and we prevent regressions.
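
As a minimal sketch of what such a test could look like (the variable construction and the asserted dtype here are illustrative assumptions, not settled expectations):

```python
import numpy as np
import pytest
import xarray as xr


@pytest.mark.xfail(reason="CF packing-attribute dtype handling not settled yet")
def test_decoded_dtype_follows_packing_attrs():
    # Per the CF reading discussed in the comments below, scale_factor and
    # add_offset share one float type, and the unpacked data takes that type.
    var = xr.Variable(
        ("x",),
        np.array([0, 1, 2], dtype="int16"),
        attrs={"scale_factor": np.float32(0.1), "add_offset": np.float32(10.0)},
    )
    decoded = xr.decode_cf(xr.Dataset({"v": var}))["v"]
    assert decoded.dtype == np.float32
```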

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improved CF decoding 1309966595
1281138042 https://github.com/pydata/xarray/pull/6812#issuecomment-1281138042 https://api.github.com/repos/pydata/xarray/issues/6812 IC_kwDOAMm_X85MXJ16 dcherian 2448579 2022-10-17T16:27:22Z 2022-10-17T16:27:22Z MEMBER

> A big decision is if the decode option strictly follows CF guidelines.

I think our general position is to be flexible in what we can read, because there are many slightly non-compliant files out there.

> Xarray has written 10s (100s?) of tests that touch this decoding function that make assumptions that I believe are incorrect after a careful reading of the CF spec.

Some of these might just be for convenience, and some might be checking that we are flexible in what we can read. The following test should be preserved so we can read those files (#4631):

```python
@pytest.mark.parametrize("scale_factor", (10, [10]))
@pytest.mark.parametrize("add_offset", (0.1, [0.1]))
```

> Enforcing this would probably break xarray backward compatibility for writing files.

Do we not enforce that scale_factor and add_offset are of the same dtype on write? If so, we should consider that a bug and fix it.

> I am concerned that this is a significant change and I'm not sure what the process is for making this change.

I think the way to move forward would be to figure out the smallest change that would fix (or even improve) #2304 and move on. We have a 30-minute bi-weekly meeting (#4001) that you're welcome to attend and raise specific questions at. The next one is Oct 26 at 9:30am Mountain Time.
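
As a rough illustration of the file pattern that test guards (a sketch assuming recent decoding behavior; the values shown are for illustration only):

```python
import numpy as np
import xarray as xr

# Files like those in #4631 store the packing attributes as length-1
# sequences; the parametrized test above keeps both the scalar and the
# sequence form readable.
var = xr.Variable(
    ("x",),
    np.array([0, 1, 2], dtype="int16"),
    attrs={"scale_factor": [10], "add_offset": [0.1]},
)
decoded = xr.decode_cf(xr.Dataset({"v": var}))["v"]
print(decoded.values)  # data * scale_factor + add_offset -> [0.1, 10.1, 20.1]
```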

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improved CF decoding 1309966595
1269235342 https://github.com/pydata/xarray/pull/6812#issuecomment-1269235342 https://api.github.com/repos/pydata/xarray/issues/6812 IC_kwDOAMm_X85Lpv6O mankoff 145117 2022-10-06T02:48:22Z 2022-10-06T02:48:22Z CONTRIBUTOR

A bit more detail about the existing tests that don't match the CF spec. Per the spec, scale_factor and add_offset should be of the same type. That causes tests throughout https://github.com/pydata/xarray/blob/main/xarray/tests/test_coding.py and https://github.com/pydata/xarray/blob/main/xarray/tests/test_backends.py to fail, because:

https://github.com/pydata/xarray/blob/13c52b27b777709fc3316cf4334157f50904c02b/xarray/tests/test_coding.py#L112-L113

There is 1 test in test_coding, and 9 tests in test_backends that use mixed types. That's a tractable number I can fix.
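
As a small sketch of the rule in question (packing_attrs_cf_compliant is a hypothetical helper written for this illustration, not part of xarray):

```python
import numpy as np


def packing_attrs_cf_compliant(attrs: dict) -> bool:
    """Hypothetical check for the CF packing rule discussed above: when both
    are present, scale_factor and add_offset must share a single float type
    (both float or both double)."""
    sf, ao = attrs.get("scale_factor"), attrs.get("add_offset")
    if sf is None or ao is None:
        return True  # the rule only constrains files that use both attributes
    dtypes = {np.asarray(sf).dtype, np.asarray(ao).dtype}
    return len(dtypes) == 1 and dtypes.pop() in (np.dtype("float32"), np.dtype("float64"))


# Mixed types like this appear in the existing tests and violate the rule:
print(packing_attrs_cf_compliant({"scale_factor": np.float32(0.1), "add_offset": 10.0}))  # False
print(packing_attrs_cf_compliant({"scale_factor": np.float64(0.1), "add_offset": np.float64(10.0)}))  # True
```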

In addition, the expected dtype returned by many of the tests does not match (my interpretation of) the expected dtype per the CF spec.

I am concerned that this is a significant change and I'm not sure what the process is for making this change. I would like to have some idea, even if not a guarantee, that it would be welcomed and accepted before doing all the work. I note that another recent large PR that tried to fix CF decoding has also stalled, and I'm not sure why (see #2751).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improved CF decoding 1309966595
1266136366 https://github.com/pydata/xarray/pull/6812#issuecomment-1266136366 https://api.github.com/repos/pydata/xarray/issues/6812 IC_kwDOAMm_X85Ld7Uu mankoff 145117 2022-10-03T22:29:28Z 2022-10-03T22:29:28Z CONTRIBUTOR

Hi @dcherian - I dropped this because I went down a rabbit hole that seemed very, very deep.

Xarray has written 10s (100s?) of tests that touch this decoding function that make assumptions that I believe are incorrect after a careful reading of the CF spec. I believe the path forward will take some conversation before coding, so perhaps this should be moved to an issue rather than a pull request? A big decision is if the decode option strictly follows CF guidelines. If so, then a lot of tests need to be changed (for example, to follow the simple rule that [scale_factor and add_offset] must both be of type float or both be of type double).

Enforcing this would probably break xarray backward compatibility for writing files. I assume that may be OK and there are processes to handle this (start with 'deprecation' warnings, then eventually throw errors?). There are also likely many NetCDF files that are not standard compliant, and we need to decide how to read them.

Furthermore, the CF conventions are themselves not very clear, and possibly ambiguous. I started a conversation on this at https://github.com/cf-convention/cf-conventions/issues/374, but that is also unresolved at the moment. The CF convention mentions int and float, but not how many bytes those are. What happens when a file is written & packed on one architecture and read & unpacked on another?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improved CF decoding 1309966595
1266044523 https://github.com/pydata/xarray/pull/6812#issuecomment-1266044523 https://api.github.com/repos/pydata/xarray/issues/6812 IC_kwDOAMm_X85Ldk5r dcherian 2448579 2022-10-03T21:01:43Z 2022-10-03T21:01:43Z MEMBER

Sorry for dropping this, @mankoff. How can we move forward here?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improved CF decoding 1309966595
1189512813 https://github.com/pydata/xarray/pull/6812#issuecomment-1189512813 https://api.github.com/repos/pydata/xarray/issues/6812 IC_kwDOAMm_X85G5oZt mankoff 145117 2022-07-19T20:19:29Z 2022-07-19T20:19:29Z CONTRIBUTOR

I'm reading more in https://github.com/pydata/xarray/blob/2a5686c6fe855502523e495e43bd381d14191c7b/xarray/coding/variables.py and I'm confused about some logic:

https://github.com/pydata/xarray/blob/2a5686c6fe855502523e495e43bd381d14191c7b/xarray/coding/variables.py#L271-L272

pop_to does a pop operation: it removes the key/value pair from its source. So the first line above removes add_offset from attrs if it exists. The second line then checks for "add_offset" in attrs, which should therefore always be False.

I have confirmed this is happening by inspecting with the debugger.

Furthermore, the fix I implemented in this pull request, which returns np.float64, fixes my bug, but only because this bug exists. My dataset has add_offset, so the lines I changed:

```python
if not has_offset:
    return np.float64
```

should not run, but do run because of this issue.
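
As a self-contained sketch of the pattern being described (this pop_to is a stand-in with the same move-the-key semantics as the helper in variables.py):

```python
def pop_to(source: dict, dest: dict, key: str) -> None:
    # Stand-in for xarray.coding.variables.pop_to: move key from source to dest.
    if key in source:
        dest[key] = source.pop(key)


attrs = {"scale_factor": 0.1, "add_offset": 10.0}
encoding = {}
pop_to(attrs, encoding, "scale_factor")
pop_to(attrs, encoding, "add_offset")

# The membership test described above runs after the pops, so it can never
# see the attribute it is looking for:
print("add_offset" in attrs)  # False, even though the dataset had add_offset
```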

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improved CF decoding 1309966595
1189485451 https://github.com/pydata/xarray/pull/6812#issuecomment-1189485451 https://api.github.com/repos/pydata/xarray/issues/6812 IC_kwDOAMm_X85G5huL mankoff 145117 2022-07-19T19:46:23Z 2022-07-19T19:46:23Z CONTRIBUTOR

Note - I also have not run the "Running the performance test suite" steps from https://xarray.pydata.org/en/stable/contributing.html. I assume changing from float32 to float64 would impact performance; I can run the suite if suggested.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improved CF decoding 1309966595

Table schema:
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);