issue_comments

8 rows where issue = 924676925 (Nan/ changed values in output when only reading data, saving and reading again), sorted by updated_at descending

kmuehlbauer (MEMBER) · 2023-05-02T13:38:49Z · https://github.com/pydata/xarray/issues/5490#issuecomment-1531496369

This is indeed an issue with scale_factor and add_offset, as @d70-t has already mentioned.

That is not a problem per se, but those attributes obviously differ between files. When concatenating, only the first file's attributes survive. That might already be the source of the above problem, as it can slightly change values.
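
A minimal numeric sketch of that quantization drift (the scale/offset values below are assumed for illustration, not taken from the issue's files):

```python
import numpy as np

# Two files pack the same variable with slightly different encodings
# (values made up for illustration).
s1, o1 = 3.067e-07, 0.01005   # scale_factor / add_offset of file 1
s2, o2 = 3.102e-07, 0.01013   # scale_factor / add_offset of file 2

x = np.int16(12000) * s2 + o2          # a value decoded from file 2

# After concatenation only file 1's attributes survive, so writing
# re-packs file 2's values onto file 1's quantization grid:
repacked = np.round((x - o1) / s1).astype(np.int16)
roundtrip = repacked * s1 + o1

print(x, roundtrip, x - roundtrip)     # differs in the last digits
```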

An even bigger problem arises when the dynamic ranges (min/max) of the decoded data don't overlap. Then the data might be folded from the lower border to the upper border or vice versa.
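
And a hedged sketch of that folding, again with made-up numbers: a decoded value outside the range representable by the surviving encoding overflows the int16 packing and wraps to the opposite border.

```python
import numpy as np

s, o = 1e-4, 0.0                    # assumed surviving scale_factor / add_offset
x = 4.0                             # decoded value beyond o + 32767 * s ≈ 3.28

raw = np.int64(round((x - o) / s))  # 40000 does not fit into int16
packed = raw.astype(np.int16)       # wraps modulo 2**16 -> -25536
print(packed * s + o)               # ≈ -2.55 instead of 4.0: folded to the lower border
```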

I've put an example into #5739. The suggestion for now, as per @keewis' comment, is to drop the encoding in such cases and write floating-point values. You can still use the available compression options for floating-point data.
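
A minimal sketch of that suggested workflow (file names and compression settings are placeholders, not from the issue):

```python
import xarray as xr

ds = xr.open_mfdataset("input_*.nc")   # hypothetical input files

# Drop the packed-integer encoding inherited from the first file so the
# data is written as floating point instead of being re-packed to int16.
for name in ds.data_vars:
    for key in ("scale_factor", "add_offset", "dtype"):
        ds[name].encoding.pop(key, None)

# Optionally compress the floating-point output (netCDF4 backend options).
encoding = {name: {"zlib": True, "complevel": 4} for name in ds.data_vars}
ds.to_netcdf("output.nc", encoding=encoding)
```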

kmuehlbauer (MEMBER) · 2023-05-02T13:20:46Z · https://github.com/pydata/xarray/issues/5490#issuecomment-1531465011

Xref: #5739

kmuehlbauer (MEMBER) · 2021-06-19T10:18:21Z · https://github.com/pydata/xarray/issues/5490#issuecomment-864386633

@lthUniBonn You would need to use the kwarg mask_and_scale=False in the call to open_dataset. decode_cf=False will work too, but it completely disables any CF decoding.
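
For illustration, a hedged sketch (the path is assumed):

```python
import xarray as xr

# Keep the raw packed int16 values: no mask, no scale/offset applied.
ds_raw = xr.open_dataset(
    "Minimal_test_data/2012_europe_9_130_131_132_133_135.nc",
    mask_and_scale=False,
)
print(ds_raw["q"].dtype)  # int16 rather than float32
```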

lthUniBonn (NONE) · 2021-06-19T08:22:15Z (edited 2021-06-19T08:28:04Z) · https://github.com/pydata/xarray/issues/5490#issuecomment-864374967

Probably the scaling and adding are carried out in float64 but then rounded down to float32. This refers to how xarray reads the netCDF (and is not something to do with the original data), right?

Is there a way to avoid this by not scaling/adding in the first place? If only the integer values were read, selected by index, and saved again, this should not happen anymore, right? I could try decode_cf=False for this...
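
A hedged sketch of that idea (file names and the index selection are made up):

```python
import xarray as xr

# Work on the raw integers so no scaling/adding is ever applied.
ds = xr.open_dataset("input.nc", mask_and_scale=False)
subset = ds.isel(time=slice(0, 720))   # select by index only
subset.to_netcdf("subset.nc")          # raw int16 values survive unchanged
```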

keewis (MEMBER) · 2021-06-18T15:52:18Z · https://github.com/pydata/xarray/issues/5490#issuecomment-864131761

Related to that, there's also #5082, which proposes to drop the encoding more aggressively.

d70-t (CONTRIBUTOR) · 2021-06-18T11:32:38Z (edited 2021-06-18T11:33:14Z) · https://github.com/pydata/xarray/issues/5490#issuecomment-863972083

I've checked your example files. This is mostly related to the fact that the original data is encoded as short and uses scale_factor and add_offset:

```python
In [35]: ds_loc.q.encoding
Out[35]:
{'source': '/private/tmp/test_xarray/Minimal_test_data/2012_europe_9_130_131_132_133_135.nc',
 'original_shape': (720, 26, 36, 41),
 'dtype': dtype('int16'),
 'missing_value': -32767,
 '_FillValue': -32767,
 'scale_factor': 3.0672840096982675e-07,
 'add_offset': 0.010050721147263318}
```

Probably the scaling and adding are carried out in float64 but then rounded down to float32. When storing the dataset back to netCDF, xarray re-uses the information from the encoding attribute and goes back to int16, possibly creating even more rounding errors. Reading the data back in is then no longer reproducible.
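
A rough sketch of that decode path, reusing the scale/offset from the encoding above (the packed value is made up):

```python
import numpy as np

s, o = 3.0672840096982675e-07, 0.010050721147263318  # from the encoding above
raw = np.int16(12345)                                # hypothetical packed value

decoded64 = raw * s + o              # decoding happens in float64 ...
decoded32 = np.float32(decoded64)    # ... but the variable is presented as float32
print(decoded64, decoded32)          # float64 keeps ~16 digits, float32 only ~7
```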

Possibly related issues are #4826 and #3020.

lthUniBonn (NONE) · 2021-06-18T11:03:18Z (edited 2021-06-18T11:03:53Z) · https://github.com/pydata/xarray/issues/5490#issuecomment-863955559

Yes, they are generated on a 0.25x0.25 lat/lon grid in Europe, so these values match (when reading the original files there are no NaNs, which I think excludes this option).

The test "all q values are the same" is not meant for the case where I even find NaNs, but for the case where I don't see them. I should have included the output I get - see below, e.g., the last test I ran.

It says that both the original and the read-back-in data are float32 - that's what confuses me. I also expected a difference in data type to be responsible, but at first glance that does not seem to be the case here.

Below that output I print a timespan of the original and the second dataset, where the values clearly differ - in the last few digits. I can also include the test where it even returns NaN in some places. The full testing code and data are in the link if you want to see that - or I can post it here.

```
original
<xarray.Dataset>
Dimensions:    (level: 26, time: 1464)
Coordinates:
    longitude  float32 10.0
    latitude   float32 38.0
  * level      (level) int32 112 113 114 115 116 117 ... 132 133 134 135 136 137
  * time       (time) datetime64[ns] 2014-09-01 ... 2014-10-31T23:00:00
Data variables:
    t          (time, level) float32 dask.array<chunksize=(720, 26), meta=np.ndarray>
    q          (time, level) float32 dask.array<chunksize=(720, 26), meta=np.ndarray>
    u          (time, level) float32 dask.array<chunksize=(720, 26), meta=np.ndarray>
    v          (time, level) float32 dask.array<chunksize=(720, 26), meta=np.ndarray>
Attributes:
    Conventions:  CF-1.6
    history:      2021-02-05 01:00:40 GMT by grib_to_netcdf-2.16.0: /opt/ecmw...

read back in
<xarray.Dataset>
Dimensions:    (level: 26, time: 1464)
Coordinates:
    longitude  float32 ...
    latitude   float32 ...
  * level      (level) int32 112 113 114 115 116 117 ... 132 133 134 135 136 137
  * time       (time) datetime64[ns] 2014-09-01 ... 2014-10-31T23:00:00
Data variables:
    t          (time, level) float32 ...
    q          (time, level) float32 ...
    u          (time, level) float32 ...
    v          (time, level) float32 ...
Attributes:
    Conventions:  CF-1.6
    history:      2021-02-05 01:00:40 GMT by grib_to_netcdf-2.16.0: /opt/ecmw...

test for nan - np.any(np.isnan(array))
original:     q: False  t: False  u: False  v: False
read back in: q: False  t: False  u: False  v: False

look at one of the problematic portions
(q.values[timespan] - values for same timespan, original and read back in)
original
[0.01286593 0.01290165 0.01218289 0.01229404 0.01238789 0.0125237
 0.01275251 0.01274316 0.01292717 0.01308822 0.01309219 0.01304683
 0.01299834 0.01299749 0.01267057 0.01274089 0.01281064 0.01282141
 0.01286848 0.01291271 0.01302868 0.01290676 0.01276612 0.01273976
 0.01273635 0.01271169 0.01244998 0.01250867 0.01229999 0.01256708
 0.01265356 0.01276471 0.01274259 0.01243155 0.01195124 0.01166572
 0.01124779 0.01097304 0.01091747 0.01098779 0.01105896 0.01114317
 0.01122823 0.01133569 0.01147207 0.01155231 0.01154834 0.01154579
 0.01155486 0.01158009 0.0114715  0.01169464 0.01170598 0.01151034
 0.01124751 0.01127246 0.01125374 0.01128862 0.01127643 0.0112631
 0.01126225 0.01126594 0.01154182 0.01162574 0.01169833 0.01176354
 0.01183301 0.01184066 0.01187781 0.01194756 0.01208564 0.01224102
 0.01244346 0.01260706 0.01236549 0.01256538 0.0127528  0.01287415
 0.01304286 0.01327876 0.01366919 0.01396406 0.0142683  0.01445004
 0.01449626 0.01438228 0.01404204 0.01419486 0.01447329 0.01472309
 0.01493943 0.01512514 0.01532986 0.01552691 0.01566074 0.01577302
 0.01581669 0.015832   0.01564515 0.01568768]
read back in
[0.01286582 0.01290182 0.01218301 0.01229396 0.01238785 0.01252367
 0.01275264 0.01274299 0.01292705 0.01308811 0.01309219 0.01304692
 0.0129983  0.01299756 0.01267063 0.01274076 0.01281053 0.01282129
 0.01286842 0.01291258 0.01302873 0.01290664 0.012766   0.01273965
 0.01273631 0.01271182 0.01244982 0.01250883 0.0122999  0.01256709
 0.01265355 0.01276488 0.01274262 0.01243164 0.01195107 0.0116657
 0.01124785 0.01097287 0.01091757 0.01098771 0.01105896 0.0111432
 0.01122818 0.0113358  0.01147199 0.01155215 0.01154844 0.01154584
 0.01155474 0.01157998 0.01147162 0.01169465 0.01170615 0.01151021
 0.01124748 0.01127234 0.01125379 0.01128867 0.01127642 0.01126306
 0.01126232 0.01126603 0.01154175 0.01162562 0.01169836 0.01176367
 0.01183306 0.01184049 0.01187797 0.01194773 0.01208578 0.0122409
 0.01244352 0.01260717 0.01236559 0.01256523 0.01275264 0.01287398
 0.01304283 0.01327885 0.01366924 0.01396389 0.01426819 0.01445002
 0.01449641 0.01438211 0.01404219 0.01419471 0.0144734  0.01472315
 0.0149395  0.01512505 0.01532989 0.01552694 0.01566091 0.01577298
 0.01581677 0.01583198 0.01564532 0.01568763]
timespan: 2014-10-04T08:00:00.000000000 2014-10-08T11:00:00.000000000
Test all q values same: False
```
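
For reference, a minimal sketch of the round trip being tested here (paths are placeholders, not the actual test code):

```python
import numpy as np
import xarray as xr

original = xr.open_mfdataset("Minimal_test_data/*.nc")   # hypothetical paths
original.to_netcdf("roundtrip.nc")                       # save ...
readback = xr.open_dataset("roundtrip.nc")               # ... and read back in

# Compare a variable; differences show up in the last digits.
print(np.array_equal(original["q"].values, readback["q"].values))
```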

d70-t (CONTRIBUTOR) · 2021-06-18T10:44:38Z · https://github.com/pydata/xarray/issues/5490#issuecomment-863945975

Are your input files on (exactly) the same grid? If not, combining the files might introduce NaN to fill up mismatching cells. Furthermore, if you are working with NaNs, are you aware of:

```python
In [1]: import numpy as np

In [2]: np.nan == np.nan
Out[2]: False
```

This is as it should be per IEEE 754.
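
For NaN-aware equality checks, numpy's array_equal can treat NaNs as equal:

```python
import numpy as np

a = np.array([1.0, np.nan])
b = np.array([1.0, np.nan])

print(np.array_equal(a, b))                  # False: NaN != NaN
print(np.array_equal(a, b, equal_nan=True))  # True: NaNs compared as equal
```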

When writing out the files to netCDF, do you accidentally convert from 64-bit float to 32-bit float?


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);