
issue_comments


7 rows where author_association = "NONE" and issue = 343659822 sorted by updated_at descending




id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
852069023 https://github.com/pydata/xarray/issues/2304#issuecomment-852069023 https://api.github.com/repos/pydata/xarray/issues/2304 MDEyOklzc3VlQ29tbWVudDg1MjA2OTAyMw== ACHMartin 18679148 2021-06-01T12:03:55Z 2021-06-07T20:48:00Z NONE

Dear all and thank you for your work on Xarray,

Linked to @magau's comment: I have a netCDF file with multiple variables in different formats (float, short, byte). Using open_mfdataset, the 'short' and 'byte' variables are converted to 'float64' (no scaling, but some masking for the float data). This doesn't raise a major issue for me, but it takes up plenty of memory for nothing.

Below is an example of the 3 formats (from ncdump -h):

short total_nobs(time, lat, lon) ;
    total_nobs:long_name = "Number of SSS in the time interval" ;
    total_nobs:valid_min = 0s ;
    total_nobs:valid_max = 10000s ;
float pct_var(time, lat, lon) ;
    pct_var:_FillValue = NaNf ;
    pct_var:long_name = "Percentage of SSS_variability that is expected to be not explained by the products" ;
    pct_var:units = "%" ;
    pct_var:valid_min = 0. ;
    pct_var:valid_max = 100. ;
byte sss_qc(time, lat, lon) ;
    sss_qc:long_name = "Sea Surface Salinity Quality, 0=Good; 1=Bad" ;
    sss_qc:valid_min = 0b ;
    sss_qc:valid_max = 1b ;

And here is how they appear after opening as an xarray Dataset using open_mfdataset:

total_nobs  (time, lat, lon)  float64  dask.array<chunksize=(48, 584, 1388), meta=np.ndarray>
pct_var     (time, lat, lon)  float32  dask.array<chunksize=(48, 584, 1388), meta=np.ndarray>
sss_qc      (time, lat, lon)  float64  dask.array<chunksize=(48, 584, 1388), meta=np.ndarray>

Is there any recommendation? Regards
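
For reference, a possible workaround sketch (the glob path is a placeholder, and demoting back to integers is only safe if those variables carry no fill values):

import numpy as np
import xarray as xr

# Option 1: skip mask/scale decoding entirely, keeping the on-disk dtypes:
ds = xr.open_mfdataset("*.nc", mask_and_scale=False)

# Option 2: decode as usual, then demote the promoted variables explicitly
# (only safe if they contain no fill values):
ds = xr.open_mfdataset("*.nc")
ds["total_nobs"] = ds["total_nobs"].astype(np.int32)
ds["sss_qc"] = ds["sss_qc"].astype(np.int8)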

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray  343659822
731253022 https://github.com/pydata/xarray/issues/2304#issuecomment-731253022 https://api.github.com/repos/pydata/xarray/issues/2304 MDEyOklzc3VlQ29tbWVudDczMTI1MzAyMg== psybot-ca 66918146 2020-11-20T15:59:13Z 2020-11-20T15:59:13Z NONE

Hey everyone, I stumbled on this while searching for approximately the same problem. Thought I'd share since the issue is still open. On my part, there are two situations that seem buggy. I haven't been using xarray for that long yet, so maybe there is something I'm missing here...

My first problem relates to the data types of dimensions with float notation. To give another answer to @shoyer's question:

To clarify: why is it a problem for you?

It is a problem in my case because I would like to slice a dataset using longitude values taken from another dataset. This operation raises "KeyError: not all values found in index 'longitude'", either because one dataset's longitude is float32 and the other's is float64, or because the float32 approximations are not exactly the same value in each dataset. I can work around this by assigning new float64 coords after reading, and it works, though it is kind of a hassle considering I have to do this thousands of times. This situation also creates a problem when concatenating multiple netCDF files together (along the time dim in my case): the discrepancies between float32 approximations, or the float32 vs float64 mismatch, will add new dimension values where they shouldn't.
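
A minimal sketch of both workarounds (file names are placeholders; assumes a shared 'longitude' coordinate):

import numpy as np
import xarray as xr

ds_a = xr.open_dataset("a.nc")  # placeholder paths
ds_b = xr.open_dataset("b.nc")

# Workaround 1: promote both longitude coords to float64 after reading,
# so exact label matching works again:
ds_a = ds_a.assign_coords(longitude=ds_a["longitude"].astype(np.float64))
ds_b = ds_b.assign_coords(longitude=ds_b["longitude"].astype(np.float64))

# Workaround 2: tolerate tiny float discrepancies instead of requiring
# exact matches:
subset = ds_a.sel(longitude=ds_b["longitude"], method="nearest", tolerance=1e-6)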

The second part of my problem comes with writing/reading netCDF files (maybe more related to @daoudjahdou's problem). I tried to change the data type to float64 for all my files, save them, and then perform what I need to do, but for some reason, even though the dtype is float64 for all my dimensions when writing the files (using default args), it will sometimes be float32 and sometimes float64 when reading back the files (with default arg values) previously saved with float64 dtype. If using the default args, shouldn't the decoding make the dtype of dimensions the same for all files I read?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray  343659822
462592638 https://github.com/pydata/xarray/issues/2304#issuecomment-462592638 https://api.github.com/repos/pydata/xarray/issues/2304 MDEyOklzc3VlQ29tbWVudDQ2MjU5MjYzOA== magau 791145 2019-02-12T02:48:00Z 2019-02-12T02:48:00Z NONE

Hi everyone, I've started using xarray recently, so I apologize if I'm saying something wrong... I've also faced the issue reported here, so I have tried to find some answers.

Unpacking netCDF files with respect to the NUG attributes (scale_factor and add_offset) is covered by the CF Conventions, which are clear about which data type should be applied to the unpacked data (see cf-conventions-1.7/packed-data). In that chapter you can read: "If the scale_factor and add_offset attributes are of the same data type as the associated variable, the unpacked data is assumed to be of the same data type as the packed data. However, if the scale_factor and add_offset attributes are of a different data type from the variable (containing the packed data) then the unpacked data should match the type of these attributes."

In my opinion this should be the default behavior of the xarray.decode_cf function, which doesn't invalidate the idea of forcing the unpacked data dtype. However, neither the CFScaleOffsetCoder nor the CFMaskCoder de/encoder classes seem to follow these CF directives, since the first doesn't look at the scale_factor or add_offset dtypes, and the second also changes the unpacked data dtype (maybe because NaN values are being used to replace the fill values).

Sorry for such an extensive comment, without any solution proposals... Regards! :+1:
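
As a sketch, the quoted CF rule could be expressed like this (a hypothetical helper for illustration, not xarray's actual implementation):

import numpy as np

def unpacked_dtype(packed_dtype, scale_factor=None, add_offset=None):
    # Hypothetical helper illustrating the CF rule quoted above.
    attrs = [a for a in (scale_factor, add_offset) if a is not None]
    if not attrs:
        return np.dtype(packed_dtype)
    attr_dtype = np.result_type(*attrs)
    if attr_dtype == np.dtype(packed_dtype):
        # Same dtype as the packed variable: keep the packed dtype.
        return np.dtype(packed_dtype)
    # Different dtype: the unpacked data should match the attributes' dtype.
    return attr_dtype

# e.g. int16 data packed with a float64 scale_factor unpacks to float64:
print(unpacked_dtype(np.int16, scale_factor=np.float64(0.01)))  # float64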

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray  343659822
451984471 https://github.com/pydata/xarray/issues/2304#issuecomment-451984471 https://api.github.com/repos/pydata/xarray/issues/2304 MDEyOklzc3VlQ29tbWVudDQ1MTk4NDQ3MQ== DevDaoud 971382 2019-01-07T16:04:11Z 2019-01-07T16:04:11Z NONE

Hi, thank you for your effort in making xarray a great library. As mentioned in the issue, the discussion moved to a PR in order to make xr.open_dataset configurable. This post is to ask for your recommendations regarding our PR.

In this case we would add a parameter to the open_dataset function called "force_promote", a boolean that is False by default and thus not mandatory, and then propagate that parameter down to the function maybe_promote in dtypes.py, where we would say the following:

if dtype.itemsize <= 2 and not force_promote:
    dtype = np.float32
else:
    dtype = np.float64

The downside is that we somewhat pollute the code with a parameter that is only used in one specific case.

The second approach would check the value of an environment variable called "XARRAY_FORCE_PROMOTE", which, if it exists and is set to true, would force promoting the type to float64.
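
A rough sketch of that second approach (names follow the proposal above; this is not merged xarray code):

import os
import numpy as np

# Read the proposed environment variable once at import time:
FORCE_PROMOTE = os.environ.get("XARRAY_FORCE_PROMOTE", "").lower() in ("1", "true")

def maybe_promote_dtype(dtype):
    # Small integer types decode to float32 unless promotion is forced:
    if dtype.itemsize <= 2 and not FORCE_PROMOTE:
        return np.dtype(np.float32)
    return np.dtype(np.float64)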

Please tell us which approach best suits your vision of xarray.

Regards.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray  343659822
412492776 https://github.com/pydata/xarray/issues/2304#issuecomment-412492776 https://api.github.com/repos/pydata/xarray/issues/2304 MDEyOklzc3VlQ29tbWVudDQxMjQ5Mjc3Ng== DevDaoud 971382 2018-08-13T11:51:15Z 2018-08-13T11:51:15Z NONE

Any updates on this?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray  343659822
410678021 https://github.com/pydata/xarray/issues/2304#issuecomment-410678021 https://api.github.com/repos/pydata/xarray/issues/2304 MDEyOklzc3VlQ29tbWVudDQxMDY3ODAyMQ== DevDaoud 971382 2018-08-06T11:31:00Z 2018-08-06T11:31:00Z NONE

As mentioned in the original issue, the modification is straightforward. Any idea whether this could be integrated into xarray anytime soon?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray  343659822
407092265 https://github.com/pydata/xarray/issues/2304#issuecomment-407092265 https://api.github.com/repos/pydata/xarray/issues/2304 MDEyOklzc3VlQ29tbWVudDQwNzA5MjI2NQ== DevDaoud 971382 2018-07-23T15:10:13Z 2018-07-23T15:10:13Z NONE

Thank you for your quick answer. In our case we may compute standard deviations or sums of squares over long lists of values, and the accumulation of those small float32 rounding errors could create considerable differences.
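
A small synthetic illustration of that accumulation effect (made-up data, not from the issue):

import numpy as np

rng = np.random.default_rng(0)
# A million float32 values with a large mean and small spread:
x = rng.normal(loc=100.0, scale=0.01, size=1_000_000).astype(np.float32)

# Sum of squares accumulated in float32 vs float64:
ss32 = np.sum(x * x, dtype=np.float32)
ss64 = np.sum(x.astype(np.float64) ** 2)
print(ss32, ss64, abs(ss32 - ss64))  # the float32 total drifts measurably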

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray  343659822


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);