issue_comments: 1498464352

html_url: https://github.com/pydata/xarray/issues/7723#issuecomment-1498464352
issue_url: https://api.github.com/repos/pydata/xarray/issues/7723
id: 1498464352
node_id: IC_kwDOAMm_X85ZUMBg
user: 5821660
created_at: 2023-04-06T04:09:11Z
updated_at: 2023-04-06T04:09:11Z
author_association: MEMBER
body:

@dcherian Great, a duplicate. :-( Sorry, I must have overlooked that one.

It's somewhat counter-intuitive to get differing results when using netcdf4-python and xarray. It would be a good idea to document this behaviour.

It looks like it might at least be resolved for floating point source data:

Let's take the simple example above. We have np.nan written to the file, but the netCDF representation on disk uses a default _FillValue (not declared via attribute) for the unwritten parts.

For the netcdf4-python user the np.nan will not be masked, but the unfilled parts will be masked.

For xarray the default fillvalue won't be masked, appearing as valid data, which it is not. On subsequent writes np.nan will be introduced as the new fillvalue (by attribute), effectively changing the meaning of the default fillvalues.

Wouldn't it make sense, then, to transform these default fill values to np.nan on read too, instead of giving them a seemingly meaningful value? Maybe yet another keyword switch, use_default_fillvalues?
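To make the idea concrete, here is a minimal sketch of what such a transformation could do for floating point data. The helper name `mask_default_fill` is hypothetical and not part of xarray or netcdf4-python; the constants are the netCDF default fill values for float32/float64 (netcdf4-python exposes the same numbers as `netCDF4.default_fillvals`). It assumes the variable has no `_FillValue` attribute declared.

```python
import numpy as np

# netCDF default fill values for floating point types
# (NC_FILL_FLOAT / NC_FILL_DOUBLE in the netCDF C library).
DEFAULT_FILL = {
    np.dtype("float32"): np.float32(9.969209968386869e36),
    np.dtype("float64"): np.float64(9.969209968386869e36),
}


def mask_default_fill(values):
    """Hypothetical helper: replace default fill values with NaN.

    Only floating point arrays are touched, since integer types
    cannot represent NaN. Other dtypes are returned unchanged.
    """
    values = np.asarray(values)
    if values.dtype not in DEFAULT_FILL:
        return values
    fill = DEFAULT_FILL[values.dtype]
    # Entries equal to the default fill value were never written.
    return np.where(values == fill, np.nan, values).astype(values.dtype)


# One written value, one written NaN, one unwritten (default fill) entry.
raw = np.array([1.0, np.nan, 9.969209968386869e36])
print(mask_default_fill(raw))
```

With this, both the explicitly written np.nan and the never-written entries end up as NaN, which is closer to what a netcdf4-python user sees as masked.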

There should be at least a warning on read, in these situations, that there are undefined values in the dataset which were never written and which will not be masked.
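A warning of this kind could look roughly like the following sketch. The function `warn_on_default_fill` and its arguments are made up for illustration and do not correspond to any existing xarray hook; it only handles float64 and assumes `attrs` is the variable's attribute dictionary as read from the file.

```python
import warnings

import numpy as np

# netCDF default fill value for float64 (NC_FILL_DOUBLE).
NC_FILL_DOUBLE = 9.969209968386869e36


def warn_on_default_fill(name, values, attrs):
    """Hypothetical check: warn if undeclared default fill values are present.

    Returns the number of entries equal to the default fill value.
    """
    values = np.asarray(values)
    # A declared _FillValue is handled by the normal decoding path.
    if "_FillValue" in attrs or values.dtype != np.float64:
        return 0
    n = int(np.count_nonzero(values == NC_FILL_DOUBLE))
    if n:
        warnings.warn(
            f"variable {name!r}: {n} value(s) equal the default netCDF "
            "fill value and were probably never written; they are not masked",
            RuntimeWarning,
            stacklevel=2,
        )
    return n
```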

If the dataset contains unwritten parts and a default fillvalue is used, which in turn means the data creator did this on purpose (by not setting a _FillValue), it can mean several things:

  • The creator's data does not actually have missing values that need declaring, but this means their data will get masked wherever entries equal the default fillvalue (maybe they don't know about this, but that might be unlikely).
  • The creator doesn't care at all, with the same conclusion as above.
  • The creator purposefully uses the default fillvalue as the missing value, since they use this as a means of saving disk space. But this could also be done by just defining it as the _FillValue attribute at creation time, if I'm not mistaken.

I'm still convinced this could be fixed for floating point data.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
performed_via_github_app: (none)
issue: 1655569401