issues: 233992696

id: 233992696
node_id: MDU6SXNzdWUyMzM5OTI2OTY=
number: 1444
title: Best practice when the _Unsigned attribute is present in NetCDF files
user: 1325771
state: closed
locked: 0
comments: 5
created_at: 2017-06-06T19:05:07Z
updated_at: 2017-07-28T17:39:04Z
closed_at: 2017-07-28T17:39:04Z
author_association: CONTRIBUTOR

Some (large) data providers are writing NetCDF-4-extended files but using an _Unsigned attribute to indicate that a signed data type should be interpreted as unsigned bytes.

Background: https://github.com/Unidata/netcdf4-python/issues/656

From the background discussion above, it is my understanding that xarray does not honor the attribute because it’s not a part of the CF spec, is only mentioned as a proposed attribute in the NetCDF Best Practices, and because "xarray wants the Variable dtype to be the same as the dtype of the data returned."

Taking the above as a given, it is necessary for xarray users encountering such variables to do the following after reading the data:

    dtype = data.encoding['dtype'].str.replace('i', 'u')
    scale_factor = data.encoding['scale_factor']
    add_offset = data.encoding['add_offset']
    unscale = ((data - add_offset)/scale_factor).data.astype(dtype).astype('float64')
    fixed = unscale * scale_factor + add_offset

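The un-scaling step can be skipped by turning off automatic masking and scaling when opening the file (mask_and_scale=False), so that the raw stored integers are returned and the scale and offset only need to be applied once.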
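
To illustrate that shortcut, here is a minimal sketch using NumPy only. It assumes the raw values were read with masking and scaling turned off, so they arrive as signed integers. The array contents, scale_factor, and add_offset are made-up example values, not taken from any real file.

```python
import numpy as np

# Hypothetical raw values as stored on disk: int8 data flagged with _Unsigned.
raw = np.array([0, 127, -128, -1], dtype="int8")
scale_factor = 0.5  # assumed example value
add_offset = 10.0   # assumed example value

# Reinterpret the same bits as unsigned: '|i1' becomes '|u1'.
unsigned_dtype = raw.dtype.str.replace("i", "u")
unsigned = raw.view(unsigned_dtype)

# Apply scale and offset once, on the corrected integers.
fixed = unsigned.astype("float64") * scale_factor + add_offset
```

Using view rather than astype makes the intent explicit: the stored bytes are unchanged and only their interpretation differs.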

In order to automate the above process while still being able to use the functionality of Dataset, one approach might be to automatically perform the above steps on some known list of variables, and then reassign those variables to the Dataset. The downside is the need to read all variables up front, which could be expensive when processing large datasets where not all variables are needed.

Is there another approach that would preserve lazy data loading, for instance by providing pre/post hooks for transformation functions at the __getitem__ stage? Is there something I could do to help document that as a best practice?
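
As a rough sketch of what such a hook could look like, the wrapper below applies the unsigned fix only to the portion of the data that is actually indexed. LazyUnsigned is a hypothetical class written for this illustration, not part of xarray or netCDF4, and it assumes the wrapped source returns signed integer data when sliced.

```python
import numpy as np


class LazyUnsigned:
    """Hypothetical wrapper: fix up _Unsigned data at indexing time."""

    def __init__(self, source, scale_factor=1.0, add_offset=0.0):
        # source is anything sliceable that returns signed integer data,
        # for example a variable opened with masking and scaling turned off.
        self.source = source
        self.scale_factor = scale_factor
        self.add_offset = add_offset

    def __getitem__(self, key):
        # Only the requested slice is read from the underlying source.
        raw = np.asarray(self.source[key])
        # Reinterpret the signed integers as unsigned of the same width.
        unsigned = raw.view(raw.dtype.str.replace("i", "u"))
        return unsigned.astype("float64") * self.scale_factor + self.add_offset


# Usage with made-up values standing in for an on-disk variable.
stored = np.array([-1, -2, 3, 4], dtype="int8")
lazy = LazyUnsigned(stored, scale_factor=2.0, add_offset=1.0)
first_two = lazy[:2]
```

Nothing is converted until a slice is requested, so variables that are never accessed cost nothing.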

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1444/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
