Comment on [pydata/xarray#7782](https://github.com/pydata/xarray/issues/7782#issuecomment-1522997083) · user 5821660 (MEMBER) · 2023-04-26T08:28:39Z

This is how netCDF4-python handles this data with different parameters:

```python
import netCDF4 as nc

with nc.Dataset(
    "http://dap.ceda.ac.uk/thredds/dodsC/neodc/esacci/snow/data/scfv/MODIS/v2.0/2010/01/20100101-ESACCI-L3C_SNOW-SCFV-MODIS_TERRA-fv2.0.nc"
) as ds_dap:
    v = ds_dap["scfv"]
    print(v)

    print("\n- default")
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")

    print("\n- maskandscale False")
    ds_dap.set_auto_maskandscale(False)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")

    print("\n- mask/scale False")
    ds_dap.set_auto_mask(False)
    ds_dap.set_auto_scale(False)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")

    print("\n- mask True / scale False")
    ds_dap.set_auto_mask(True)
    ds_dap.set_auto_scale(False)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")

    print("\n- mask False / scale True")
    ds_dap.set_auto_mask(False)
    ds_dap.set_auto_scale(True)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")

    print("\n- mask True / scale True")
    ds_dap.set_auto_mask(True)
    ds_dap.set_auto_scale(True)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")

    print("\n- maskandscale True")
    ds_dap.set_auto_mask(False)
    ds_dap.set_auto_scale(False)
    ds_dap.set_auto_maskandscale(True)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")
```

Output:

```
<class 'netCDF4._netCDF4.Variable'>
int8 scfv(time, lat, lon)
    _Unsigned: true
    _FillValue: -1
    standard_name: snow_area_fraction_viewable_from_above
    long_name: Snow Cover Fraction Viewable
    units: percent
    valid_range: [ 0 -2]
    actual_range: [  0 100]
    flag_values: [-51 -50 -46 -41  -4  -3  -2]
    flag_meanings: Cloud Polar_Night_or_Night Water Permanent_Snow_and_Ice Classification_failed Input_Data_Error No_Satellite_Acquisition
    missing_value: -1
    ancillary_variables: scfv_unc
    grid_mapping: spatial_ref
    _ChunkSizes: [   1 1385 2770]
unlimited dimensions: time
current shape = (1, 18000, 36000)
filling off

- default
variable dtype: int8
first 2 elements: uint8 [215 215]
last 2 elements: uint8 [215 215]

- maskandscale False
variable dtype: int8
first 2 elements: int8 [-41 -41]
last 2 elements: int8 [-41 -41]

- mask/scale False
variable dtype: int8
first 2 elements: int8 [-41 -41]
last 2 elements: int8 [-41 -41]

- mask True / scale False
variable dtype: int8
first 2 elements: int8 [-- --]
last 2 elements: int8 [-- --]

- mask False / scale True
variable dtype: int8
first 2 elements: uint8 [215 215]
last 2 elements: uint8 [215 215]

- mask True / scale True
variable dtype: int8
first 2 elements: uint8 [215 215]
last 2 elements: uint8 [215 215]

- maskandscale True
variable dtype: int8
first 2 elements: uint8 [215 215]
last 2 elements: uint8 [215 215]
```

First, the dataset was created with filling off (read more about that in the [netCDF file format specifications](https://docs.unidata.ucar.edu/netcdf-c/current/file_format_specifications.html)). This should not be a problem for the analysis, but it tells us that every data point should have been written explicitly at creation time.

As we can see from the above output, in netCDF4-python it is the *scaling* step, not masking, that adapts the dtype to unsigned when `_Unsigned` is set. This is also reflected in the [docs](https://unidata.github.io/netcdf4-python/#Variable).
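In plain NumPy terms, the `_Unsigned=true` handling amounts to reinterpreting the raw signed bytes as unsigned (a minimal sketch, not netCDF4-python's actual implementation):

```python
import numpy as np

# The raw int8 values seen with mask/scale disabled are [-41, -41];
# viewing the same bytes as uint8 yields the [215, 215] seen with
# scaling enabled (two's complement: 256 - 41 = 215).
raw = np.array([-41, -41], dtype="int8")
unsigned = raw.view("uint8")
print(unsigned)  # [215 215]
```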

If Xarray is trying to align with netCDF4-python, it should separate mask and scale the way netCDF4-python does. It already does so internally by using different coders, but it does not separate them API-wise.

We would need a similar approach here for Xarray, with additional kwargs `scale` and `mask` in addition to `mask_and_scale`. We cannot simply move the UnsignedCoder out of `mask_and_scale` and apply it unconditionally.
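To illustrate why the unsigned conversion is coupled to masking (a hedged NumPy sketch, not xarray's actual coder code): the `_FillValue` of -1 has to be reinterpreted together with the data, otherwise masking after the unsigned conversion would no longer find the fill values.

```python
import numpy as np

raw = np.array([-41, -1, 100], dtype="int8")  # -1 plays the role of _FillValue
fill = np.array(-1, dtype="int8")

# Unsigned conversion reinterprets the raw bytes: [-41, -1, 100] -> [215, 255, 100]
unsigned = raw.view("uint8")

# Masking only works if the fill value is converted the same way (-1 -> 255);
# comparing against the untouched -1 would mask nothing.
masked = np.ma.masked_equal(unsigned, fill.view("uint8"))
print(masked)  # [215 -- 100]
```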
