Comment on [pydata/xarray#7782](https://github.com/pydata/xarray/issues/7782#issuecomment-1522997083) · user 5821660 (MEMBER) · 2023-04-26T08:28:39Z

This is how netCDF4-python handles this data with different parameters:

```python
import netCDF4 as nc

with nc.Dataset(
    "http://dap.ceda.ac.uk/thredds/dodsC/neodc/esacci/snow/data/scfv/MODIS/v2.0/2010/01/20100101-ESACCI-L3C_SNOW-SCFV-MODIS_TERRA-fv2.0.nc"
) as ds_dap:
    v = ds_dap["scfv"]
    print(v)

    print("\n- default")
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")

    print("\n- maskandscale False")
    ds_dap.set_auto_maskandscale(False)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")

    print("\n- mask/scale False")
    ds_dap.set_auto_mask(False)
    ds_dap.set_auto_scale(False)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")

    print("\n- mask True / scale False")
    ds_dap.set_auto_mask(True)
    ds_dap.set_auto_scale(False)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")

    print("\n- mask False / scale True")
    ds_dap.set_auto_mask(False)
    ds_dap.set_auto_scale(True)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")

    print("\n- mask True / scale True")
    ds_dap.set_auto_mask(True)
    ds_dap.set_auto_scale(True)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")

    print("\n- maskandscale True")
    ds_dap.set_auto_mask(False)
    ds_dap.set_auto_scale(False)
    ds_dap.set_auto_maskandscale(True)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")
```

Output:

```
<class 'netCDF4._netCDF4.Variable'>
int8 scfv(time, lat, lon)
    _Unsigned: true
    _FillValue: -1
    standard_name: snow_area_fraction_viewable_from_above
    long_name: Snow Cover Fraction Viewable
    units: percent
    valid_range: [ 0 -2]
    actual_range: [  0 100]
    flag_values: [-51 -50 -46 -41  -4  -3  -2]
    flag_meanings: Cloud Polar_Night_or_Night Water Permanent_Snow_and_Ice Classification_failed Input_Data_Error No_Satellite_Acquisition
    missing_value: -1
    ancillary_variables: scfv_unc
    grid_mapping: spatial_ref
    _ChunkSizes: [   1 1385 2770]
unlimited dimensions: time
current shape = (1, 18000, 36000)
filling off

- default
variable dtype: int8
first 2 elements: uint8 [215 215]
last 2 elements: uint8 [215 215]

- maskandscale False
variable dtype: int8
first 2 elements: int8 [-41 -41]
last 2 elements: int8 [-41 -41]

- mask/scale False
variable dtype: int8
first 2 elements: int8 [-41 -41]
last 2 elements: int8 [-41 -41]

- mask True / scale False
variable dtype: int8
first 2 elements: int8 [-- --]
last 2 elements: int8 [-- --]

- mask False / scale True
variable dtype: int8
first 2 elements: uint8 [215 215]
last 2 elements: uint8 [215 215]

- mask True / scale True
variable dtype: int8
first 2 elements: uint8 [215 215]
last 2 elements: uint8 [215 215]

- maskandscale True
variable dtype: int8
first 2 elements: uint8 [215 215]
last 2 elements: uint8 [215 215]
```

First, the dataset was created with filling off (read more about that in the [netCDF file format specifications](https://docs.unidata.ucar.edu/netcdf-c/current/file_format_specifications.html)). This should not be a problem for the analysis, but it tells us that every data point should have been written explicitly at creation time.

As we can see from the above output, in netCDF4-python it is the *scaling* step, not masking, that adapts the dtype to unsigned when `_Unsigned` is set. This is also reflected in the [docs](https://unidata.github.io/netcdf4-python/#Variable).
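In plain NumPy terms, the `_Unsigned=true` handling amounts to reinterpreting the raw signed bytes as unsigned (a minimal sketch, not netCDF4-python's actual implementation):

```python
import numpy as np

# The raw int8 values seen with mask/scale disabled are [-41, -41];
# viewing the same bytes as uint8 yields the [215, 215] seen with
# scaling enabled (two's complement: 256 - 41 = 215).
raw = np.array([-41, -41], dtype="int8")
unsigned = raw.view("uint8")
print(unsigned)  # [215 215]
```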

If Xarray is trying to align with netCDF4-python, it should separate mask and scale the way netCDF4-python does. It already does so internally by using different coders, but it does not separate them API-wise.

We would need a similar approach here for Xarray, with additional kwargs `scale` and `mask` in addition to `mask_and_scale`. We cannot simply move the UnsignedCoder out of `mask_and_scale` and apply it unconditionally.
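To illustrate why the unsigned conversion is coupled to masking (a hedged NumPy sketch, not xarray's actual coder code): the `_FillValue` of -1 has to be reinterpreted together with the data, otherwise masking after the unsigned conversion would no longer find the fill values.

```python
import numpy as np

raw = np.array([-41, -1, 100], dtype="int8")  # -1 plays the role of _FillValue
fill = np.array(-1, dtype="int8")

# Unsigned conversion reinterprets the raw bytes: [-41, -1, 100] -> [215, 255, 100]
unsigned = raw.view("uint8")

# Masking only works if the fill value is converted the same way (-1 -> 255);
# comparing against the untouched -1 would mask nothing.
masked = np.ma.masked_equal(unsigned, fill.view("uint8"))
print(masked)  # [215 -- 100]
```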
