issues


2 rows where user = 5793360 sorted by updated_at descending

#2742: handle default fill value

id 406612733 · node_id MDU6SXNzdWU0MDY2MTI3MzM= · opened by karl-malakoff (5793360) · state: open · locked: 0 · comments: 3 · created 2019-02-05T03:10:07Z · updated 2022-10-26T13:52:50Z · author_association: NONE · repo: xarray (13221727) · type: issue

If a variable does not define a _FillValue attribute, the library's 'default fill value' is normally used where data is masked. The netCDF4 library does this by default, and the behaviour can be controlled with the set_auto_mask() function.
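
For reference, here is a minimal sketch of toggling that behaviour with netCDF4 (the file name 'os150nb.nc' and variable 'u' are from my example below; any NetCDF file would do):

```python
from netCDF4 import Dataset

osd = Dataset('os150nb.nc', 'r')   # the file used throughout this report
u = osd['u']

u.set_auto_mask(True)     # default: values equal to _FillValue (or the library's
masked = u[1000]          # default fill value) come back as a masked array

u.set_auto_mask(False)    # raw values, no masking applied
raw = u[1000]
```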

For example, loading a NetCDF file with no explicit fill value set:

```python
In [92]: from netCDF4 import Dataset

In [93]: osd = Dataset('os150nb.nc', 'r')

In [94]: osd['u']
Out[94]:
<class 'netCDF4._netCDF4.Variable'>
float32 u(time, depth_cell)
    missing_value: 1e+38
    long_name: Zonal velocity component
    units: meter second-1
    C_format: %7.2f
    data_min: -0.6097069
    data_max: 0.6496426
unlimited dimensions:
current shape = (6830, 60)
filling on, default _FillValue of 9.969209968386869e+36 used

In [95]: osd['u'][1000]
Out[95]:
masked_array(data=[0.09373848885297775, 0.08173848688602448, 0.0697384923696518,
                   0.12273849546909332, 0.11573849618434906, 0.1387384980916977,
                   0.17173849046230316, 0.17673850059509277, 0.17673850059509277,
                   0.16373848915100098, 0.1857384890317917, 0.17673850059509277,
                   0.20173849165439606, 0.20973849296569824, 0.2037384957075119,
                   0.2297385036945343, 0.23273849487304688, 0.22873848676681519,
                   0.24073849618434906, 0.22873848676681519, 0.23073849081993103,
                   0.23273849487304688, 0.24973849952220917, 0.2467384934425354,
                   0.2207385003566742, 0.22773849964141846, 0.2387385070323944,
                   0.21473848819732666, 0.23973849415779114, 0.23673850297927856,
                   0.2517384886741638, 0.25273850560188293, 0.21973849833011627,
                   0.2387385070323944, 0.2207385003566742, 0.22373849153518677,
                   0.23473849892616272, 0.21073849499225616, 0.2247384935617447,
                   --, --, --, --, --, --, --, --, --, --, --, --, --, --, --,
                   --, --, --, --, --, --],
             mask=[False, False, False, False, False, False, False, False,
                   False, False, False, False, False, False, False, False,
                   False, False, False, False, False, False, False, False,
                   False, False, False, False, False, False, False, False,
                   False, False, False, False, False, False, False, True,
                   True, True, True, True, True, True, True, True, True,
                   True, True, True, True, True, True, True, True, True,
                   True, True],
             fill_value=9.96921e+36,
             dtype=float32)
```

The result is a masked array in which the missing values are masked, and you can see in the variable's repr that the default fill value has been used.
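
(As an aside, not part of the report: such a masked array can be converted to the NaN-filled form with plain numpy. A minimal sketch, assuming the `osd` dataset from the session above:)

```python
import numpy as np

slab = osd['u'][1000]               # masked array, as in the session above
dense = np.ma.filled(slab, np.nan)  # masked entries become NaN, which is roughly
                                    # what xarray does for a declared _FillValue/missing_value
```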

When the same NetCDF file is loaded with xarray, that fill value appears verbatim where netCDF4 would have masked the values.

```python
In [107]: os150 = xr.open_dataset('os150nb.nc', decode_cf=True, mask_and_scale=True, decode_coords=True)

In [108]: os150.u[1000]
Out[108]:
<xarray.DataArray 'u' (depth_cell: 60)>
array([9.373849e-02, 8.173849e-02, 6.973849e-02, 1.227385e-01, 1.157385e-01,
       1.387385e-01, 1.717385e-01, 1.767385e-01, 1.767385e-01, 1.637385e-01,
       1.857385e-01, 1.767385e-01, 2.017385e-01, 2.097385e-01, 2.037385e-01,
       2.297385e-01, 2.327385e-01, 2.287385e-01, 2.407385e-01, 2.287385e-01,
       2.307385e-01, 2.327385e-01, 2.497385e-01, 2.467385e-01, 2.207385e-01,
       2.277385e-01, 2.387385e-01, 2.147385e-01, 2.397385e-01, 2.367385e-01,
       2.517385e-01, 2.527385e-01, 2.197385e-01, 2.387385e-01, 2.207385e-01,
       2.237385e-01, 2.347385e-01, 2.107385e-01, 2.247385e-01, 9.969210e+36,
       9.969210e+36, 9.969210e+36, 9.969210e+36, 9.969210e+36, 9.969210e+36,
       9.969210e+36, 9.969210e+36, 9.969210e+36, 9.969210e+36, 9.969210e+36,
       9.969210e+36, 9.969210e+36, 9.969210e+36, 9.969210e+36, 9.969210e+36,
       9.969210e+36, 9.969210e+36, 9.969210e+36, 9.969210e+36, 9.969210e+36])
Coordinates:
    time     datetime64[ns] 2018-11-26T10:24:53.971200
Dimensions without coordinates: depth_cell
Attributes:
    long_name:  Zonal velocity component
    units:      meter second-1
    C_format:   %7.2f
    data_min:   -0.6097069
    data_max:   0.6496426
```

While this behaviour is correct in the sense that xarray has followed the NetCDF specification, it is no longer clear that those values were missing in the original NetCDF file.

The attributes don't mention the fill value, so even though this value lies outside the specified data range, one could be forgiven for thinking it is the actual value in the DataArray. It's especially confusing when you've asked for CF decoding and these values are still present.

Furthermore, if you look at the encoding for this DataArray, you can see that it incorrectly reports the missing_value (1e+38) as the _FillValue:

```python
In [136]: os150['u'].encoding
Out[136]:
{'source': 'C:\\Data\\adcp_processing\\in2018_v06\\postproc\\os150nb\\contour\\os150nb.nc',
 'original_shape': (6830, 60),
 '_FillValue': 1e+38,
 'dtype': dtype('float32')}
```

Unless I'm missing something, I think this behaviour should be changed to either:

* explicitly mention in the DataArray attributes that the default fill value is being used (or provide some other way of identifying it), or
* mask these values with NaN/missing_value in the resulting DataArray.
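
Until one of those happens, a possible user-side workaround (a sketch, not current xarray behaviour; `netCDF4.default_fillvals` is the library's table of per-dtype default fill values) would be to mask the default fill value by hand after opening the dataset:

```python
import numpy as np
import xarray as xr
from netCDF4 import default_fillvals   # per-dtype default fill values

os150 = xr.open_dataset('os150nb.nc')              # the file from this report
default_fill = np.float32(default_fillvals['f4'])  # 'u' is float32; 9.969209968386869e+36
# exact equality is intentional: untouched fill values round-trip exactly
os150['u'] = os150['u'].where(os150['u'] != default_fill)   # those entries become NaN
```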

Note that the NetCDF file I've used here isn't publicly available yet, but I can add a link to it once it is.

#2899: Explicit fixed width 'S1' arrays re-encoded creating extra dimension

id 433703713 · node_id MDU6SXNzdWU0MzM3MDM3MTM= · opened by karl-malakoff (5793360) · state: open · locked: 0 · comments: 2 · created 2019-04-16T10:27:57Z · updated 2019-04-17T16:15:29Z · author_association: NONE · repo: xarray (13221727) · type: issue

```python
from collections import OrderedDict
import xarray as xr
import numpy as np

sensor_string_np = np.zeros([12, 100], dtype='|S1')
data_vars = {}
data_vars['sensorName'] = xr.DataArray(data=sensor_string_np.copy(),
                                       attrs=OrderedDict([('_FillValue', ' ')]),
                                       name="sensorName",
                                       dims=("sensor", "string"))

scanfile = xr.Dataset(data_vars=data_vars)
scanfile.sensorName[0, :len("test")] = np.frombuffer("test".encode(), dtype='|S1')
scanfile.to_netcdf('test.nc')
```

```
(py37) C:\Data\in2019_v02>ncdump -h test.nc
netcdf test {
dimensions:
        sensor = 12 ;
        string = 100 ;
        string1 = 1 ;
variables:
        char sensorName(sensor, string, string1) ;
                sensorName:_FillValue = " " ;
}
```

Problem description

I'm not entirely sure whether this is a bug or user error. The above code is a minimal example of an issue we've been having with the latest version of xarray (since about version 0.11).

We are trying to preserve the old fixed-width char-array style of strings for backwards-compatibility purposes. However, the above code adds an extra 'string1' dimension when saving to NetCDF.

From what I can understand, this is a feature of the string encoding described at http://xarray.pydata.org/en/stable/io.html#string-encoding. I think xarray is treating each byte of the S1 array as a 'string', which it then encodes again by splitting each character into a byte array of one byte.
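
To illustrate what I mean (a conceptual sketch of the suspected re-encoding, not xarray's actual code path): splitting each length-1 bytes "string" into a char array of its bytes adds a trailing dimension of size 1, which is where a 'string1' dimension would come from:

```python
import numpy as np

s1 = np.zeros([12, 100], dtype='|S1')   # the array handed to xarray above
# split each "string" (here a single byte) into a char array of its bytes
chars = s1.view('S1').reshape(s1.shape + (s1.dtype.itemsize,))
print(s1.shape, '->', chars.shape)      # (12, 100) -> (12, 100, 1)
```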

What I would expect, since I'm explicitly working with a char array rather than strings, is for the array to be written to disk as is.

I can work around this by setting the variable's dtype encoding to 'str' and removing the _FillValue:

```python
data_vars['sensorName'] = xr.DataArray(data=sensor_string_np.copy(),
                                       name="sensorName",
                                       dims=("sensor", "string"))
...
scanfile.to_netcdf(r'test.nc', encoding={'sensorName': {'dtype': 'str'}})
```

```
(py37) C:\Data\in2019_v02>ncdump -h test.nc
netcdf test {
dimensions:
        sensor = 12 ;
        string = 100 ;
variables:
        string sensorName(sensor, string) ;
}
```

However, this seems like a painful workaround.

Is there another way I should be doing this?

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 | packaged by conda-forge | (default, Mar 27 2019, 23:18:50) [MSC v.1900 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.2
scipy: None
netCDF4: 1.5.0.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
setuptools: 41.0.0
pip: 19.0.3
conda: None
pytest: None
IPython: None
sphinx: None
```
