
issues: 157545837

id: 157545837
node_id: MDU6SXNzdWUxNTc1NDU4Mzc=
number: 862
title: decode_cf not concatenating string arrays
user: 6079398
state: closed
locked: 0
comments: 5
created_at: 2016-05-30T19:05:49Z
updated_at: 2019-02-26T19:51:17Z
closed_at: 2019-02-26T19:51:17Z
author_association: NONE

TL;DR: xarray.conventions.decode_cf() doesn't concatenate string arrays when it is run on a dataset that was opened with decode_cf=False.

OS: Tried on both OS X 11.10 and 11.11
xarray version: 0.7.2 installed via conda
Python version: 2.7.11

Hey all,

I'm not sure if this is a bug or the intended behavior, but under certain circumstances xarray.conventions.decode_cf doesn't concatenate 2D string arrays as promised.

Specifically, MADIS netCDF files have _FillValue/missing_value conflicts. When opening up the file in xarray, the exception gives this suggestion:

ValueError: ('Discovered conflicting _FillValue and missing_value. Considering opening the offending dataset using decode_cf=False, corrected the attributes', 'and decoding explicitly using xarray.conventions.decode_cf(ds)')

Doing this, though, doesn't result in 2D string arrays being concatenated:

```
In [50]: import xarray as xr

In [51]: fname = '20160518_1200'

In [52]: ds = xr.open_dataset(fname, decode_cf=False)

In [53]: ds.stationId
Out[53]:
<xarray.DataArray 'stationId' (recNum: 126154, maxStaIdLen: 6)>
[756924 values with dtype=|S1]
Coordinates:
  * maxStaIdLen  (maxStaIdLen) int64 0 1 2 3 4 5
  * recNum       (recNum) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
Attributes:
    long_name: alphanumeric station Id
    reference: station table

In [54]: for _, v in ds.variables.iteritems():
   ....:     _fix_fillval_conflict(v)  # You can find this function in the linked gist
   ....:

In [55]: decoded_ds = xr.conventions.decode_cf(ds, concat_characters=True)

In [56]: decoded_ds.stationId
Out[56]:
<xarray.DataArray 'stationId' (recNum: 126154, maxStaIdLen: 6)>
[756924 values with dtype=|S1]
Coordinates:
  * maxStaIdLen  (maxStaIdLen) int64 0 1 2 3 4 5
  * recNum       (recNum) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
Attributes:
    long_name: alphanumeric station Id
    reference: station table
```
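The `_fix_fillval_conflict` helper used above lives in the linked gist and isn't reproduced here. For anyone following along without it, here is a minimal sketch of what a fix like that might look like. The name `fix_fillval_conflict`, the choice to treat `_FillValue` as authoritative, and the use of a plain attribute dict with scalar values are all assumptions for illustration, not the gist's actual code:

```python
def fix_fillval_conflict(attrs):
    """Hypothetical helper: return a copy of attrs without the conflict.

    Assumption: when _FillValue and missing_value disagree, _FillValue
    wins and missing_value is dropped. Values are assumed to be scalars.
    """
    fixed = dict(attrs)
    fill = fixed.get("_FillValue")
    missing = fixed.get("missing_value")
    if fill is not None and missing is not None and fill != missing:
        # Conflicting sentinels: keep _FillValue, discard missing_value.
        del fixed["missing_value"]
    return fixed


# Made-up attributes that conflict in the way described above.
conflicting = {"long_name": "temperature", "_FillValue": -9999.0, "missing_value": 99999.0}
cleaned = fix_fillval_conflict(conflicting)
```

Applied per variable, something like `v.attrs = fix_fillval_conflict(v.attrs)` would play the role of the loop in `In [54]`.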

That said, if you pass decode_cf=True along with options that turn off things like mask_and_scale and decode_times (because of the aforementioned conflict), the string arrays do get concatenated:

```
In [57]: ds = xr.open_dataset(fname, decode_cf=True, mask_and_scale=False, decode_times=False)

In [58]: ds.stationId
Out[58]:
<xarray.DataArray 'stationId' (recNum: 126154)>
[126154 values with dtype=|S6]
Coordinates:
  * recNum   (recNum) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
Attributes:
    long_name: alphanumeric station Id
    reference: station table
```
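To spell out the difference between the two outputs: the unconcatenated array has shape (recNum, maxStaIdLen) with one character per element (dtype |S1), while the concatenated one has shape (recNum,) with one 6-byte string per record (dtype |S6). A plain-Python sketch of that collapse, using made-up station IDs and not xarray's actual implementation:

```python
def concat_characters(rows):
    """Join each row of single-byte strings into one byte string."""
    return [b"".join(row) for row in rows]


# Made-up records: each row is one station ID split into 6 characters,
# with empty bytes padding IDs shorter than maxStaIdLen.
char_rows = [
    [b"K", b"D", b"E", b"N", b"", b""],
    [b"A", b"B", b"1", b"2", b"3", b"4"],
]
station_ids = concat_characters(char_rows)
```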

We can then fix the conflict and run decode_cf without issue. This is an acceptable (albeit not immediately obvious or intuitive) workaround, of course, but I'm not sure if this behavior is known or intended. I'll fully admit that it may be an issue with the specific netCDF data I'm working with (NOAA MADIS data, which is kind of a train wreck w/r/t CF conventions to begin with), but I don't have any other datasets with which to test.
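Put together, the workaround looks roughly like the sketch below. The function name `open_with_workaround` and the `fix_attrs` callback (any function that takes an attribute dict and returns a corrected one) are hypothetical; the xarray calls are the same ones used in the sessions above, and I've only tried this sequence interactively on xarray 0.7.2:

```python
def open_with_workaround(fname, fix_attrs):
    """Sketch of the workaround: concatenate strings first, decode the rest later.

    fix_attrs is a hypothetical callback: attrs dict in, corrected attrs dict out.
    """
    import xarray as xr  # imported lazily so the sketch can be loaded without xarray

    # Step 1: let xarray concatenate character arrays, but skip the decoding
    # steps that trip over the _FillValue/missing_value conflict.
    ds = xr.open_dataset(fname, decode_cf=True, mask_and_scale=False,
                         decode_times=False)

    # Step 2: repair the conflicting attributes on every variable.
    for var in ds.variables.values():
        var.attrs = fix_attrs(var.attrs)

    # Step 3: run the remaining CF decoding explicitly.
    return xr.conventions.decode_cf(ds)
```

With the file from the sessions above this would be called as `open_with_workaround('20160518_1200', some_fix_function)`.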

To that end, I've coded up tests and uploaded a gzipped MADIS netCDF file to Dropbox if you're interested in reproducing this behavior.

Thanks!

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/862/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
