id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
161435547,MDExOlB1bGxSZXF1ZXN0NzQ2MTA1NjY=,889,Add concat_dimensions kwarg to decode_cf,6079398,closed,0,,,3,2016-06-21T13:23:45Z,2023-09-14T02:54:30Z,2023-09-14T02:54:05Z,NONE,,0,pydata/xarray/pulls/889,"Addressing #862 (and maybe others), this adds a new keyword argument `concat_dimensions` to `decode_cf`. This allows the user to explicitly specify which dimensions to concatenate over for string arrays. The dimensions that are concatenated over are left in the resulting Dataset. Also added tests! ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/889/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
157545837,MDU6SXNzdWUxNTc1NDU4Mzc=,862,decode_cf not concatenating string arrays,6079398,closed,0,,,5,2016-05-30T19:05:49Z,2019-02-26T19:51:17Z,2019-02-26T19:51:17Z,NONE,,,,"**TL;DR**: `xarray.conventions.decode_cf()` doesn't seem to want to concatenate string arrays after opening a dataset with `decode_cf=False`.
**OS**: Tried on both OS X 11.10 and 11.11
**xarray version**: 0.7.2 installed via conda
**Python version**: 2.7.11
Hey all, I'm not sure if this is a bug or the intended behavior, but running `xarray.conventions.decode_cf` doesn't seem to concatenate 2D string arrays as promised under certain circumstances. Specifically, MADIS netCDF files have `_FillValue`/`missing_value` conflicts. When opening the file in `xarray`, the exception gives this suggestion: `ValueError: ('Discovered conflicting _FillValue and missing_value. 
Considering opening the offending dataset using decode_cf=False, corrected the attributes', 'and decoding explicitly using xarray.conventions.decode_cf(ds)')`
Doing this, though, doesn't result in 2D string arrays being concatenated:
```
In [50]: import xarray as xr

In [51]: fname = '20160518_1200'

In [52]: ds = xr.open_dataset(fname, decode_cf=False)

In [53]: ds.stationId
Out[53]:
[756924 values with dtype=|S1]
Coordinates:
  * maxStaIdLen (maxStaIdLen) int64 0 1 2 3 4 5
  * recNum (recNum) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
Attributes:
    long_name: alphanumeric station Id
    reference: station table

In [54]: for _, v in ds.variables.iteritems():
   ....:     _fix_fillval_conflict(v)  # You can find this function in the linked gist
   ....:

In [55]: decoded_ds = xr.conventions.decode_cf(ds, concat_characters=True)

In [56]: decoded_ds.stationId
Out[56]:
[756924 values with dtype=|S1]
Coordinates:
  * maxStaIdLen (maxStaIdLen) int64 0 1 2 3 4 5
  * recNum (recNum) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
Attributes:
    long_name: alphanumeric station Id
    reference: station table
```
That said, if you pass `decode_cf=True` and disable options like `mask_and_scale` and `decode_times` (due to the aforementioned conflict), the string arrays get concatenated:
```
In [57]: ds = xr.open_dataset(fname, decode_cf=True, mask_and_scale=False, decode_times=False)

In [58]: ds.stationId
Out[58]:
[126154 values with dtype=|S6]
Coordinates:
  * recNum (recNum) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
Attributes:
    long_name: alphanumeric station Id
    reference: station table
```
We can then fix the conflict and run `decode_cf` without issue. This is an acceptable (albeit not immediately obvious or intuitive) workaround, of course, but I'm not sure if this behavior is known or intended. 
I'll fully admit that it may be an issue with the specific netCDF data I'm working with (NOAA MADIS data, which is kind of a train wreck w/r/t CF conventions to begin with), but I don't have any other datasets with which to test. To that end, I've [coded up tests](https://gist.github.com/mogismog/4ff94dd3d67afd612520bc0755f614ca) and [uploaded a gzipped MADIS netCDF file to Dropbox](https://www.dropbox.com/s/gvisjl2skpf4zzr/20160518_1200.gz?dl=1) if you're interested in reproducing this behavior.
Thanks! ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/862/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue