html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/838#issuecomment-216092859,https://api.github.com/repos/pydata/xarray/issues/838,216092859,MDEyOklzc3VlQ29tbWVudDIxNjA5Mjg1OQ==,6079398,2016-05-02T02:07:47Z,2016-05-02T02:07:47Z,NONE,"Redeeming myself (only a little bit) from my previous message here:
@akrherz I was messing around with this a bit, and this seems to work OK. It gets rid of unnecessary dimensions, concatenates string arrays, and turns the dataset into a pandas DataFrame:
```
In [1]: import xarray as xr
In [2]: ds = xr.open_dataset('20160430_1600.nc', decode_cf=True, mask_and_scale=False, decode_times=False) # xarray has issues decoding the times, so you'll have to do this in pandas.
In [3]: vars_to_drop = [k for k in ds.variables if ('recNum' not in ds[k].dims)]
In [4]: ds = ds.drop(vars_to_drop)
In [5]: df = ds.to_dataframe()
In [6]: df.info()
Int64Index: 6277 entries, 0 to 6276
Data columns (total 93 columns):
invTime 6277 non-null int32
prevRecord 6277 non-null int32
isOverflow 6277 non-null int32
secondsStage1_2 6277 non-null int32
secondsStage3 6277 non-null int32
providerId 6277 non-null object
stationId 6277 non-null object
handbook5Id 6277 non-null object
~snip~
```
A bit hacky, but it works.
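On the `decode_times=False` point: the time columns come out of `to_dataframe()` as raw numbers, so they still need converting on the pandas side. A minimal sketch, assuming a column named `observationTime` that holds seconds since the Unix epoch (both the column name and the units are assumptions here; check the variable's `units` attribute in your own file):

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by ds.to_dataframe().
# The column name and its units (seconds since 1970-01-01 UTC) are assumptions.
df = pd.DataFrame({'observationTime': [1462032000.0, 1462035600.0]})

# Convert the raw epoch seconds into pandas timestamps.
df['observationTime'] = pd.to_datetime(df['observationTime'], unit='s')
```

If a time variable uses a different reference date or unit, the `unit` and `origin` arguments of `pd.to_datetime` would need adjusting to match.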
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,152040420
https://github.com/pydata/xarray/issues/838#issuecomment-216090426,https://api.github.com/repos/pydata/xarray/issues/838,216090426,MDEyOklzc3VlQ29tbWVudDIxNjA5MDQyNg==,6079398,2016-05-02T01:34:38Z,2016-05-02T01:34:38Z,NONE,"@shoyer: You're right that MADIS netCDF files are (imo) poorly formatted. There is also the issue of `xarray.decode_cf()` not concatenating the string arrays even after fixing the `_FillValue`/`missing_value` conflict (which is why I have to pass `decode_cf=False` when opening a MADIS netCDF file). After looking at the `decode_cf` code, I don't think this is a bug per se: some quick debugging suggested that no variable in this netCDF file gets [past this check](https://github.com/pydata/xarray/blob/master/xarray/conventions.py#L802). If you feel this may in fact be a bug, though, I can look into it a bit more.
Unfortunately, this does mean I have to do a lot of ""manual cleaning"" of the netCDF file before exporting as a DataFrame, but it is easy to write a set of functions to accomplish this for you. That said, I can't c/p the exact code (for work-related reasons). I'm not sure how helpful this is, but when working with MADIS netCDF data, I more or less do the following as a workaround:
1. Open up the MADIS netCDF file, fix the `_FillValue` and `missing_value` conflict in the variables.
2. Drop the variables I don't want (and there is _a lot_ of filler in MADIS netCDF files).
3. Concatenate the string arrays (e.g. `stationId`, `dataProvider`).
4. Turn into a pandas DataFrame.
Reading over it, though, that is kind of a [draw the owl](http://knowyourmeme.com/memes/how-to-draw-an-owl)-esque response. :/
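To make steps 3 and 4 a little less owl-like, here is a rough sketch (not the actual code) using only NumPy and pandas. It assumes each string variable arrives as a 2-D array of single bytes, one row per record, padded with empty bytes; `join_char_array` is a hypothetical helper name and the sample values are made up for illustration:

```python
import numpy as np
import pandas as pd


def join_char_array(chars):
    # Collapse a (records, width) array of single bytes into one string per record.
    chars = np.ascontiguousarray(chars, dtype='S1')
    width = chars.shape[-1]
    # View each row as a single fixed-width byte string; NumPy drops trailing NUL padding.
    joined = chars.view('S{}'.format(width))[:, 0]
    return [value.decode('ascii') for value in joined]


# Made-up sample data standing in for two MADIS string variables.
station_chars = np.array([[b'K', b'D', b'S', b'M'], [b'K', b'A', b'W', b'']])
provider_chars = np.array([[b'N', b'W', b'S'], [b'F', b'A', b'A']])

# Step 4: assemble the joined strings into a pandas DataFrame.
df = pd.DataFrame({
    'stationId': join_char_array(station_chars),
    'dataProvider': join_char_array(provider_chars),
})
```

With real data, the byte arrays would come from the dataset's variables (for example via `.values`) instead of being typed in by hand.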
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,152040420