issue_comments

2 rows where author_association = "NONE", issue = 152040420 and user = 6079398 sorted by updated_at descending

id: 216092859
html_url: https://github.com/pydata/xarray/issues/838#issuecomment-216092859
issue_url: https://api.github.com/repos/pydata/xarray/issues/838
node_id: MDEyOklzc3VlQ29tbWVudDIxNjA5Mjg1OQ==
user: mogismog (6079398)
created_at: 2016-05-02T02:07:47Z
updated_at: 2016-05-02T02:07:47Z
author_association: NONE
body:

Redeeming myself (only a little bit) from my previous message here:

@akrherz I was messing around with this a bit, and the following seems to work OK. It gets rid of unnecessary dimensions, concatenates string arrays, and turns the result into a pandas DataFrame:

```
In [1]: import xarray as xr

In [2]: ds = xr.open_dataset('20160430_1600.nc', decode_cf=True, mask_and_scale=False, decode_times=False)  # xarray has issues decoding the times, so you'll have to do this in pandas

In [3]: vars_to_drop = [k for k in ds.variables if 'recNum' not in ds[k].dims]

In [4]: ds = ds.drop(vars_to_drop)

In [5]: df = ds.to_dataframe()

In [6]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6277 entries, 0 to 6276
Data columns (total 93 columns):
invTime            6277 non-null int32
prevRecord         6277 non-null int32
isOverflow         6277 non-null int32
secondsStage1_2    6277 non-null int32
secondsStage3      6277 non-null int32
providerId         6277 non-null object
stationId          6277 non-null object
handbook5Id        6277 non-null object
~snip~
```

A bit hacky, but it works.
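
Since the file is opened with decode_times=False above, the time variables come through as raw numbers and decoding is left to pandas. A minimal sketch of that step, assuming a column named observationTime holding seconds since the Unix epoch (both the column name and the epoch are assumptions; check the variable's units attribute in your file):

```
import pandas as pd

# "observationTime" is a hypothetical column name; MADIS files commonly
# store times as seconds since 1970-01-01, but verify this against the
# variable's "units" attribute before relying on it.
df['observationTime'] = pd.to_datetime(df['observationTime'], unit='s')
```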

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: MADIS netCDF to Pandas Dataframe: ValueError: iterator is too large (152040420)
id: 216090426
html_url: https://github.com/pydata/xarray/issues/838#issuecomment-216090426
issue_url: https://api.github.com/repos/pydata/xarray/issues/838
node_id: MDEyOklzc3VlQ29tbWVudDIxNjA5MDQyNg==
user: mogismog (6079398)
created_at: 2016-05-02T01:34:38Z
updated_at: 2016-05-02T01:34:38Z
author_association: NONE
body:

@shoyer: You're right that MADIS netCDF files are (imo) poorly formatted. There is also the issue of xarray.decode_cf() not concatenating the string arrays even after the _FillValue / missing_value conflict is fixed (hence the need to pass decode_cf=False when opening the MADIS netCDF file). After looking at the decode_cf code, though, I don't think this is a bug per se (some quick debugging revealed that no variable in this netCDF file seems to get past this check); if you feel this may in fact be a bug, I can look into it a bit more.
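
For context, the conflict mentioned above arises when a variable carries both a _FillValue and a missing_value attribute with different values, which makes decode_cf refuse to decode it. A minimal sketch of clearing the conflict before decoding; keeping _FillValue and discarding missing_value is an assumption, not something the comment prescribes:

```
import xarray as xr

ds = xr.open_dataset('20160430_1600.nc', decode_cf=False)

# Drop missing_value wherever it disagrees with _FillValue, so that
# decode_cf no longer sees two conflicting fill attributes on the same
# variable. (Keeping _FillValue is an arbitrary choice here.)
for var in ds.variables.values():
    attrs = var.attrs
    if '_FillValue' in attrs and 'missing_value' in attrs:
        if attrs['_FillValue'] != attrs['missing_value']:
            del attrs['missing_value']

ds = xr.decode_cf(ds)
```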

Unfortunately, this does mean I have to do a lot of "manual cleaning" of the netCDF file before exporting it as a DataFrame, but it is easy to write a set of functions that accomplish this. That said, I can't c/p the exact code (for work-related reasons). I'm not sure how helpful this is, but when working with MADIS netCDF data, I more or less do the following as a workaround (a rough sketch appears at the end of this comment):

1. Open up the MADIS netCDF file and fix the _FillValue / missing_value conflict in the variables.
2. Drop the variables I don't want (there is a lot of filler in MADIS netCDF files).
3. Concatenate the string arrays (e.g. stationId, dataProvider).
4. Turn the result into a pandas DataFrame.

Reading over it, though, that is kind of a "draw the owl"-esque response. :/
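
A rough sketch of those four steps as one function, under stated assumptions: this is not the commenter's actual code (which they could not share), the stationId and dataProvider names are the examples from the comment, and the ASCII decoding of the character arrays is a guess based on typical MADIS surface files:

```
import xarray as xr

def madis_to_dataframe(path):
    """Sketch of the four-step MADIS workaround described above."""
    ds = xr.open_dataset(path, decode_cf=False)

    # 1. Reconcile conflicting _FillValue / missing_value attributes,
    #    then let xarray apply the usual CF decoding (times left raw,
    #    as in the earlier comment).
    for var in ds.variables.values():
        if '_FillValue' in var.attrs and 'missing_value' in var.attrs:
            var.attrs.pop('missing_value')
    ds = xr.decode_cf(ds, decode_times=False)

    # 2. Drop everything that is not per-record data.
    vars_to_drop = [k for k in ds.variables if 'recNum' not in ds[k].dims]
    ds = ds.drop(vars_to_drop)

    # 3. Concatenate 2-D character arrays into plain strings.
    for name in ('stationId', 'dataProvider'):  # assumed variable names
        if name in ds and ds[name].ndim == 2 and ds[name].dtype.kind == 'S':
            joined = [b''.join(row).decode('ascii', 'ignore').rstrip('\x00 ')
                      for row in ds[name].values]
            ds[name] = ('recNum', joined)

    # 4. Hand the cleaned-up dataset to pandas.
    return ds.to_dataframe()
```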

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: MADIS netCDF to Pandas Dataframe: ValueError: iterator is too large (152040420)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
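
For reference, the row filter described at the top of this page corresponds to a query along these lines; a sketch assuming a local SQLite copy of this database saved as github.db (the filename is an assumption):

```
import sqlite3

conn = sqlite3.connect('github.db')  # assumed local copy of this database
rows = conn.execute(
    """
    SELECT id, created_at, updated_at, body
    FROM issue_comments
    WHERE author_association = 'NONE'
      AND issue = 152040420
      AND [user] = 6079398
    ORDER BY updated_at DESC
    """
).fetchall()

for comment_id, created, updated, body in rows:
    print(comment_id, updated)
```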