
issue_comments


5 rows where user = 6079398 sorted by updated_at descending



315783849 · mogismog (6079398) · NONE · created 2017-07-17T15:09:11Z · updated 2017-07-17T15:09:11Z · no reactions
https://github.com/pydata/xarray/pull/889#issuecomment-315783849

@jhamman Sorry for the delayed response (and the even more delayed PR)! I'd love to finish this up; apologies for letting it fall by the wayside.

Lemme look at it a bit after work and see how much work it would take to resolve the merge conflicts, though it seems like the issue is only in the test_conventions.py file, so it may not be that difficult to resolve.

Issue: Add concat_dimensions kwarg to decode_cf (161435547)
224949305 · mogismog (6079398) · NONE · created 2016-06-09T16:26:36Z · updated 2016-06-09T16:26:36Z · no reactions
https://github.com/pydata/xarray/issues/862#issuecomment-224949305

> This seems a little too magical to me. How would we know if the dataset dimension was added intentionally or not?

Yeah, that's a fair point. I'll put together something that uses an optional list of dimensions to concatenate over. Thanks!
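As a rough sketch of the idea being agreed on here: `concat_characters` is xarray's real boolean kwarg today, while the `concat_dimensions` list named in the PR title is the proposal and was never merged; the file name and the `maxStaNamLen` dimension are assumptions borrowed from the MADIS discussion below.

```python
import xarray as xr

# Current, real API: a blanket boolean that joins character arrays along
# their last dimension during CF decoding.
ds = xr.open_dataset('20160430_1600.nc', decode_cf=False)
decoded = xr.decode_cf(ds, concat_characters=True)

# The proposal in this thread: an explicit, optional list of dimensions to
# concatenate over (hypothetical; this kwarg does not exist in xarray):
# decoded = xr.decode_cf(ds, concat_dimensions=['maxStaNamLen'])
```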

Issue: decode_cf not concatenating string arrays (157545837)
224787831 · mogismog (6079398) · NONE · created 2016-06-09T02:51:11Z · updated 2016-06-09T02:51:52Z · no reactions
https://github.com/pydata/xarray/issues/862#issuecomment-224787831

Hey @shoyer,

Sorry for the delayed response. Passing a list of dimensions over which to concatenate seems like the easiest workaround with the fewest questions asked. As you mentioned, every dimension gets a variable by the time it is a dataset, so another option (which I'll admit I haven't thought all the way through, and which may not even work) would be to first check whether decode_cf is working on a Dataset or an AbstractDataStore (edit: which it already does anyway), and then decide whether or not to concatenate over a dimension. I could see the latter idea not working out so well, but I'd be curious about your thoughts.

Either way, I can put something together this week and open up a PR.
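A minimal sketch of that type-check idea, assuming the helper name and the dispatch rule (neither is xarray API; only the two classes are real):

```python
import xarray as xr
from xarray.backends.common import AbstractDataStore

def should_concat_strings(obj):
    """Hypothetical helper: decide whether decode_cf should join char arrays."""
    if isinstance(obj, xr.Dataset):
        # A Dataset's dimensions may have been added intentionally by the
        # user, so leave its character arrays alone.
        return False
    if isinstance(obj, AbstractDataStore):
        # A raw store's string dimensions came straight from the file
        # format, so concatenating is probably safe.
        return True
    raise TypeError('expected a Dataset or an AbstractDataStore')
```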

Issue: decode_cf not concatenating string arrays (157545837)
216092859 · mogismog (6079398) · NONE · created 2016-05-02T02:07:47Z · updated 2016-05-02T02:07:47Z · no reactions
https://github.com/pydata/xarray/issues/838#issuecomment-216092859

Redeeming myself (only a little bit) from my previous message here:

@akrherz I was messing around with this a bit, and the following seems to work OK. It gets rid of unnecessary dimensions, concatenates string arrays, and turns the result into a pandas DataFrame:

```
In [1]: import xarray as xr

# xarray has trouble decoding the times in this file, so decode them in
# pandas afterwards.
In [2]: ds = xr.open_dataset('20160430_1600.nc', decode_cf=True,
   ...:                      mask_and_scale=False, decode_times=False)

In [3]: vars_to_drop = [k for k in ds.variables if 'recNum' not in ds[k].dims]

In [4]: ds = ds.drop(vars_to_drop)

In [5]: df = ds.to_dataframe()

In [6]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6277 entries, 0 to 6276
Data columns (total 93 columns):
invTime            6277 non-null int32
prevRecord         6277 non-null int32
isOverflow         6277 non-null int32
secondsStage1_2    6277 non-null int32
secondsStage3      6277 non-null int32
providerId         6277 non-null object
stationId          6277 non-null object
handbook5Id        6277 non-null object
~snip~
```

A bit hacky, but it works.
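For completeness, the time decoding deferred to pandas above might look like this (a sketch: the `observationTime` column name and its seconds-since-epoch units are assumptions about the MADIS file, since `decode_times=False` left the values raw):

```python
import pandas as pd

# Convert raw epoch seconds into proper timestamps (assumed column and units).
df['observationTime'] = pd.to_datetime(df['observationTime'], unit='s')
```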

Issue: MADIS netCDF to Pandas Dataframe: ValueError: iterator is too large (152040420)
216090426 · mogismog (6079398) · NONE · created 2016-05-02T01:34:38Z · updated 2016-05-02T01:34:38Z · no reactions
https://github.com/pydata/xarray/issues/838#issuecomment-216090426

@shoyer: You're right that MADIS netCDF files are (imo) poorly formatted. There is also the issue of xarray.decode_cf() not concatenating the string arrays even after fixing the _FillValue/missing_value conflict (hence the need to pass decode_cf=False when opening the MADIS netCDF file). After looking at the decode_cf code, though, I don't think this is a bug per se (some quick debugging suggested that no variable in this netCDF file gets past this check), though if you feel it may in fact be a bug, I can look into it a bit more.

Unfortunately, this does mean I have to do a lot of "manual cleaning" of the netCDF file before exporting it as a DataFrame, but it is easy to write a set of functions to accomplish this. That said, I can't copy/paste the exact code (for work-related reasons). I'm not sure how helpful this is, but when working with MADIS netCDF data I more or less do the following as a workaround (a sketch follows below):

  1. Open the MADIS netCDF file and fix the _FillValue/missing_value conflict in the variables.
  2. Drop the variables I don't want (there is a lot of filler in MADIS netCDF files).
  3. Concatenate the string arrays (e.g. stationId, dataProvider).
  4. Turn the result into a pandas DataFrame.

Reading it over, though, that is kind of a draw-the-owl-esque response. :/
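A minimal sketch of those four steps, assuming a typical MADIS file: the variable names stationId and dataProvider come from this thread, while the record dimension recNum and everything else are assumptions, not the exact work code.

```python
import xarray as xr

# 1. Open without CF decoding, then fix the _FillValue/missing_value conflict.
ds = xr.open_dataset('20160430_1600.nc', decode_cf=False)
for var in ds.variables.values():
    if '_FillValue' in var.attrs and 'missing_value' in var.attrs:
        del var.attrs['missing_value']  # keep _FillValue, drop the conflict

# 2. Drop the filler, keeping only variables on the record dimension.
ds = ds.drop_vars([k for k in ds.variables if 'recNum' not in ds[k].dims])

# 3. Concatenate the character arrays by hand (this issue is about
#    decode_cf not doing it for us).
for name in ('stationId', 'dataProvider'):
    joined = [b''.join(row).decode().rstrip('\x00 ') for row in ds[name].values]
    ds[name] = ('recNum', joined)

# 4. Turn the result into a pandas DataFrame.
df = ds.to_dataframe()
```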

Issue: MADIS netCDF to Pandas Dataframe: ValueError: iterator is too large (152040420)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
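Given that schema, reproducing this page's row selection (5 comments by user 6079398, newest update first) from Python might look like this; the database file name github.db is an assumption:

```python
import sqlite3

conn = sqlite3.connect('github.db')  # assumed file name for this instance
rows = conn.execute(
    """
    SELECT id, html_url, updated_at, body
    FROM issue_comments
    WHERE user = 6079398
    ORDER BY updated_at DESC
    """
).fetchall()
for comment_id, html_url, updated_at, _body in rows:
    print(comment_id, updated_at, html_url)
```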