issue_comments

7 rows where author_association = "NONE" and issue = 152040420 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
589823579 https://github.com/pydata/xarray/issues/838#issuecomment-589823579 https://api.github.com/repos/pydata/xarray/issues/838 MDEyOklzc3VlQ29tbWVudDU4OTgyMzU3OQ== akrherz 210858 2020-02-21T20:32:01Z 2020-02-21T20:32:01Z NONE

Just to note that the issue still happens today with numpy=1.18.1, xarray=0.15.0, pandas=1.0.1:

```
df = nc.to_dataframe()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/xarray/core/dataset.py", line 4465, in to_dataframe
    return self._to_dataframe(self.dims)
  File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/xarray/core/dataset.py", line 4451, in _to_dataframe
    data = [
  File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/xarray/core/dataset.py", line 4452, in <listcomp>
    self._variables[k].set_dims(ordered_dims).values.reshape(-1)
  File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/xarray/core/variable.py", line 1345, in set_dims
    expanded_data = duck_array_ops.broadcast_to(self.data, tmp_shape)
  File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/xarray/core/duck_array_ops.py", line 47, in f
    return wrapped(*args, **kwargs)
  File "<__array_function__ internals>", line 5, in broadcast_to
  File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/numpy/lib/stride_tricks.py", line 182, in broadcast_to
    return _broadcast_to(array, shape, subok=subok, readonly=True)
  File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/numpy/lib/stride_tricks.py", line 125, in _broadcast_to
    it = np.nditer(
ValueError: iterator is too large
```
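
A rough sketch of why this call can fail, based on what the traceback shows: `to_dataframe()` passes all of the dataset's dimensions to `set_dims(...)`, which broadcasts each variable against every one of them. The element count of that broadcast shape is the product of all dimension sizes. Apart from `recNum`, the dimension names and all sizes below are hypothetical, and the idea that the error appears once the product no longer fits in a signed 64-bit index is an assumption, not something confirmed in this thread.

```python
import math

# Assumption: a 64-bit platform, where array indices are signed 64-bit integers.
INTP_MAX = 2**63 - 1

# Hypothetical dimension sizes, for illustration only (not read from a real file).
HYPOTHETICAL_DIMS = {
    "recNum": 6277,
    "maxStaticIds": 10000,
    "nStaticIds": 5000,
    "totalIdLen": 24,
    "maxSensor": 2,
    "maxPSTEntries": 30,
    "QCcheckNum": 10,
    "ICcheckNum": 55,
    "maxNameLength": 51,
    "maxHomeWFOlen": 4,
}


def broadcast_size(dim_sizes):
    """Number of elements in an array broadcast across every dimension."""
    return math.prod(dim_sizes.values())


def fits_in_index(dim_sizes, limit=INTP_MAX):
    """True if the full broadcast could still be indexed with a signed 64-bit int."""
    return broadcast_size(dim_sizes) <= limit


if __name__ == "__main__":
    print(broadcast_size(HYPOTHETICAL_DIMS), fits_in_index(HYPOTHETICAL_DIMS))
```

Under these assumptions, a dataset restricted to `recNum` alone stays tiny, which is consistent with the workaround of dropping variables that do not use that dimension.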

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MADIS netCDF to Pandas Dataframe: ValueError: iterator is too large 152040420
589784874 https://github.com/pydata/xarray/issues/838#issuecomment-589784874 https://api.github.com/repos/pydata/xarray/issues/838 MDEyOklzc3VlQ29tbWVudDU4OTc4NDg3NA== stale[bot] 26384082 2020-02-21T18:50:16Z 2020-02-21T18:50:16Z NONE

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MADIS netCDF to Pandas Dataframe: ValueError: iterator is too large 152040420
375755201 https://github.com/pydata/xarray/issues/838#issuecomment-375755201 https://api.github.com/repos/pydata/xarray/issues/838 MDEyOklzc3VlQ29tbWVudDM3NTc1NTIwMQ== guytcc 26440884 2018-03-23T18:12:26Z 2018-03-23T18:12:26Z NONE

Something maybe of interest.

I recently converted some tools we have to do the above from Python 2 to 3. When the files were read in, the byte chars were not converted to strings. I couldn't actually get this to work on the xarray side and had to loop through the DataFrame columns with apply(.decode("utf-8")) to decode them properly. I'm assuming this might be in the NetCDF4 library, but I'm not 100% sure.
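
A minimal sketch of the column-by-column decoding described above. The helper names are hypothetical, and `df` is assumed to be a pandas DataFrame whose object-dtype columns may hold `bytes` values; this is not the commenter's actual code.

```python
def decode_bytes(value, encoding="utf-8"):
    """Return `value` decoded to str if it is bytes; otherwise return it unchanged."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    return value


def decode_object_columns(df, encoding="utf-8"):
    """Decode bytes in every object-dtype column of a DataFrame.

    `df` is assumed to be a pandas.DataFrame. A decoded copy is returned and
    the input is left untouched.
    """
    out = df.copy()
    for col in out.select_dtypes(include=["object"]).columns:
        out[col] = out[col].map(lambda v: decode_bytes(v, encoding))
    return out
```

Checking the type first means columns that mix bytes with already-decoded strings or missing values pass through without raising.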

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MADIS netCDF to Pandas Dataframe: ValueError: iterator is too large 152040420
216093339 https://github.com/pydata/xarray/issues/838#issuecomment-216093339 https://api.github.com/repos/pydata/xarray/issues/838 MDEyOklzc3VlQ29tbWVudDIxNjA5MzMzOQ== akrherz 210858 2016-05-02T02:15:22Z 2016-05-02T02:15:22Z NONE

@mogismog Awesome, thanks so much for the workaround :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MADIS netCDF to Pandas Dataframe: ValueError: iterator is too large 152040420
216092859 https://github.com/pydata/xarray/issues/838#issuecomment-216092859 https://api.github.com/repos/pydata/xarray/issues/838 MDEyOklzc3VlQ29tbWVudDIxNjA5Mjg1OQ== mogismog 6079398 2016-05-02T02:07:47Z 2016-05-02T02:07:47Z NONE

Redeeming myself (only a little bit) from my previous message here:

@akrherz I was messing around with this a bit, and this seems to work OK. It gets rid of unnecessary dimensions, concatenates string arrays, and turns the result into a pandas DataFrame:

```
In [1]: import xarray as xr

In [2]: ds = xr.open_dataset('20160430_1600.nc', decode_cf=True, mask_and_scale=False, decode_times=False)  # xarray has issue decoding the times, so you'll have to do this in pandas.

In [3]: vars_to_drop = [k for k in ds.variables.iterkeys() if ('recNum' not in ds[k].dims)]

In [4]: ds = ds.drop(vars_to_drop)

In [5]: df = ds.to_dataframe()

In [6]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6277 entries, 0 to 6276
Data columns (total 93 columns):
invTime            6277 non-null int32
prevRecord         6277 non-null int32
isOverflow         6277 non-null int32
secondsStage1_2    6277 non-null int32
secondsStage3      6277 non-null int32
providerId         6277 non-null object
stationId          6277 non-null object
handbook5Id        6277 non-null object
~snip~
```

A bit hacky, but it works.
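
The session above is Python 2 era (`iterkeys()`). A hedged Python 3 sketch of the same idea follows. The function name is hypothetical, `ds` is assumed to be an `xarray.Dataset`, and `drop_vars` is assumed to be available (it is the newer spelling of `drop` in recent xarray releases).

```python
def keep_vars_with_dim(ds, dim="recNum"):
    """Drop every variable of a Dataset that does not use dimension `dim`.

    `ds` is assumed to be an xarray.Dataset; a new dataset is returned.
    """
    # Iterating over ds.variables yields variable names.
    to_drop = [name for name in ds.variables if dim not in ds[name].dims]
    return ds.drop_vars(to_drop)


# Possible usage, mirroring the session above (needs xarray and the data file):
#
#   import xarray as xr
#   ds = xr.open_dataset("20160430_1600.nc", decode_cf=True,
#                        mask_and_scale=False, decode_times=False)
#   df = keep_vars_with_dim(ds).to_dataframe()
```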

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MADIS netCDF to Pandas Dataframe: ValueError: iterator is too large 152040420
216090677 https://github.com/pydata/xarray/issues/838#issuecomment-216090677 https://api.github.com/repos/pydata/xarray/issues/838 MDEyOklzc3VlQ29tbWVudDIxNjA5MDY3Nw== akrherz 210858 2016-05-02T01:37:59Z 2016-05-02T01:37:59Z NONE

I thought of something, is the issue here with the unlimited record dimension?

netcdf \20160430_1600 {
dimensions:
    ....
    recNum = UNLIMITED ; // (2845 currently)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MADIS netCDF to Pandas Dataframe: ValueError: iterator is too large 152040420
216090426 https://github.com/pydata/xarray/issues/838#issuecomment-216090426 https://api.github.com/repos/pydata/xarray/issues/838 MDEyOklzc3VlQ29tbWVudDIxNjA5MDQyNg== mogismog 6079398 2016-05-02T01:34:38Z 2016-05-02T01:34:38Z NONE

@shoyer: You're right in that MADIS netCDF files are (imo) poorly formatted. There is also the issue of xarray.decode_cf() not concatenating the string arrays even after fixing the _FillValue, missing_value conflict (hence requiring passing decode_cf=False when opening up the MADIS netCDF file). After looking at the decode_cf code, though, I don't think this is a bug per se (some quick debugging revealed that it doesn't seem like any variable in this netCDF file gets past this check), though if you feel this may in fact be a bug, I can look a bit more into it.

Unfortunately, this does mean I have to do a lot of "manual cleaning" of the netCDF file before exporting as a DataFrame, but it is easy to write a set of functions to accomplish this for you. That said, I can't c/p the exact code (for work-related reasons). I'm not sure how helpful this is, but when working with MADIS netCDF data, I more or less do the following as a workaround:

1. Open up the MADIS netCDF file, fix the _FillValue and missing_value conflict in the variables.
2. Drop the variables I don't want (and there is a lot of filler in MADIS netCDF files).
3. Concatenate the string arrays (e.g. stationId, dataProvider).
4. Turn into a pandas DataFrame.

Reading over it, though, that is kind of a "draw the owl"-esque response. :/
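
Two of the steps above (1 and 3) can be sketched with plain Python helpers. This is not the commenter's code, which was not shared. Both function names are hypothetical, and resolving the conflict by discarding `missing_value` is only one possible choice. Attribute values are assumed to be scalars, and string arrays are assumed to arrive as rows of single-byte values padded with NULs or spaces.

```python
def drop_conflicting_missing_value(attrs):
    """Step 1 (one possible approach): return a copy of a variable's attributes
    with `missing_value` removed when it disagrees with `_FillValue`."""
    cleaned = dict(attrs)
    if "_FillValue" in cleaned and "missing_value" in cleaned:
        if cleaned["_FillValue"] != cleaned["missing_value"]:
            del cleaned["missing_value"]
    return cleaned


def join_chars(chars, encoding="utf-8"):
    """Step 3: join one row of single-byte values into a str, trimming padding."""
    return b"".join(chars).rstrip(b"\x00 ").decode(encoding)
```

For real files, the netCDF4 package's `chartostring` helper may be a better fit for step 3 than a hand-rolled join.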

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MADIS netCDF to Pandas Dataframe: ValueError: iterator is too large 152040420

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette