
issue_comments

3 rows where issue = 202964277 (“ValueError: chunksize cannot exceed dimension size” when trying to write xarray to netcdf) and user = 1217238 (shoyer), sorted by updated_at descending

343335659 · shoyer (MEMBER) · created_at 2017-11-10T00:23:32Z · updated_at 2017-11-10T00:23:32Z
https://github.com/pydata/xarray/issues/1225#issuecomment-343335659

Doing some digging, it turns out this came up quite a while ago in #156, where we added some code to fix it.

Looking at @tbohn's dataset, the problem variable is actually the coordinate variable 'time' corresponding to the unlimited dimension:

```
In [7]: ds.variables['time']
Out[7]:
<class 'netCDF4._netCDF4.Variable'>
int32 time(time)
    units: days since 2000-01-01 00:00:00.0
unlimited dimensions: time
current shape = (5,)
filling on, default _FillValue of -2147483647 used

In [8]: ds.variables['time'].chunking()
Out[8]: [1048576]

In [9]: 2 ** 20
Out[9]: 1048576

In [10]: ds.dimensions
Out[10]:
OrderedDict([('veg_class', <class 'netCDF4._netCDF4.Dimension'>: name = 'veg_class', size = 19),
             ('lat', <class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 160),
             ('lon', <class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 160),
             ('time', <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 5)])
```

For some reason netCDF4 gives it a chunk size of 2 ** 20, even though the dimension only has length 5. This leads to an error when we write the file back with the original chunking.
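
As a minimal sketch of this behavior (assuming netCDF4-python is installed; the file name is arbitrary, and the default chunk size printed depends on the underlying netCDF-C version):

```python
import numpy as np
import netCDF4

# Create a file whose coordinate variable lies along an unlimited
# dimension, then inspect the chunking the library chose for it.
with netCDF4.Dataset('demo.nc', 'w') as nc:
    nc.createDimension('time', None)  # unlimited dimension
    time = nc.createVariable('time', 'i4', ('time',))
    time[:] = np.arange(5)

with netCDF4.Dataset('demo.nc') as nc:
    print(len(nc.dimensions['time']))       # 5
    print(nc.variables['time'].chunking())  # default chunk sizes chosen by
                                            # the library; can far exceed 5
```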

343332081 · shoyer (MEMBER) · created_at 2017-11-10T00:02:07Z · updated_at 2017-11-10T00:02:07Z
https://github.com/pydata/xarray/issues/1225#issuecomment-343332081

@chrwerner Sorry to hear about your trouble; I will take another look at this.

Right now, your best bet is probably something like:

```python
def clean_dataset(ds):
    for var in ds.variables.values():
        if 'chunksizes' in var.encoding:
            del var.encoding['chunksizes']
```
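
A hedged usage sketch (the file names here are hypothetical):

```python
import xarray as xr

ds = xr.open_dataset('somefile.nc')  # hypothetical input file
clean_dataset(ds)                    # strip stale 'chunksizes' encodings in place
ds.to_netcdf('cleaned.nc')           # the write should now succeed
```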

307519054 · shoyer (MEMBER) · created_at 2017-06-09T23:02:20Z · updated_at 2017-06-09T23:02:20Z
https://github.com/pydata/xarray/issues/1225#issuecomment-307519054

@tbohn "self-contained" just means something that I can run on my machine. For example, the code above plus the "somefile.nc" netCDF file that I can load to reproduce this example.

Thinking about this a little more, I think the issue is somehow related to the `encoding['chunksizes']` property on the Dataset variables loaded from the original netCDF file. Something like this should work as a work-around:

```python
del myds.var.encoding['chunksizes']
```

The bug is somewhere in our handling of chunksize encoding for netCDF4, but it is difficult to fix without being able to run code that reproduces it.
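
If you would rather keep chunked storage than discard it, a minimal sketch of an alternative work-around (the helper name and the clamping approach are assumptions, not necessarily what xarray ended up doing) is to cap each stored chunk size at the variable's actual shape:

```python
def clamp_chunksizes(ds):
    # Hypothetical helper: cap each stored chunk size at the
    # corresponding dimension length so the backend's
    # "chunksize cannot exceed dimension size" check passes.
    for var in ds.variables.values():
        chunks = var.encoding.get('chunksizes')
        if chunks is not None:
            var.encoding['chunksizes'] = tuple(
                min(c, s) for c, s in zip(chunks, var.shape)
            )
```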

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
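
The row filter at the top of this page corresponds to a straightforward query against this table; a minimal sketch using Python's sqlite3 module (the database file name is an assumption):

```python
import sqlite3

# Hypothetical SQLite file containing the issue_comments table above.
conn = sqlite3.connect('github.db')
rows = conn.execute(
    """
    SELECT id, created_at, updated_at, body
    FROM issue_comments
    WHERE issue = 202964277 AND [user] = 1217238
    ORDER BY updated_at DESC
    """
).fetchall()
print(len(rows))  # 3 for this issue/user pair
```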