issue_comments


4 rows where author_association = "MEMBER" and issue = 202964277 (“ValueError: chunksize cannot exceed dimension size” when trying to write xarray to netcdf), sorted by updated_at descending
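In SQL terms, this listing corresponds roughly to the following query over the `issue_comments` table (schema at the end of this page); this is a sketch, and Datasette's generated SQL may differ in detail:

```sql
select *
from issue_comments
where author_association = 'MEMBER'
  and issue = 202964277
order by updated_at desc;
```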


343335659 · shoyer (MEMBER) · 2017-11-10T00:23:32Z · https://github.com/pydata/xarray/issues/1225#issuecomment-343335659

Doing some digging, it turns out this came up quite a while ago in #156, where we added some code to fix it.

Looking at @tbohn's dataset, the problem variable is actually the coordinate variable 'time' corresponding to the unlimited dimension:

```
In [7]: ds.variables['time']
Out[7]:
<class 'netCDF4._netCDF4.Variable'>
int32 time(time)
    units: days since 2000-01-01 00:00:00.0
unlimited dimensions: time
current shape = (5,)
filling on, default _FillValue of -2147483647 used

In [8]: ds.variables['time'].chunking()
Out[8]: [1048576]

In [9]: 2 ** 20
Out[9]: 1048576

In [10]: ds.dimensions
Out[10]:
OrderedDict([('veg_class', <class 'netCDF4._netCDF4.Dimension'>: name = 'veg_class', size = 19),
             ('lat', <class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 160),
             ('lon', <class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 160),
             ('time', <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 5)])
```

For some reason netCDF4 gives it a chunking of 2 ** 20, even though it only has length 5. This leads to an error when we write a file back with the original chunking.
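The default can be reproduced without xarray at all. Here is a minimal sketch using the netCDF4 package directly; the file name is made up, and the exact chunk size reported may vary with the netCDF library version:

```python
import numpy as np
import netCDF4

# Write a tiny file whose only dimension is unlimited (hypothetical file name).
with netCDF4.Dataset("unlimited_demo.nc", "w") as nc:
    nc.createDimension("time", None)                    # None => unlimited
    time = nc.createVariable("time", "i4", ("time",))
    time[0:5] = np.arange(5, dtype="i4")                # grows the dimension to 5

# Variables along an unlimited dimension are always chunked; netCDF4 picks
# a large default chunk size regardless of how few records were written.
with netCDF4.Dataset("unlimited_demo.nc") as nc:
    print(nc.variables["time"].chunking())              # e.g. [1048576] == 2 ** 20
```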

343332081 · shoyer (MEMBER) · 2017-11-10T00:02:07Z · https://github.com/pydata/xarray/issues/1225#issuecomment-343332081

@chrwerner Sorry to hear about your trouble; I will take another look at this.

Right now, your best bet is probably something like:

```python
def clean_dataset(ds):
    # Drop any stored netCDF4 chunk-size encoding so xarray chooses fresh chunks on write.
    for var in ds.variables.values():
        if 'chunksizes' in var.encoding:
            del var.encoding['chunksizes']
```
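For instance, the helper can be applied to a sliced dataset before writing it back out (a hypothetical round trip; the file names are placeholders):

```python
import xarray as xr

ds = xr.open_dataset("somefile.nc")
subset = ds.isel(lat=slice(0, 10), lon=slice(0, 10))  # dims now smaller than stored chunks
clean_dataset(subset)                                 # drop the stale 'chunksizes' encoding
subset.to_netcdf("subset.nc")                         # writes without the ValueError
```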

326138431 · jhamman (MEMBER) · 2017-08-30T22:36:14Z · https://github.com/pydata/xarray/issues/1225#issuecomment-326138431

@tbohn - What is happening here is that xarray is storing the netCDF4 chunk size from the input file. For the LAI variable in your example, that is `LAI:_ChunkSizes = 19, 1, 160, 160 ;` (you can see this with `ncdump -h -s filename.nc`).

```shell
$ ncdump -s -h veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc
netcdf veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates {
dimensions:
        veg_class = 19 ;
        lat = 160 ;
        lon = 160 ;
        time = UNLIMITED ; // (5 currently)
variables:
        float Cv(veg_class, lat, lon) ;
                Cv:_FillValue = -1.f ;
                Cv:units = "-" ;
                Cv:longname = "Area Fraction" ;
                Cv:missing_value = -1.f ;
                Cv:_Storage = "contiguous" ;
                Cv:_Endianness = "little" ;
        float LAI(veg_class, time, lat, lon) ;
                LAI:_FillValue = -1.f ;
                LAI:units = "m2/m2" ;
                LAI:longname = "Leaf Area Index" ;
                LAI:missing_value = -1.f ;
                LAI:_Storage = "chunked" ;
                LAI:_ChunkSizes = 19, 1, 160, 160 ;
                LAI:_Endianness = "little" ;
...
```

Those integers correspond to the sizes of LAI's dimensions. When you slice your dataset, you end up with lat/lon dimensions that are smaller than the stored `_ChunkSizes`. When writing this back to netCDF, xarray still tries to use the original encoding attribute.

The logical fix is to validate this encoding attribute and either 1) throw an informative error if something isn't going to work, or 2) adjust the `_ChunkSizes` to fit the new dimensions.
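As a rough sketch of option 2 (the helper name and clamping logic here are illustrative, not xarray's actual implementation), the stored chunk sizes could be clamped to the current dimension sizes:

```python
def clamp_chunksizes(ds):
    # Shrink each stored chunk size so it never exceeds the dimension it chunks.
    for var in ds.variables.values():
        chunksizes = var.encoding.get('chunksizes')
        if chunksizes is not None:
            var.encoding['chunksizes'] = tuple(
                min(chunk, size) for chunk, size in zip(chunksizes, var.shape)
            )
```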

307519054 · shoyer (MEMBER) · 2017-06-09T23:02:20Z · https://github.com/pydata/xarray/issues/1225#issuecomment-307519054

@tbohn "self-contained" just means something that I can run on my machine. For example, the code above plus the "somefile.nc" netCDF file that I can load to reproduce this example.

Thinking about this a little more, I think the issue is somehow related to the `encoding['chunksizes']` property on the Dataset variables loaded from the original netCDF file. Something like this should work as a workaround: `del myds.var.encoding['chunksizes']`.

The bug is somewhere in our handling of chunksize encoding for netCDF4, but it is difficult to fix it without being able to run code that reproduces it.


Table schema:

```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
```