issue_comments


11 rows where issue = 202964277 (“ValueError: chunksize cannot exceed dimension size” when trying to write xarray to netcdf), sorted by updated_at descending


shoyer (MEMBER) · 2017-11-10T00:23:32Z · https://github.com/pydata/xarray/issues/1225#issuecomment-343335659

Doing some digging, it turns out this came up quite a while ago back in #156, where we added some code to fix it.

Looking at @tbohn's dataset, the problem variable is actually the coordinate variable 'time' corresponding to the unlimited dimension:

```
In [7]: ds.variables['time']
Out[7]:
<class 'netCDF4._netCDF4.Variable'>
int32 time(time)
    units: days since 2000-01-01 00:00:00.0
unlimited dimensions: time
current shape = (5,)
filling on, default _FillValue of -2147483647 used

In [8]: ds.variables['time'].chunking()
Out[8]: [1048576]

In [9]: 2 ** 20
Out[9]: 1048576

In [10]: ds.dimensions
Out[10]:
OrderedDict([('veg_class', <class 'netCDF4._netCDF4.Dimension'>: name = 'veg_class', size = 19),
             ('lat', <class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 160),
             ('lon', <class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 160),
             ('time', <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 5)])
```

For some reason netCDF4 assigns it a chunk size of 2 ** 20, even though the dimension only has length 5. This leads to an error when we write the file back with the original chunking.
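A minimal sketch of how this shows up on the xarray side, assuming @tbohn's attached file from later in the thread (xarray keeps the on-disk chunking in each variable's `encoding['chunksizes']`):

```python
import xarray as xr

# File attached by @tbohn later in this thread.
ds = xr.open_dataset('veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc')

# The on-disk chunking is copied into the variable's encoding, so the
# bogus 2 ** 20 chunk size survives even though len(time) == 5.
print(ds['time'].encoding.get('chunksizes'))  # expected: (1048576,)
print(ds.dims['time'])                        # expected: 5

# Writing back with this stale encoding is what raises
# "ValueError: chunksize cannot exceed dimension size".
```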

cwerner (NONE) · 2017-11-10T00:07:24Z · https://github.com/pydata/xarray/issues/1225#issuecomment-343332976

Thanks for that, Stephan.

The workaround looks good for the moment ;-)... Detecting a mismatch (and maybe even correcting it) automatically would be very useful.

cheers, C

shoyer (MEMBER) · 2017-11-10T00:02:07Z · https://github.com/pydata/xarray/issues/1225#issuecomment-343332081

@cwerner Sorry to hear about your trouble; I will take another look at this.

Right now, your best bet is probably something like:

```python
def clean_dataset(ds):
    for var in ds.variables.values():
        if 'chunksizes' in var.encoding:
            del var.encoding['chunksizes']
```
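A hedged usage sketch (`ds_out` and the output file name are borrowed from @tbohn's example below):

```python
clean_dataset(ds_out)            # drop the stale per-variable chunk sizes
ds_out.to_netcdf('test.out.nc')  # should no longer raise the ValueError
```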

cwerner (NONE) · 2017-11-09T23:28:28Z · https://github.com/pydata/xarray/issues/1225#issuecomment-343325842

Is there any news on this? I have the same problem. A reset_chunksizes() method would be very helpful. Also, what is the cleanest way to remove all chunk size info? I have a very long computation, and it fails at the very end with the mentioned error message. My file is patched together from many sources...

cheers

tbohn (NONE) · 2017-08-30T23:23:16Z · https://github.com/pydata/xarray/issues/1225#issuecomment-326146218

OK, thanks Joe and Stephan.


jhamman (MEMBER) · 2017-08-30T22:36:14Z · https://github.com/pydata/xarray/issues/1225#issuecomment-326138431

@tbohn - What is happening here is that xarray is storing the netCDF4 chunk size from the input file. For the LAI variable in your example, that is `LAI:_ChunkSizes = 19, 1, 160, 160 ;` (you can see this with `ncdump -h -s filename.nc`).

```shell
$ ncdump -s -h veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc
netcdf veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates {
dimensions:
    veg_class = 19 ;
    lat = 160 ;
    lon = 160 ;
    time = UNLIMITED ; // (5 currently)
variables:
    float Cv(veg_class, lat, lon) ;
        Cv:_FillValue = -1.f ;
        Cv:units = "-" ;
        Cv:longname = "Area Fraction" ;
        Cv:missing_value = -1.f ;
        Cv:_Storage = "contiguous" ;
        Cv:_Endianness = "little" ;
    float LAI(veg_class, time, lat, lon) ;
        LAI:_FillValue = -1.f ;
        LAI:units = "m2/m2" ;
        LAI:longname = "Leaf Area Index" ;
        LAI:missing_value = -1.f ;
        LAI:_Storage = "chunked" ;
        LAI:_ChunkSizes = 19, 1, 160, 160 ;
        LAI:_Endianness = "little" ;
...
```

Those integers correspond to the dimensions from LAI. When you slice your dataset, you end up with lat/lon dimensions that are now smaller than the _ChunkSizes. When writing this back to netCDF, xarray is still trying to use the original encoding attribute.

The logical fix is to validate this encoding attribute and either (1) throw an informative error if something isn't going to work, or (2) change the chunk sizes.
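A rough, hypothetical sketch of option (2); `clamp_chunksizes` is not an existing xarray function, and it assumes each variable's chunking lives in `encoding['chunksizes']` as a tuple aligned with the variable's dimensions:

```python
def clamp_chunksizes(ds):
    # Clamp every stored chunk size to the (possibly sliced) dimension size,
    # so that writing the dataset back out cannot exceed any dimension.
    for var in ds.variables.values():
        chunksizes = var.encoding.get('chunksizes')
        if chunksizes is not None:
            var.encoding['chunksizes'] = tuple(
                min(chunk, size) for chunk, size in zip(chunksizes, var.shape)
            )
```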

tbohn (NONE) · created 2017-06-09T23:32:38Z, updated 2017-08-30T22:26:44Z · https://github.com/pydata/xarray/issues/1225#issuecomment-307524160

OK, here's my code and the file that it works (fails) on.

Code:

```python
import os.path
import numpy as np
import xarray as xr

ds = xr.open_dataset('veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc')
ds_out = ds.isel(lat=slice(0, 16), lon=slice(0, 16))

# ds_out.encoding['unlimited_dims'] = 'time'

ds_out.to_netcdf('test.out.nc')
```

Note that I commented out the attempt to make 'time' unlimited - if I attempt it, I get a slightly different chunk size error ('NetCDF: Bad chunk sizes').

I realize that for now I can use 'ncks' as a workaround, but it seems to me that xarray should be able to do this too.

File (attached) veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc.zip

tbohn (NONE) · 2017-06-09T23:34:44Z · https://github.com/pydata/xarray/issues/1225#issuecomment-307524406

(Note also that for the example .nc file I provided, the slice that my example code takes contains nothing but null values. That's irrelevant, though: the error also happens for slices that do contain non-null values.)

shoyer (MEMBER) · 2017-06-09T23:02:20Z · https://github.com/pydata/xarray/issues/1225#issuecomment-307519054

@tbohn "self-contained" just means something that I can run on my machine. For example, the code above plus the "somefile.nc" netCDF file that I can load to reproduce this example.

Thinking about this a little more, I think the issue is somehow related to the `encoding['chunksizes']` property on the Dataset variables loaded from the original netCDF file. Something like this should work as a work-around: `del myds.var.encoding['chunksizes']`

The bug is somewhere in our handling of chunksize encoding for netCDF4, but it is difficult to fix it without being able to run code that reproduces it.
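Spelled out over a whole dataset, a hedged sketch generalizing that one-liner ('somefile.nc' is the placeholder name from the original report, and the slice mirrors @tbohn's example):

```python
import xarray as xr

ds = xr.open_dataset('somefile.nc')
subset = ds.isel(lat=slice(0, 16), lon=slice(0, 16))

# Drop the stale chunk-size encoding that the slicing invalidated.
for var in subset.variables.values():
    var.encoding.pop('chunksizes', None)

subset.to_netcdf('out.nc')
```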

tbohn (NONE) · 2017-06-09T22:55:20Z · https://github.com/pydata/xarray/issues/1225#issuecomment-307518173

I've been encountering this as well, and I don't want to use the scipy engine workaround. If you can tell me what a "self-contained" example means, I can also try to provide one.

jgerardsimcock (NONE) · 2017-06-06T21:19:21Z · https://github.com/pydata/xarray/issues/1225#issuecomment-306620537

I've also just encountered this. Will try to reproduce it in a self-contained example.

