html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1225#issuecomment-343335659,https://api.github.com/repos/pydata/xarray/issues/1225,343335659,MDEyOklzc3VlQ29tbWVudDM0MzMzNTY1OQ==,1217238,2017-11-10T00:23:32Z,2017-11-10T00:23:32Z,MEMBER,"Doing some digging, it turns out this came up quite a while ago back in #156, where we added some code to fix it. Looking at @tbohn's dataset, the problem variable is actually the coordinate variable `'time'` corresponding to the unlimited dimension:

```
In [7]: ds.variables['time']
Out[7]:
<class 'netCDF4._netCDF4.Variable'>
int32 time(time)
    units: days since 2000-01-01 00:00:00.0
unlimited dimensions: time
current shape = (5,)
filling on, default _FillValue of -2147483647 used

In [8]: ds.variables['time'].chunking()
Out[8]: [1048576]

In [9]: 2 ** 20
Out[9]: 1048576

In [10]: ds.dimensions
Out[10]:
OrderedDict([('veg_class', <class 'netCDF4._netCDF4.Dimension'>: name = 'veg_class', size = 19),
             ('lat', <class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 160),
             ('lon', <class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 160),
             ('time', <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 5)])
```

For some reason netCDF4 gives it a chunk size of 2 ** 20, even though it only has length 5. This leads to an error when we write the file back with the original chunking.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-343332976,https://api.github.com/repos/pydata/xarray/issues/1225,343332976,MDEyOklzc3VlQ29tbWVudDM0MzMzMjk3Ng==,13906519,2017-11-10T00:07:24Z,2017-11-10T00:07:24Z,NONE,"Thanks for that, Stephan. The workaround looks good for the moment ;-)... Detecting a mismatch (and maybe even correcting it) automatically would be very useful.

cheers,
C","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-343332081,https://api.github.com/repos/pydata/xarray/issues/1225,343332081,MDEyOklzc3VlQ29tbWVudDM0MzMzMjA4MQ==,1217238,2017-11-10T00:02:07Z,2017-11-10T00:02:07Z,MEMBER,"@chrwerner Sorry to hear about your trouble; I will take another look at this. Right now, your best bet is probably something like:

```python
def clean_dataset(ds):
    for var in ds.variables.values():
        if 'chunksizes' in var.encoding:
            del var.encoding['chunksizes']
```
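
Calling that on the dataset right before writing should clear out the stale settings (untested sketch; the file names here are just placeholders):

```python
import xarray as xr

ds = xr.open_dataset('original.nc')
clean_dataset(ds)
ds.to_netcdf('fixed.nc')
```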
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-343325842,https://api.github.com/repos/pydata/xarray/issues/1225,343325842,MDEyOklzc3VlQ29tbWVudDM0MzMyNTg0Mg==,13906519,2017-11-09T23:28:28Z,2017-11-09T23:28:28Z,NONE,"Is there any news on this? I have the same problem; a `reset_chunksizes()` method would be very helpful. Also, what is the cleanest way to remove all chunk size info? I have a very long computation and it fails at the very end with the mentioned error message. My file is patched together from many sources...

cheers","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-326146218,https://api.github.com/repos/pydata/xarray/issues/1225,326146218,MDEyOklzc3VlQ29tbWVudDMyNjE0NjIxOA==,3496314,2017-08-30T23:23:16Z,2017-08-30T23:23:16Z,NONE,"OK, thanks Joe and Stephan.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-326138431,https://api.github.com/repos/pydata/xarray/issues/1225,326138431,MDEyOklzc3VlQ29tbWVudDMyNjEzODQzMQ==,2443309,2017-08-30T22:36:14Z,2017-08-30T22:36:14Z,MEMBER,"@tbohn - What is happening here is that xarray is storing the netCDF4 chunk size from the input file. For the `LAI` variable in your example, that is `LAI:_ChunkSizes = 19, 1, 160, 160 ;` (you can see this with `ncdump -h -s filename.nc`).

```shell
$ ncdump -s -h veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc
netcdf veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates {
dimensions:
    veg_class = 19 ;
    lat = 160 ;
    lon = 160 ;
    time = UNLIMITED ; // (5 currently)
variables:
    float Cv(veg_class, lat, lon) ;
        Cv:_FillValue = -1.f ;
        Cv:units = ""-"" ;
        Cv:longname = ""Area Fraction"" ;
        Cv:missing_value = -1.f ;
        Cv:_Storage = ""contiguous"" ;
        Cv:_Endianness = ""little"" ;
    float LAI(veg_class, time, lat, lon) ;
        LAI:_FillValue = -1.f ;
        LAI:units = ""m2/m2"" ;
        LAI:longname = ""Leaf Area Index"" ;
        LAI:missing_value = -1.f ;
        LAI:_Storage = ""chunked"" ;
        LAI:_ChunkSizes = 19, 1, 160, 160 ;
        LAI:_Endianness = ""little"" ;
...
```

Those integers correspond to the dimensions of `LAI`. When you slice your dataset, you end up with lat/lon dimensions that are now smaller than the `_ChunkSizes`. When writing this back to netCDF, xarray is still trying to use the original `encoding` attribute.

The logical fix is to validate this encoding attribute and either 1) throw an informative error if something isn't going to work, or 2) change the `ChunkSizes`.
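
Roughly, option 2 could look something like this (a sketch of the idea only, not xarray's actual backend code; `fit_chunksizes`, `shape`, and `chunksizes` are made-up names for the per-variable values involved):

```python
def fit_chunksizes(shape, chunksizes):
    # Clip each stored chunk size to the (possibly sliced) length
    # of the dimension it tiles, so the result is always writable.
    return tuple(min(dim_len, chunk_len)
                 for dim_len, chunk_len in zip(shape, chunksizes))

fit_chunksizes((19, 5, 16, 16), (19, 1, 160, 160))  # -> (19, 1, 16, 16)
```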
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-307524160,https://api.github.com/repos/pydata/xarray/issues/1225,307524160,MDEyOklzc3VlQ29tbWVudDMwNzUyNDE2MA==,3496314,2017-06-09T23:32:38Z,2017-08-30T22:26:44Z,NONE,"OK, here's my code and the file that it works (fails) on.

Code:

```Python
import os.path

import numpy as np
import xarray as xr

ds = xr.open_dataset('veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc')
ds_out = ds.isel(lat=slice(0, 16), lon=slice(0, 16))
# ds_out.encoding['unlimited_dims'] = 'time'
ds_out.to_netcdf('test.out.nc')
```

Note that I commented out the attempt to make 'time' unlimited - if I attempt it, I get a slightly different chunk size error ('NetCDF: Bad chunk sizes'). I realize that for now I can use 'ncks' as a workaround, but it seems to me that xarray should be able to do this too.

File (attached): [veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc.zip](https://github.com/pydata/xarray/files/1065436/veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc.zip)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-307524406,https://api.github.com/repos/pydata/xarray/issues/1225,307524406,MDEyOklzc3VlQ29tbWVudDMwNzUyNDQwNg==,3496314,2017-06-09T23:34:44Z,2017-06-09T23:34:44Z,NONE,"(Note also that for the example nc file I provided, the slice that my example code makes contains nothing but null values - but that's irrelevant: the error happens for other slices that do contain non-null values.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-307519054,https://api.github.com/repos/pydata/xarray/issues/1225,307519054,MDEyOklzc3VlQ29tbWVudDMwNzUxOTA1NA==,1217238,2017-06-09T23:02:20Z,2017-06-09T23:02:20Z,MEMBER,"@tbohn ""self-contained"" just means something that I can run on my machine - for example, the code above plus the ""somefile.nc"" netCDF file that I can load to reproduce this example.

Thinking about this a little more, I think the issue is somehow related to the `encoding['chunksizes']` property on the Dataset variables loaded from the original netCDF file.
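
A quick way to see what was picked up from the file is to print each variable's encoding (`ds` here is the opened dataset):

```python
for name, var in ds.variables.items():
    print(name, var.encoding.get('chunksizes'))
```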
Something like this should work as a workaround:

```python
del myds.var.encoding['chunksizes']
```

The bug is somewhere in our [handling of chunksize encoding](https://github.com/pydata/xarray/blob/bbeab6954c4bf06145c64bf90fbb268fce2ab7f1/xarray/backends/netCDF4_.py#L160) for netCDF4, but it is difficult to fix without being able to run code that reproduces it.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-307518173,https://api.github.com/repos/pydata/xarray/issues/1225,307518173,MDEyOklzc3VlQ29tbWVudDMwNzUxODE3Mw==,3496314,2017-06-09T22:55:20Z,2017-06-09T22:55:20Z,NONE,"I've been encountering this as well, and I don't want to use the scipy engine workaround. If you can tell me what a ""self-contained"" example means, I can also try to provide one.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-306620537,https://api.github.com/repos/pydata/xarray/issues/1225,306620537,MDEyOklzc3VlQ29tbWVudDMwNjYyMDUzNw==,6101444,2017-06-06T21:19:21Z,2017-06-06T21:19:21Z,NONE,I've also just encountered this. Will try to reproduce a self-contained example.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277