html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1225#issuecomment-343335659,https://api.github.com/repos/pydata/xarray/issues/1225,343335659,MDEyOklzc3VlQ29tbWVudDM0MzMzNTY1OQ==,1217238,2017-11-10T00:23:32Z,2017-11-10T00:23:32Z,MEMBER,"Doing some digging, it turns out this came up quite a while ago back in #156, where we added some code to fix it. Looking at @tbohn's dataset, the problem variable is actually the coordinate variable `'time'` corresponding to the unlimited dimension:

```
In [7]: ds.variables['time']
Out[7]:
<class 'netCDF4._netCDF4.Variable'>
int32 time(time)
    units: days since 2000-01-01 00:00:00.0
unlimited dimensions: time
current shape = (5,)
filling on, default _FillValue of -2147483647 used

In [8]: ds.variables['time'].chunking()
Out[8]: [1048576]

In [9]: 2 ** 20
Out[9]: 1048576

In [10]: ds.dimensions
Out[10]:
OrderedDict([('veg_class', <class 'netCDF4._netCDF4.Dimension'>: name = 'veg_class', size = 19),
             ('lat', <class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 160),
             ('lon', <class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 160),
             ('time', <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 5)])
```

For some reason netCDF4 gives it a chunk size of 2 ** 20, even though it only has length 5. This leads to an error when we write the file back with the original chunking.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-343332976,https://api.github.com/repos/pydata/xarray/issues/1225,343332976,MDEyOklzc3VlQ29tbWVudDM0MzMzMjk3Ng==,13906519,2017-11-10T00:07:24Z,2017-11-10T00:07:24Z,NONE,"Thanks for that, Stephan. The workaround looks good for the moment ;-)... Detecting a mismatch (and maybe even correcting it) automatically would be very useful.

cheers,
C","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-343332081,https://api.github.com/repos/pydata/xarray/issues/1225,343332081,MDEyOklzc3VlQ29tbWVudDM0MzMzMjA4MQ==,1217238,2017-11-10T00:02:07Z,2017-11-10T00:02:07Z,MEMBER,"@chrwerner Sorry to hear about your trouble; I will take another look at this. Right now, your best bet is probably something like:

```python
def clean_dataset(ds):
    for var in ds.variables.values():
        if 'chunksizes' in var.encoding:
            del var.encoding['chunksizes']
```
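
Calling that on the dataset right before writing should clear out the stale settings (untested sketch; the file names here are just placeholders):

```python
import xarray as xr

ds = xr.open_dataset('original.nc')
clean_dataset(ds)
ds.to_netcdf('fixed.nc')
```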
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-343325842,https://api.github.com/repos/pydata/xarray/issues/1225,343325842,MDEyOklzc3VlQ29tbWVudDM0MzMyNTg0Mg==,13906519,2017-11-09T23:28:28Z,2017-11-09T23:28:28Z,NONE,"Is there any news on this? I have the same problem; a `reset_chunksizes()` method would be very helpful. Also, what is the cleanest way to remove all chunk size info? I have a very long computation and it fails at the very end with the mentioned error message. My file is patched together from many sources...

cheers","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-326146218,https://api.github.com/repos/pydata/xarray/issues/1225,326146218,MDEyOklzc3VlQ29tbWVudDMyNjE0NjIxOA==,3496314,2017-08-30T23:23:16Z,2017-08-30T23:23:16Z,NONE,"OK, thanks Joe and Stephan.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-326138431,https://api.github.com/repos/pydata/xarray/issues/1225,326138431,MDEyOklzc3VlQ29tbWVudDMyNjEzODQzMQ==,2443309,2017-08-30T22:36:14Z,2017-08-30T22:36:14Z,MEMBER,"@tbohn - What is happening here is that xarray is storing the netCDF4 chunk size from the input file. For the `LAI` variable in your example, that is `LAI:_ChunkSizes = 19, 1, 160, 160 ;` (you can see this with `ncdump -h -s filename.nc`).

```shell
$ ncdump -s -h veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc
netcdf veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates {
dimensions:
    veg_class = 19 ;
    lat = 160 ;
    lon = 160 ;
    time = UNLIMITED ; // (5 currently)
variables:
    float Cv(veg_class, lat, lon) ;
        Cv:_FillValue = -1.f ;
        Cv:units = ""-"" ;
        Cv:longname = ""Area Fraction"" ;
        Cv:missing_value = -1.f ;
        Cv:_Storage = ""contiguous"" ;
        Cv:_Endianness = ""little"" ;
    float LAI(veg_class, time, lat, lon) ;
        LAI:_FillValue = -1.f ;
        LAI:units = ""m2/m2"" ;
        LAI:longname = ""Leaf Area Index"" ;
        LAI:missing_value = -1.f ;
        LAI:_Storage = ""chunked"" ;
        LAI:_ChunkSizes = 19, 1, 160, 160 ;
        LAI:_Endianness = ""little"" ;
...
```

Those integers correspond to the dimensions of `LAI`. When you slice your dataset, you end up with lat/lon dimensions that are now smaller than the `_ChunkSizes`. When writing this back to netCDF, xarray is still trying to use the original `encoding` attribute.

The logical fix is to validate this encoding attribute and either 1) throw an informative error if something isn't going to work, or 2) change the `ChunkSizes`.
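
Roughly, option 2 could look something like this (a sketch of the idea only, not xarray's actual backend code; `fit_chunksizes`, `shape`, and `chunksizes` are made-up names for the per-variable values involved):

```python
def fit_chunksizes(shape, chunksizes):
    # Clip each stored chunk size to the (possibly sliced) length
    # of the dimension it tiles, so the result is always writable.
    return tuple(min(dim_len, chunk_len)
                 for dim_len, chunk_len in zip(shape, chunksizes))

fit_chunksizes((19, 5, 16, 16), (19, 1, 160, 160))  # -> (19, 1, 16, 16)
```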
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-307524160,https://api.github.com/repos/pydata/xarray/issues/1225,307524160,MDEyOklzc3VlQ29tbWVudDMwNzUyNDE2MA==,3496314,2017-06-09T23:32:38Z,2017-08-30T22:26:44Z,NONE,"OK, here's my code and the file that it works (fails) on.

Code:

```Python
import os.path

import numpy as np
import xarray as xr

ds = xr.open_dataset('veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc')
ds_out = ds.isel(lat=slice(0, 16), lon=slice(0, 16))
# ds_out.encoding['unlimited_dims'] = 'time'
ds_out.to_netcdf('test.out.nc')
```

Note that I commented out the attempt to make 'time' unlimited - if I attempt it, I get a slightly different chunk size error ('NetCDF: Bad chunk sizes'). I realize that for now I can use 'ncks' as a workaround, but it seems to me that xarray should be able to do this too.

File (attached): [veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc.zip](https://github.com/pydata/xarray/files/1065436/veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc.zip)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-307524406,https://api.github.com/repos/pydata/xarray/issues/1225,307524406,MDEyOklzc3VlQ29tbWVudDMwNzUyNDQwNg==,3496314,2017-06-09T23:34:44Z,2017-06-09T23:34:44Z,NONE,"(Note also that for the example nc file I provided, the slice that my example code makes contains nothing but null values - but that's irrelevant: the error happens for other slices that do contain non-null values.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-307519054,https://api.github.com/repos/pydata/xarray/issues/1225,307519054,MDEyOklzc3VlQ29tbWVudDMwNzUxOTA1NA==,1217238,2017-06-09T23:02:20Z,2017-06-09T23:02:20Z,MEMBER,"@tbohn ""self-contained"" just means something that I can run on my machine - for example, the code above plus the ""somefile.nc"" netCDF file that I can load to reproduce this example.

Thinking about this a little more, I think the issue is somehow related to the `encoding['chunksizes']` property on the Dataset variables loaded from the original netCDF file.
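
A quick way to see what was picked up from the file is to print each variable's encoding (`ds` here is the opened dataset):

```python
for name, var in ds.variables.items():
    print(name, var.encoding.get('chunksizes'))
```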
Something like this should work as a workaround:

```python
del myds.var.encoding['chunksizes']
```

The bug is somewhere in our [handling of chunksize encoding](https://github.com/pydata/xarray/blob/bbeab6954c4bf06145c64bf90fbb268fce2ab7f1/xarray/backends/netCDF4_.py#L160) for netCDF4, but it is difficult to fix without being able to run code that reproduces it.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-307518173,https://api.github.com/repos/pydata/xarray/issues/1225,307518173,MDEyOklzc3VlQ29tbWVudDMwNzUxODE3Mw==,3496314,2017-06-09T22:55:20Z,2017-06-09T22:55:20Z,NONE,"I've been encountering this as well, and I don't want to use the scipy engine workaround. If you can tell me what a ""self-contained"" example means, I can also try to provide one.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-306620537,https://api.github.com/repos/pydata/xarray/issues/1225,306620537,MDEyOklzc3VlQ29tbWVudDMwNjYyMDUzNw==,6101444,2017-06-06T21:19:21Z,2017-06-06T21:19:21Z,NONE,I've also just encountered this. Will try to reproduce a self-contained example.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277