html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1225#issuecomment-343335659,https://api.github.com/repos/pydata/xarray/issues/1225,343335659,MDEyOklzc3VlQ29tbWVudDM0MzMzNTY1OQ==,1217238,2017-11-10T00:23:32Z,2017-11-10T00:23:32Z,MEMBER,"Doing some digging, it turns out this came up quite a while ago in #156, where we added some code to fix it.
Looking at @tbohn's dataset, the problem variable is actually the coordinate variable `'time'` corresponding to the unlimited dimension:
```
In [7]: ds.variables['time']
Out[7]:
int32 time(time)
units: days since 2000-01-01 00:00:00.0
unlimited dimensions: time
current shape = (5,)
filling on, default _FillValue of -2147483647 used
In [8]: ds.variables['time'].chunking()
Out[8]: [1048576]
In [9]: 2 ** 20
Out[9]: 1048576
In [10]: ds.dimensions
Out[10]:
OrderedDict([('veg_class',
: name = 'veg_class', size = 19),
('lat',
: name = 'lat', size = 160),
('lon',
: name = 'lon', size = 160),
('time',
(unlimited): name = 'time', size = 5)])
```
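The session above reports a chunk size of 1048576 for a `time` dimension whose current length is 5. A minimal sketch of how such a mismatch could be flagged, assuming a hypothetical helper `oversized_chunks` (not part of xarray or netCDF4) that works on plain tuples of dimension names, lengths and chunk sizes:

```python
# Hypothetical helper, not an xarray or netCDF4 API: report every axis whose
# stored chunk size is larger than the current length of its dimension.
def oversized_chunks(dims, shape, chunksizes):
    return {
        dim: (chunk, length)
        for dim, length, chunk in zip(dims, shape, chunksizes)
        if chunk > length
    }


# Values taken from the session above: time has length 5, chunking is 2 ** 20.
print(oversized_chunks(('time',), (5,), (2 ** 20,)))
# {'time': (1048576, 5)}
```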
For some reason netCDF4 gives it a chunking of 2 ** 20, even though it only has length 5. This leads to an error when we write a file back with the original chunking.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-343332976,https://api.github.com/repos/pydata/xarray/issues/1225,343332976,MDEyOklzc3VlQ29tbWVudDM0MzMzMjk3Ng==,13906519,2017-11-10T00:07:24Z,2017-11-10T00:07:24Z,NONE,"Thanks for that Stephan.
The workaround looks good for the moment ;-)...
Detecting a mismatch (and maybe even correcting it) automatically would be very useful
cheers,
C","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-343332081,https://api.github.com/repos/pydata/xarray/issues/1225,343332081,MDEyOklzc3VlQ29tbWVudDM0MzMzMjA4MQ==,1217238,2017-11-10T00:02:07Z,2017-11-10T00:02:07Z,MEMBER,"@chrwerner Sorry to hear about your trouble, I will take another look at this.
Right now, your best bet is probably something like:
```python
def clean_dataset(ds):
    for var in ds.variables.values():
        if 'chunksizes' in var.encoding:
            del var.encoding['chunksizes']
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-343325842,https://api.github.com/repos/pydata/xarray/issues/1225,343325842,MDEyOklzc3VlQ29tbWVudDM0MzMyNTg0Mg==,13906519,2017-11-09T23:28:28Z,2017-11-09T23:28:28Z,NONE,"Is there any news on this? Have the same problem. A reset_chunksizes() method would be very helpful. Also, what is the cleanest way to remove all chunk size info? I have a very long computation and it fails at the very end with the mentioned error message. My file is patched together from many sources...
cheers","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-326146218,https://api.github.com/repos/pydata/xarray/issues/1225,326146218,MDEyOklzc3VlQ29tbWVudDMyNjE0NjIxOA==,3496314,2017-08-30T23:23:16Z,2017-08-30T23:23:16Z,NONE,"OK, thanks Joe and Stephan.
On Wed, Aug 30, 2017 at 3:36 PM, Joe Hamman wrote:
> @tbohn - What is happening here is that xarray
> is storing the netCDF4 chunk size from the input file. For the LAI
> variable in your example, that is LAI:_ChunkSizes = 19, 1, 160, 160 ; (you
> can see this with ncdump -h -s filename.nc).
>
> $ ncdump -s -h veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc
> netcdf veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates {
> dimensions:
> veg_class = 19 ;
> lat = 160 ;
> lon = 160 ;
> time = UNLIMITED ; // (5 currently)
> variables:
> float Cv(veg_class, lat, lon) ;
> Cv:_FillValue = -1.f ;
> Cv:units = ""-"" ;
> Cv:longname = ""Area Fraction"" ;
> Cv:missing_value = -1.f ;
> Cv:_Storage = ""contiguous"" ;
> Cv:_Endianness = ""little"" ;
> float LAI(veg_class, time, lat, lon) ;
> LAI:_FillValue = -1.f ;
> LAI:units = ""m2/m2"" ;
> LAI:longname = ""Leaf Area Index"" ;
> LAI:missing_value = -1.f ;
> LAI:_Storage = ""chunked"" ;
> LAI:_ChunkSizes = 19, 1, 160, 160 ;
> LAI:_Endianness = ""little"" ;
> ...
>
> Those integers correspond to the dimensions from LAI. When you slice your
> dataset, you end up with lat/lon dimensions that are now smaller than the
> _ChunkSizes. When writing this back to netCDF, xarray is still trying to
> use the original encoding attribute.
>
> The logical fix is to validate this encoding attribute and either 1) throw
> an informative error if something isn't going to work, or 2) change the
> ChunkSizes.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-326138431,https://api.github.com/repos/pydata/xarray/issues/1225,326138431,MDEyOklzc3VlQ29tbWVudDMyNjEzODQzMQ==,2443309,2017-08-30T22:36:14Z,2017-08-30T22:36:14Z,MEMBER,"@tbohn - What is happening here is that xarray is storing the netCDF4 chunk size from the input file. For the `LAI` variable in your example, that is `LAI:_ChunkSizes = 19, 1, 160, 160 ;` (you can see this with `ncdump -h -s filename.nc`).
```shell
$ ncdump -s -h veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc
netcdf veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates {
dimensions:
veg_class = 19 ;
lat = 160 ;
lon = 160 ;
time = UNLIMITED ; // (5 currently)
variables:
float Cv(veg_class, lat, lon) ;
Cv:_FillValue = -1.f ;
Cv:units = ""-"" ;
Cv:longname = ""Area Fraction"" ;
Cv:missing_value = -1.f ;
Cv:_Storage = ""contiguous"" ;
Cv:_Endianness = ""little"" ;
float LAI(veg_class, time, lat, lon) ;
LAI:_FillValue = -1.f ;
LAI:units = ""m2/m2"" ;
LAI:longname = ""Leaf Area Index"" ;
LAI:missing_value = -1.f ;
LAI:_Storage = ""chunked"" ;
LAI:_ChunkSizes = 19, 1, 160, 160 ;
LAI:_Endianness = ""little"" ;
...
```
Those integers correspond to the dimensions from LAI. When you slice your dataset, you end up with lat/lon dimensions that are now smaller than the `_ChunkSizes`. When writing this back to netCDF, xarray is still trying to use the original `encoding` attribute.
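To make the mismatch concrete, here is a small sketch using the numbers from this file. The helper `clamp_chunksizes` is hypothetical (it is not an xarray function) and simply limits each stored chunk size to the current length of its dimension:

```python
# Hypothetical helper, not part of xarray: clamp each stored chunk size to
# the current length of the matching dimension.
def clamp_chunksizes(chunksizes, shape):
    return tuple(min(chunk, length) for chunk, length in zip(chunksizes, shape))


# LAI(veg_class, time, lat, lon)
stored = (19, 1, 160, 160)  # _ChunkSizes reported by ncdump for the input file
sliced = (19, 5, 16, 16)    # shape after isel(lat=slice(0, 16), lon=slice(0, 16))

print(clamp_chunksizes(stored, sliced))
# (19, 1, 16, 16)
```

The stored lat/lon chunk sizes (160) exceed the sliced lengths (16), which is the mismatch described above. A clamped tuple like this could, in principle, be passed as `encoding={'LAI': {'chunksizes': ...}}` to `to_netcdf`.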
The logical fix is to validate this encoding attribute and either 1) throw an informative error if something isn't going to work, or 2) change the `ChunkSizes`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-307524160,https://api.github.com/repos/pydata/xarray/issues/1225,307524160,MDEyOklzc3VlQ29tbWVudDMwNzUyNDE2MA==,3496314,2017-06-09T23:32:38Z,2017-08-30T22:26:44Z,NONE,"OK, here's my code and the file that it works (fails) on.
Code:
```Python
import os.path
import numpy as np
import xarray as xr
ds = xr.open_dataset('veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc')
ds_out = ds.isel(lat=slice(0,16),lon=slice(0,16))
#ds_out.encoding['unlimited_dims'] = 'time'
ds_out.to_netcdf('test.out.nc')
```
Note that I commented out the attempt to make 'time' unlimited - if I attempt it, I get a slightly different chunk size error ('NetCDF: Bad chunk sizes').
I realize that for now I can use 'ncks' as a workaround, but it seems to me that xarray should be able to do this too.
File (attached)
[veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc.zip](https://github.com/pydata/xarray/files/1065436/veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc.zip)
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-307524406,https://api.github.com/repos/pydata/xarray/issues/1225,307524406,MDEyOklzc3VlQ29tbWVudDMwNzUyNDQwNg==,3496314,2017-06-09T23:34:44Z,2017-06-09T23:34:44Z,NONE,"(note also that for the example nc file I provided, the slice that my example code makes contains nothing but null values - but that's irrelevant - the error happens for other slices that do contain non-null values.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-307519054,https://api.github.com/repos/pydata/xarray/issues/1225,307519054,MDEyOklzc3VlQ29tbWVudDMwNzUxOTA1NA==,1217238,2017-06-09T23:02:20Z,2017-06-09T23:02:20Z,MEMBER,"@tbohn ""self-contained"" just means something that I can run on my machine. For example, the code above plus the ""somefile.nc"" netCDF file that I can load to reproduce this example.
Thinking about this a little more, I think the issue is somehow related to the `encoding['chunksizes']` property on the Dataset variables loaded from the original netCDF file. Something like this should work as a work-around:
```
del myds.var.encoding['chunksizes']
```
The bug is somewhere in our [handling of chunksize encoding](https://github.com/pydata/xarray/blob/bbeab6954c4bf06145c64bf90fbb268fce2ab7f1/xarray/backends/netCDF4_.py#L160) for netCDF4, but it is difficult to fix it without being able to run code that reproduces it.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-307518173,https://api.github.com/repos/pydata/xarray/issues/1225,307518173,MDEyOklzc3VlQ29tbWVudDMwNzUxODE3Mw==,3496314,2017-06-09T22:55:20Z,2017-06-09T22:55:20Z,NONE,"I've been encountering this as well, and I don't want to use the scipy engine workaround. If you can tell me what a ""self-contained"" example means, I can also try to provide one.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277
https://github.com/pydata/xarray/issues/1225#issuecomment-306620537,https://api.github.com/repos/pydata/xarray/issues/1225,306620537,MDEyOklzc3VlQ29tbWVudDMwNjYyMDUzNw==,6101444,2017-06-06T21:19:21Z,2017-06-06T21:19:21Z,NONE,I've also just encountered this. Will try to reproduce a self-contained example. ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202964277