Comments on pydata/xarray issue #1572 (issue id 257400162).

https://github.com/pydata/xarray/issues/1572#issuecomment-330162706
user 1797906 (NONE), created 2017-09-18T08:57:39Z, updated 2017-09-18T08:59:24Z

@shoyer great, thanks. I added the line below and it has reduced the size of the file down to that of the duplicate. Thanks for pointing me in the right direction. I'm assuming I do not need to fill NaNs with `_FillValue` afterwards (though maybe I might).

```python
masked_ds.swh.encoding = {k: v for k, v in ds.swh.encoding.items()
                          if k in {'_FillValue', 'add_offset', 'dtype', 'scale_factor'}}
```

https://github.com/pydata/xarray/issues/1572#issuecomment-329286600
user 1217238 (MEMBER), 2017-09-13T20:25:33Z

You could do scale-offset encoding on the variable by setting the `_FillValue`, `scale_factor` and `add_offset` encoding parameters to appropriate values, which you could simply copy from the original: http://xarray.pydata.org/en/latest/io.html#scaling-and-type-conversions

https://github.com/pydata/xarray/issues/1572#issuecomment-329233581
user 1797906 (NONE), 2017-09-13T17:06:12Z

@fmaussion @jhamman Ah great, that makes sense. I'll see if I can set them to the original file's `short` fill representation instead of NaN.

https://github.com/pydata/xarray/issues/1572#issuecomment-329232225
user 10050469 (MEMBER), created 2017-09-13T17:01:09Z, updated 2017-09-13T17:04:12Z

Yes, your file uses lossy compression, which is lost in the conversion to `double`. You can either use lossy compression again, or store your data as `float` instead of `double` to reduce the output file size. (http://xarray.pydata.org/en/latest/io.html#writing-encoded-data)

https://github.com/pydata/xarray/issues/1572#issuecomment-329232732
user 2443309 (MEMBER), 2017-09-13T17:02:57Z

Thanks. So, as you can see, the `swh` variable was promoted from a `short` to a `double`, which is why your dataset has increased in size. The current version of `where` inserts NaNs in place of fill values, but these cannot be represented as a `short`. In the next version of xarray (0.10) we will have an improved version of `where` that will help with some of this. @fmaussion also has some good suggestions.
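As a quick illustration of the promotion described above (a toy sketch, not taken from the thread): calling `where` on an integer-typed variable returns `float64`, because the masked points become NaN and NaN has no integer representation.

```python
import numpy as np
import xarray as xr

# Toy stand-in for the on-disk short (int16) swh variable.
da = xr.DataArray(np.array([1, 2, 3], dtype="int16"), dims="x")

masked = da.where(da > 1)             # masked points become NaN
print(da.dtype, "->", masked.dtype)   # int16 -> float64
```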
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,257400162 https://github.com/pydata/xarray/issues/1572#issuecomment-329230620,https://api.github.com/repos/pydata/xarray/issues/1572,329230620,MDEyOklzc3VlQ29tbWVudDMyOTIzMDYyMA==,1797906,2017-09-13T16:55:45Z,2017-09-13T16:59:57Z,NONE,"Sure, here you go: Original (128.9MB): ```bash $ ncdump -h -s swh_2010_01_05_05.nc netcdf swh_2010_01_05_05 { dimensions: longitude = 720 ; latitude = 361 ; time = UNLIMITED ; // (248 currently) variables: float longitude(longitude) ; longitude:units = ""degrees_east"" ; longitude:long_name = ""longitude"" ; float latitude(latitude) ; latitude:units = ""degrees_north"" ; latitude:long_name = ""latitude"" ; int time(time) ; time:units = ""hours since 1900-01-01 00:00:0.0"" ; time:long_name = ""time"" ; time:calendar = ""gregorian"" ; short swh(time, latitude, longitude) ; swh:scale_factor = 0.000203558072860934 ; swh:add_offset = 6.70098898894319 ; swh:_FillValue = -32767s ; swh:missing_value = -32767s ; swh:units = ""m"" ; swh:long_name = ""Significant height of combined wind waves and swell"" ; // global attributes: :Conventions = ""CF-1.6"" ; :history = ""2017-08-09 16:41:57 GMT by grib_to_netcdf-2.4.0: grib_to_netcdf /data/data04/scratch/_mars-atls01-a562cefde8a29a7288fa0b8b7f9413f7-5gV0xP.grib -o /data/data05/scratch/_grib2netcdf-atls09-70e05f9f8ba4e9d19932f1c45a7be8d8-jU8lEi.nc -utime"" ; :_Format = ""64-bit offset"" ; } ``` Duplicate (129.0MB): ```bash $ ncdump -h -s swh_2010_01_05_05-duplicate.nc netcdf swh_2010_01_05_05-duplicate { dimensions: longitude = 720 ; latitude = 361 ; time = UNLIMITED ; // (248 currently) variables: float longitude(longitude) ; longitude:_FillValue = NaNf ; longitude:units = ""degrees_east"" ; longitude:long_name = ""longitude"" ; longitude:_Storage = ""contiguous"" ; float latitude(latitude) ; latitude:_FillValue = NaNf ; latitude:units = ""degrees_north"" ; latitude:long_name = ""latitude"" ; latitude:_Storage = ""contiguous"" ; int time(time) ; time:long_name = ""time"" ; time:units = ""hours since 1900-01-01"" ; time:calendar = ""gregorian"" ; time:_Storage = ""chunked"" ; time:_ChunkSizes = 1024 ; time:_Endianness = ""little"" ; short swh(time, latitude, longitude) ; swh:_FillValue = -32767s ; swh:units = ""m"" ; swh:long_name = ""Significant height of combined wind waves and swell"" ; swh:add_offset = 6.70098898894319 ; swh:scale_factor = 0.000203558072860934 ; swh:_Storage = ""chunked"" ; swh:_ChunkSizes = 1, 361, 720 ; swh:_Endianness = ""little"" ; // global attributes: :_NCProperties = ""version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.8.18"" ; :Conventions = ""CF-1.6"" ; :history = ""2017-08-09 16:41:57 GMT by grib_to_netcdf-2.4.0: grib_to_netcdf /data/data04/scratch/_mars-atls01-a562cefde8a29a7288fa0b8b7f9413f7-5gV0xP.grib -o /data/data05/scratch/_grib2netcdf-atls09-70e05f9f8ba4e9d19932f1c45a7be8d8-jU8lEi.nc -utime"" ; :_Format = ""netCDF-4"" ; } ``` Masked (515.7MB): ```bash $ ncdump -h -s swh_2010_01_05_05-masked.nc netcdf swh_2010_01_05_05-masked { dimensions: longitude = 720 ; latitude = 361 ; time = 248 ; variables: float longitude(longitude) ; longitude:_FillValue = NaNf ; longitude:units = ""degrees_east"" ; longitude:long_name = ""longitude"" ; longitude:_Storage = ""contiguous"" ; float latitude(latitude) ; latitude:_FillValue = NaNf ; latitude:units = ""degrees_north"" ; latitude:long_name = ""latitude"" ; latitude:_Storage = ""contiguous"" ; int time(time) ; 
time:long_name = ""time"" ; time:units = ""hours since 1900-01-01"" ; time:calendar = ""gregorian"" ; time:_Storage = ""contiguous"" ; time:_Endianness = ""little"" ; double swh(time, latitude, longitude) ; swh:_FillValue = NaN ; swh:units = ""m"" ; swh:long_name = ""Significant height of combined wind waves and swell"" ; swh:_Storage = ""contiguous"" ; // global attributes: :_NCProperties = ""version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.8.18"" ; :Conventions = ""CF-1.6"" ; :history = ""2017-08-09 16:41:57 GMT by grib_to_netcdf-2.4.0: grib_to_netcdf /data/data04/scratch/_mars-atls01-a562cefde8a29a7288fa0b8b7f9413f7-5gV0xP.grib -o /data/data05/scratch/_grib2netcdf-atls09-70e05f9f8ba4e9d19932f1c45a7be8d8-jU8lEi.nc -utime"" ; :_Format = ""netCDF-4"" ; } ``` I assume it's about that fill/missing value changing? Thanks for the help.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,257400162 https://github.com/pydata/xarray/issues/1572#issuecomment-329228614,https://api.github.com/repos/pydata/xarray/issues/1572,329228614,MDEyOklzc3VlQ29tbWVudDMyOTIyODYxNA==,2443309,2017-09-13T16:48:35Z,2017-09-13T16:48:35Z,MEMBER,@jamesstidard - can you compare the output of `ncdump -h -s your_file.nc` for these three datasets and report back?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,257400162