issue_comments

7 rows where issue = 257400162 sorted by updated_at descending

user (4 distinct values)

  • jamesstidard 3
  • jhamman 2
  • shoyer 1
  • fmaussion 1

author_association (2 distinct values)

  • MEMBER 4
  • NONE 3

issue (1 distinct value)

  • Modifying data set resulting in much larger file size · 7
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
330162706 https://github.com/pydata/xarray/issues/1572#issuecomment-330162706 https://api.github.com/repos/pydata/xarray/issues/1572 MDEyOklzc3VlQ29tbWVudDMzMDE2MjcwNg== jamesstidard 1797906 2017-09-18T08:57:39Z 2017-09-18T08:59:24Z NONE

@shoyer great, thanks. I added the line below and it has reduced the size of the file down to that of the duplicate. Thanks for pointing me in the right direction. I'm assuming I do not need to fill the NaNs with _FillValue afterwards (though maybe I might).

```python
masked_ds.swh.encoding = {
    k: v for k, v in ds.swh.encoding.items()
    if k in {'_FillValue', 'add_offset', 'dtype', 'scale_factor'}
}
```
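For context, a minimal sketch of the full round trip discussed in this thread (file names are taken from the ncdump listings further down the page; the masking condition itself is a placeholder, since the thread never shows it):

```python
import xarray as xr

# Hypothetical reconstruction of the workflow discussed in this thread.
ds = xr.open_dataset("swh_2010_01_05_05.nc")
masked_ds = ds.where(ds.swh > 0)  # placeholder condition; the real mask is not shown in the thread

# Copy only the packing-related keys so swh is written back as packed shorts.
masked_ds.swh.encoding = {
    k: v
    for k, v in ds.swh.encoding.items()
    if k in {"_FillValue", "add_offset", "dtype", "scale_factor"}
}
masked_ds.to_netcdf("swh_2010_01_05_05-masked.nc")
```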

329286600 https://github.com/pydata/xarray/issues/1572#issuecomment-329286600 https://api.github.com/repos/pydata/xarray/issues/1572 MDEyOklzc3VlQ29tbWVudDMyOTI4NjYwMA== shoyer 1217238 2017-09-13T20:25:33Z 2017-09-13T20:25:33Z MEMBER

You could do scale-offset encoding on the variable by setting `_FillValue`, `scale_factor` and `add_offset` encoding parameters to appropriate values, which you could simply copy from the original: http://xarray.pydata.org/en/latest/io.html#scaling-and-type-conversions
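A minimal sketch of what this suggestion looks like, using the packing values from the ncdump output further down this page (the output path and the `masked_ds` name are assumptions carried over from the rest of the thread):

```python
# Sketch: hand the original file's packing parameters to to_netcdf's encoding
# argument so swh is written back as packed int16 rather than float64.
masked_ds.to_netcdf(
    "swh_repacked.nc",  # hypothetical output path
    encoding={
        "swh": {
            "dtype": "int16",
            "scale_factor": 0.000203558072860934,
            "add_offset": 6.70098898894319,
            "_FillValue": -32767,
        }
    },
)
```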

329233581 https://github.com/pydata/xarray/issues/1572#issuecomment-329233581 https://api.github.com/repos/pydata/xarray/issues/1572 MDEyOklzc3VlQ29tbWVudDMyOTIzMzU4MQ== jamesstidard 1797906 2017-09-13T17:06:12Z 2017-09-13T17:06:12Z NONE

@fmaussion @jhamman Ah great - that makes sense. I'll see if I can set them to the original file's short fill representation instead of NaN.

329232225 https://github.com/pydata/xarray/issues/1572#issuecomment-329232225 https://api.github.com/repos/pydata/xarray/issues/1572 MDEyOklzc3VlQ29tbWVudDMyOTIzMjIyNQ== fmaussion 10050469 2017-09-13T17:01:09Z 2017-09-13T17:04:12Z MEMBER

Yes, your file uses lossy compression, which is lost when the data is converted to double.

You can either use lossy compression again, or store your data as float instead of double to reduce the output file size. (http://xarray.pydata.org/en/latest/io.html#writing-encoded-data)
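A sketch of the second option (storing the data as float instead of double); the variable name and output path are assumptions, not part of this comment:

```python
# Sketch: halve the on-disk size by storing swh as float32 rather than float64.
masked_ds.to_netcdf(
    "swh_masked_float32.nc",  # hypothetical output path
    encoding={"swh": {"dtype": "float32"}},
)
```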

329232732 https://github.com/pydata/xarray/issues/1572#issuecomment-329232732 https://api.github.com/repos/pydata/xarray/issues/1572 MDEyOklzc3VlQ29tbWVudDMyOTIzMjczMg== jhamman 2443309 2017-09-13T17:02:57Z 2017-09-13T17:02:57Z MEMBER

Thanks. So, as you can see, the `swh` variable was promoted from a `short` to a `double`, which is why your dataset has increased in size. The current version of `where` inserts NaNs in place of fill values, but these cannot be represented as a `short`.

In the next version of xarray (0.10) we will have an improved version of `where` that will help with some of this. @fmaussion also has some good suggestions.
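A minimal illustration of the promotion described above (not part of the original comment):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.array([1, 2, 3], dtype="int16"))
print(da.dtype)                # int16
print(da.where(da > 1).dtype)  # float64, because the masked entries become NaN
```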

329230620 https://github.com/pydata/xarray/issues/1572#issuecomment-329230620 https://api.github.com/repos/pydata/xarray/issues/1572 MDEyOklzc3VlQ29tbWVudDMyOTIzMDYyMA== jamesstidard 1797906 2017-09-13T16:55:45Z 2017-09-13T16:59:57Z NONE

Sure, here you go:

Original (128.9MB):

```bash
$ ncdump -h -s swh_2010_01_05_05.nc
netcdf swh_2010_01_05_05 {
dimensions:
    longitude = 720 ;
    latitude = 361 ;
    time = UNLIMITED ; // (248 currently)
variables:
    float longitude(longitude) ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
    float latitude(latitude) ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
    int time(time) ;
        time:units = "hours since 1900-01-01 00:00:0.0" ;
        time:long_name = "time" ;
        time:calendar = "gregorian" ;
    short swh(time, latitude, longitude) ;
        swh:scale_factor = 0.000203558072860934 ;
        swh:add_offset = 6.70098898894319 ;
        swh:_FillValue = -32767s ;
        swh:missing_value = -32767s ;
        swh:units = "m" ;
        swh:long_name = "Significant height of combined wind waves and swell" ;

// global attributes:
        :Conventions = "CF-1.6" ;
        :history = "2017-08-09 16:41:57 GMT by grib_to_netcdf-2.4.0: grib_to_netcdf /data/data04/scratch/_mars-atls01-a562cefde8a29a7288fa0b8b7f9413f7-5gV0xP.grib -o /data/data05/scratch/_grib2netcdf-atls09-70e05f9f8ba4e9d19932f1c45a7be8d8-jU8lEi.nc -utime" ;
        :_Format = "64-bit offset" ;
}
```

Duplicate (129.0MB):

```bash
$ ncdump -h -s swh_2010_01_05_05-duplicate.nc
netcdf swh_2010_01_05_05-duplicate {
dimensions:
    longitude = 720 ;
    latitude = 361 ;
    time = UNLIMITED ; // (248 currently)
variables:
    float longitude(longitude) ;
        longitude:_FillValue = NaNf ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
        longitude:_Storage = "contiguous" ;
    float latitude(latitude) ;
        latitude:_FillValue = NaNf ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
        latitude:_Storage = "contiguous" ;
    int time(time) ;
        time:long_name = "time" ;
        time:units = "hours since 1900-01-01" ;
        time:calendar = "gregorian" ;
        time:_Storage = "chunked" ;
        time:_ChunkSizes = 1024 ;
        time:_Endianness = "little" ;
    short swh(time, latitude, longitude) ;
        swh:_FillValue = -32767s ;
        swh:units = "m" ;
        swh:long_name = "Significant height of combined wind waves and swell" ;
        swh:add_offset = 6.70098898894319 ;
        swh:scale_factor = 0.000203558072860934 ;
        swh:_Storage = "chunked" ;
        swh:_ChunkSizes = 1, 361, 720 ;
        swh:_Endianness = "little" ;

// global attributes:
        :_NCProperties = "version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.8.18" ;
        :Conventions = "CF-1.6" ;
        :history = "2017-08-09 16:41:57 GMT by grib_to_netcdf-2.4.0: grib_to_netcdf /data/data04/scratch/_mars-atls01-a562cefde8a29a7288fa0b8b7f9413f7-5gV0xP.grib -o /data/data05/scratch/_grib2netcdf-atls09-70e05f9f8ba4e9d19932f1c45a7be8d8-jU8lEi.nc -utime" ;
        :_Format = "netCDF-4" ;
}
```

Masked (515.7MB):

```bash
$ ncdump -h -s swh_2010_01_05_05-masked.nc
netcdf swh_2010_01_05_05-masked {
dimensions:
    longitude = 720 ;
    latitude = 361 ;
    time = 248 ;
variables:
    float longitude(longitude) ;
        longitude:_FillValue = NaNf ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
        longitude:_Storage = "contiguous" ;
    float latitude(latitude) ;
        latitude:_FillValue = NaNf ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
        latitude:_Storage = "contiguous" ;
    int time(time) ;
        time:long_name = "time" ;
        time:units = "hours since 1900-01-01" ;
        time:calendar = "gregorian" ;
        time:_Storage = "contiguous" ;
        time:_Endianness = "little" ;
    double swh(time, latitude, longitude) ;
        swh:_FillValue = NaN ;
        swh:units = "m" ;
        swh:long_name = "Significant height of combined wind waves and swell" ;
        swh:_Storage = "contiguous" ;

// global attributes:
        :_NCProperties = "version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.8.18" ;
        :Conventions = "CF-1.6" ;
        :history = "2017-08-09 16:41:57 GMT by grib_to_netcdf-2.4.0: grib_to_netcdf /data/data04/scratch/_mars-atls01-a562cefde8a29a7288fa0b8b7f9413f7-5gV0xP.grib -o /data/data05/scratch/_grib2netcdf-atls09-70e05f9f8ba4e9d19932f1c45a7be8d8-jU8lEi.nc -utime" ;
        :_Format = "netCDF-4" ;
}
```

I assume it's about that fill/missing value changing? Thanks for the help.

329228614 https://github.com/pydata/xarray/issues/1572#issuecomment-329228614 https://api.github.com/repos/pydata/xarray/issues/1572 MDEyOklzc3VlQ29tbWVudDMyOTIyODYxNA== jhamman 2443309 2017-09-13T16:48:35Z 2017-09-13T16:48:35Z MEMBER

@jamesstidard - can you compare the output of `ncdump -h -s your_file.nc` for these three datasets and report back?


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);