issue_comments

7 rows where issue = 257400162 sorted by updated_at descending

user (4 distinct values)

  • jamesstidard 3
  • jhamman 2
  • shoyer 1
  • fmaussion 1

author_association (2 distinct values)

  • MEMBER 4
  • NONE 3

issue (1 distinct value)

  • Modifying data set resulting in much larger file size · 7
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
330162706 https://github.com/pydata/xarray/issues/1572#issuecomment-330162706 https://api.github.com/repos/pydata/xarray/issues/1572 MDEyOklzc3VlQ29tbWVudDMzMDE2MjcwNg== jamesstidard 1797906 2017-09-18T08:57:39Z 2017-09-18T08:59:24Z NONE

@shoyer great, thanks. I added the line below and it has reduced the size of the file down to that of the duplicate. Thanks for pointing me in the right direction. I'm assuming I do not need to fill the NaNs with _FillValue afterwards (though maybe I might).

```python
masked_ds.swh.encoding = {
    k: v for k, v in ds.swh.encoding.items()
    if k in {'_FillValue', 'add_offset', 'dtype', 'scale_factor'}
}
```
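For context, a minimal sketch of the full round trip discussed in this thread (file names are taken from the ncdump listings further down the page; the masking condition itself is a placeholder, since the thread never shows it):

```python
import xarray as xr

# Hypothetical reconstruction of the workflow discussed in this thread.
ds = xr.open_dataset("swh_2010_01_05_05.nc")
masked_ds = ds.where(ds.swh > 0)  # placeholder condition; the real mask is not shown in the thread

# Copy only the packing-related keys so swh is written back as packed shorts.
masked_ds.swh.encoding = {
    k: v
    for k, v in ds.swh.encoding.items()
    if k in {"_FillValue", "add_offset", "dtype", "scale_factor"}
}
masked_ds.to_netcdf("swh_2010_01_05_05-masked.nc")
```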

329286600 https://github.com/pydata/xarray/issues/1572#issuecomment-329286600 https://api.github.com/repos/pydata/xarray/issues/1572 MDEyOklzc3VlQ29tbWVudDMyOTI4NjYwMA== shoyer 1217238 2017-09-13T20:25:33Z 2017-09-13T20:25:33Z MEMBER

You could do scale-offset encoding on the variable by setting `_FillValue`, `scale_factor` and `add_offset` encoding parameters to appropriate values, which you could simply copy from the original: http://xarray.pydata.org/en/latest/io.html#scaling-and-type-conversions
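A minimal sketch of what this suggestion looks like, using the packing values from the ncdump output further down this page (the output path and the `masked_ds` name are assumptions carried over from the rest of the thread):

```python
# Sketch: hand the original file's packing parameters to to_netcdf's encoding
# argument so swh is written back as packed int16 rather than float64.
masked_ds.to_netcdf(
    "swh_repacked.nc",  # hypothetical output path
    encoding={
        "swh": {
            "dtype": "int16",
            "scale_factor": 0.000203558072860934,
            "add_offset": 6.70098898894319,
            "_FillValue": -32767,
        }
    },
)
```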

329233581 https://github.com/pydata/xarray/issues/1572#issuecomment-329233581 https://api.github.com/repos/pydata/xarray/issues/1572 MDEyOklzc3VlQ29tbWVudDMyOTIzMzU4MQ== jamesstidard 1797906 2017-09-13T17:06:12Z 2017-09-13T17:06:12Z NONE

@fmaussion @jhamman Ah great - that makes sense. I'll see if I can set them to the original file's short fill representation instead of NaN.

329232225 https://github.com/pydata/xarray/issues/1572#issuecomment-329232225 https://api.github.com/repos/pydata/xarray/issues/1572 MDEyOklzc3VlQ29tbWVudDMyOTIzMjIyNQ== fmaussion 10050469 2017-09-13T17:01:09Z 2017-09-13T17:04:12Z MEMBER

Yes, your file uses lossy compression, which is lost when the data is converted to double.

You can either use lossy compression again, or store your data as float instead of double to reduce the output file size. (http://xarray.pydata.org/en/latest/io.html#writing-encoded-data)
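A sketch of the second option (storing the data as float instead of double); the variable name and output path are assumptions, not part of this comment:

```python
# Sketch: halve the on-disk size by storing swh as float32 rather than float64.
masked_ds.to_netcdf(
    "swh_masked_float32.nc",  # hypothetical output path
    encoding={"swh": {"dtype": "float32"}},
)
```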

329232732 https://github.com/pydata/xarray/issues/1572#issuecomment-329232732 https://api.github.com/repos/pydata/xarray/issues/1572 MDEyOklzc3VlQ29tbWVudDMyOTIzMjczMg== jhamman 2443309 2017-09-13T17:02:57Z 2017-09-13T17:02:57Z MEMBER

Thanks. So, as you can see, the `swh` variable was promoted from a `short` to a `double`, which is why your dataset has increased in size. The current version of `where` inserts NaNs in place of fill values, but these cannot be represented as a `short`.

In the next version of xarray (0.10) we will have an improved version of `where` that will help with some of this. @fmaussion also has some good suggestions.
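A minimal illustration of the promotion described above (not part of the original comment):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.array([1, 2, 3], dtype="int16"))
print(da.dtype)                # int16
print(da.where(da > 1).dtype)  # float64, because the masked entries become NaN
```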

329230620 https://github.com/pydata/xarray/issues/1572#issuecomment-329230620 https://api.github.com/repos/pydata/xarray/issues/1572 MDEyOklzc3VlQ29tbWVudDMyOTIzMDYyMA== jamesstidard 1797906 2017-09-13T16:55:45Z 2017-09-13T16:59:57Z NONE

Sure, here you go:

Original (128.9MB):

```bash
$ ncdump -h -s swh_2010_01_05_05.nc
netcdf swh_2010_01_05_05 {
dimensions:
    longitude = 720 ;
    latitude = 361 ;
    time = UNLIMITED ; // (248 currently)
variables:
    float longitude(longitude) ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
    float latitude(latitude) ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
    int time(time) ;
        time:units = "hours since 1900-01-01 00:00:0.0" ;
        time:long_name = "time" ;
        time:calendar = "gregorian" ;
    short swh(time, latitude, longitude) ;
        swh:scale_factor = 0.000203558072860934 ;
        swh:add_offset = 6.70098898894319 ;
        swh:_FillValue = -32767s ;
        swh:missing_value = -32767s ;
        swh:units = "m" ;
        swh:long_name = "Significant height of combined wind waves and swell" ;

// global attributes:
        :Conventions = "CF-1.6" ;
        :history = "2017-08-09 16:41:57 GMT by grib_to_netcdf-2.4.0: grib_to_netcdf /data/data04/scratch/_mars-atls01-a562cefde8a29a7288fa0b8b7f9413f7-5gV0xP.grib -o /data/data05/scratch/_grib2netcdf-atls09-70e05f9f8ba4e9d19932f1c45a7be8d8-jU8lEi.nc -utime" ;
        :_Format = "64-bit offset" ;
}
```

Duplicate (129.0MB):

```bash
$ ncdump -h -s swh_2010_01_05_05-duplicate.nc
netcdf swh_2010_01_05_05-duplicate {
dimensions:
    longitude = 720 ;
    latitude = 361 ;
    time = UNLIMITED ; // (248 currently)
variables:
    float longitude(longitude) ;
        longitude:_FillValue = NaNf ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
        longitude:_Storage = "contiguous" ;
    float latitude(latitude) ;
        latitude:_FillValue = NaNf ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
        latitude:_Storage = "contiguous" ;
    int time(time) ;
        time:long_name = "time" ;
        time:units = "hours since 1900-01-01" ;
        time:calendar = "gregorian" ;
        time:_Storage = "chunked" ;
        time:_ChunkSizes = 1024 ;
        time:_Endianness = "little" ;
    short swh(time, latitude, longitude) ;
        swh:_FillValue = -32767s ;
        swh:units = "m" ;
        swh:long_name = "Significant height of combined wind waves and swell" ;
        swh:add_offset = 6.70098898894319 ;
        swh:scale_factor = 0.000203558072860934 ;
        swh:_Storage = "chunked" ;
        swh:_ChunkSizes = 1, 361, 720 ;
        swh:_Endianness = "little" ;

// global attributes:
        :_NCProperties = "version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.8.18" ;
        :Conventions = "CF-1.6" ;
        :history = "2017-08-09 16:41:57 GMT by grib_to_netcdf-2.4.0: grib_to_netcdf /data/data04/scratch/_mars-atls01-a562cefde8a29a7288fa0b8b7f9413f7-5gV0xP.grib -o /data/data05/scratch/_grib2netcdf-atls09-70e05f9f8ba4e9d19932f1c45a7be8d8-jU8lEi.nc -utime" ;
        :_Format = "netCDF-4" ;
}
```

Masked (515.7MB):

```bash
$ ncdump -h -s swh_2010_01_05_05-masked.nc
netcdf swh_2010_01_05_05-masked {
dimensions:
    longitude = 720 ;
    latitude = 361 ;
    time = 248 ;
variables:
    float longitude(longitude) ;
        longitude:_FillValue = NaNf ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
        longitude:_Storage = "contiguous" ;
    float latitude(latitude) ;
        latitude:_FillValue = NaNf ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
        latitude:_Storage = "contiguous" ;
    int time(time) ;
        time:long_name = "time" ;
        time:units = "hours since 1900-01-01" ;
        time:calendar = "gregorian" ;
        time:_Storage = "contiguous" ;
        time:_Endianness = "little" ;
    double swh(time, latitude, longitude) ;
        swh:_FillValue = NaN ;
        swh:units = "m" ;
        swh:long_name = "Significant height of combined wind waves and swell" ;
        swh:_Storage = "contiguous" ;

// global attributes:
        :_NCProperties = "version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.8.18" ;
        :Conventions = "CF-1.6" ;
        :history = "2017-08-09 16:41:57 GMT by grib_to_netcdf-2.4.0: grib_to_netcdf /data/data04/scratch/_mars-atls01-a562cefde8a29a7288fa0b8b7f9413f7-5gV0xP.grib -o /data/data05/scratch/_grib2netcdf-atls09-70e05f9f8ba4e9d19932f1c45a7be8d8-jU8lEi.nc -utime" ;
        :_Format = "netCDF-4" ;
}
```

I assume it's about that fill/missing value changing? Thanks for the help.

329228614 https://github.com/pydata/xarray/issues/1572#issuecomment-329228614 https://api.github.com/repos/pydata/xarray/issues/1572 MDEyOklzc3VlQ29tbWVudDMyOTIyODYxNA== jhamman 2443309 2017-09-13T16:48:35Z 2017-09-13T16:48:35Z MEMBER

@jamesstidard - can you compare the output of `ncdump -h -s your_file.nc` for these three datasets and report back?


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);