issue_comments

11 rows where issue = 258500654 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
699477482 https://github.com/pydata/xarray/issues/1576#issuecomment-699477482 https://api.github.com/repos/pydata/xarray/issues/1576 MDEyOklzc3VlQ29tbWVudDY5OTQ3NzQ4Mg== stale[bot] 26384082 2020-09-26T10:40:38Z 2020-09-26T10:40:38Z NONE

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Variable of dtype int8 casted to float64 258500654
330464740 https://github.com/pydata/xarray/issues/1576#issuecomment-330464740 https://api.github.com/repos/pydata/xarray/issues/1576 MDEyOklzc3VlQ29tbWVudDMzMDQ2NDc0MA== forman 206773 2017-09-19T08:16:43Z 2017-09-19T08:16:43Z NONE

@shoyer

We currently decode anything with a _FillValue attribute to float, ...

I believe this behaviour is surprising for any user of integer/index/enum/classification datasets. Since its justification seems to be an implementation detail that comes at the cost of increased memory and CPU consumption, I suggest documenting it for the open_dataset() and decode_cf() functions.

Here is how we overcome this issue by deleting the _FillValue attribute of integer variables if their scale_factor and add_offset attributes are not provided:

ds = xr.open_dataset(path, decode_cf=False)
old_fill_values = unset_fill_value_for_int_vars(ds)
ds = xr.decode_cf(ds)
reset_fill_value_for_int_vars(ds, old_fill_values)

where old_fill_values is a mapping of variable names to fill values.
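The two helpers are not shown in the comment. A minimal sketch of what they might look like follows, assuming the fill values are simply popped from and written back to each variable's attrs (the original implementation may differ). The in-memory dataset and its variable names are hypothetical stand-ins for xr.open_dataset(path, decode_cf=False).

```python
import numpy as np
import xarray as xr


def unset_fill_value_for_int_vars(ds):
    """Remove _FillValue from unscaled integer variables; return the removed values."""
    old_fill_values = {}
    for name, var in ds.variables.items():
        attrs = var.attrs
        is_int = var.dtype.kind in "iu"
        is_scaled = "scale_factor" in attrs or "add_offset" in attrs
        if is_int and not is_scaled and "_FillValue" in attrs:
            old_fill_values[name] = attrs.pop("_FillValue")
    return old_fill_values


def reset_fill_value_for_int_vars(ds, old_fill_values):
    """Write the saved fill values back as plain attributes (an assumption)."""
    for name, fill_value in old_fill_values.items():
        ds.variables[name].attrs["_FillValue"] = fill_value


# Hypothetical in-memory stand-in for xr.open_dataset(path, decode_cf=False).
raw = xr.Dataset(
    {
        "lccs_class": (
            ("x",),
            np.array([0, 10, 20], dtype=np.int8),
            {"_FillValue": np.int8(0)},
        ),
        "confidence": (
            ("x",),
            np.array([-1, 50, 100], dtype=np.int8),
            {"_FillValue": np.int8(-1), "scale_factor": np.float32(0.01)},
        ),
    }
)

old_fill_values = unset_fill_value_for_int_vars(raw)
ds = xr.decode_cf(raw)
reset_fill_value_for_int_vars(ds, old_fill_values)

print(ds["lccs_class"].dtype)   # stays an integer dtype
print(ds["confidence"].dtype)   # scaled variable is still decoded to float
```

With this approach only the unscaled integer variable skips masking; the scaled variable keeps its _FillValue and is decoded to float as usual.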

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Variable of dtype int8 casted to float64 258500654
330277364 https://github.com/pydata/xarray/issues/1576#issuecomment-330277364 https://api.github.com/repos/pydata/xarray/issues/1576 MDEyOklzc3VlQ29tbWVudDMzMDI3NzM2NA== jhamman 2443309 2017-09-18T16:26:36Z 2017-09-18T16:26:36Z MEMBER

Why can't xarray use masked arrays, which would retain the original dtype?

We have an open issue for this topic (#1194). A lot of it comes down to performance: dask is part of that, but the other issue is that masked arrays in numpy are quite slow.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Variable of dtype int8 casted to float64 258500654
330275698 https://github.com/pydata/xarray/issues/1576#issuecomment-330275698 https://api.github.com/repos/pydata/xarray/issues/1576 MDEyOklzc3VlQ29tbWVudDMzMDI3NTY5OA== forman 206773 2017-09-18T16:20:33Z 2017-09-18T16:20:33Z NONE

@jhamman _NoFill is about optimizing writes, see nc_set_fill

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Variable of dtype int8 casted to float64 258500654
330273842 https://github.com/pydata/xarray/issues/1576#issuecomment-330273842 https://api.github.com/repos/pydata/xarray/issues/1576 MDEyOklzc3VlQ29tbWVudDMzMDI3Mzg0Mg== forman 206773 2017-09-18T16:13:45Z 2017-09-18T16:13:45Z NONE

I see, that is what is done in mask_and_scale(). Why can't xarray use masked arrays, which would retain the original dtype? (Dask, I guess?) Expanding integers to 8-byte floats not only costs memory but also CPU, and results in an inaccurate in-memory integer representation.
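To make the dtype and memory trade-off concrete, here is a small NumPy-only sketch comparing a masked array with a float64 copy. The sample values are made up for illustration. Note that float64 represents every int8 value exactly; inexactness only appears for integers beyond 2**53.

```python
import numpy as np

fill_value = np.int8(0)
raw = np.array([0, 10, 20, 30], dtype=np.int8)

# Masked array: keeps int8 and carries a separate boolean mask.
masked = np.ma.masked_equal(raw, fill_value)

# Float promotion: convert first, then overwrite fill values with NaN.
as_float = raw.astype(np.float64)
as_float[raw == fill_value] = np.nan

print(masked.dtype, masked.data.nbytes)   # data buffer stays 1 byte per element
print(as_float.dtype, as_float.nbytes)    # 8 bytes per element

# float64 is exact for small integers, but not beyond 2**53.
print(float(2**53 + 1) == float(2**53))
```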

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Variable of dtype int8 casted to float64 258500654
330271312 https://github.com/pydata/xarray/issues/1576#issuecomment-330271312 https://api.github.com/repos/pydata/xarray/issues/1576 MDEyOklzc3VlQ29tbWVudDMzMDI3MTMxMg== shoyer 1217238 2017-09-18T16:04:47Z 2017-09-18T16:04:47Z MEMBER

We currently decode anything with a _FillValue attribute to float, so that we can convert any values equal to the fill value to NaN. This ensures that xarray's NaN-skipping aggregations (e.g., mean()) work properly.
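The effect on aggregations can be illustrated with plain NumPy. This is only a sketch of the idea, not xarray's actual decoding code, and the sample values are invented.

```python
import numpy as np

fill_value = np.int8(-1)
raw = np.array([-1, 2, 4, -1, 6], dtype=np.int8)

# Integer dtypes have no NaN, so the data is promoted to float before masking.
decoded = raw.astype(np.float64)
decoded[raw == fill_value] = np.nan

naive_mean = raw.mean()            # fill values are counted as real data
skipna_mean = np.nanmean(decoded)  # fill values are ignored

print(naive_mean, skipna_mean)
```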

However, this isn't really a useful thing to do for a dataset like this, where the values really represent enums/categories. It seems like the CF-compliant way to indicate this is with the various flag_* attributes. So we could look for those to indicate that we shouldn't fill in fill values.
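One way such a check could look is sketched below. The function names are hypothetical and this is not how xarray implements decoding; it only illustrates the proposed heuristic on attribute dictionaries like those in the ncdump header quoted in this thread.

```python
# CF attributes that mark a variable as a flag/category variable.
FLAG_ATTRS = ("flag_values", "flag_masks", "flag_meanings")


def looks_categorical(attrs):
    """Return True if any CF flag_* attribute is present."""
    return any(name in attrs for name in FLAG_ATTRS)


def should_mask_fill_value(attrs):
    """Mask fill values (and promote to float) only for non-categorical variables."""
    return "_FillValue" in attrs and not looks_categorical(attrs)


lccs_class_attrs = {"_FillValue": 0, "flag_values": [0, 10, 11], "flag_meanings": "no_data a b"}
observation_count_attrs = {"_FillValue": -1, "valid_min": 0}

print(should_mask_fill_value(lccs_class_attrs))         # categorical: leave as integers
print(should_mask_fill_value(observation_count_attrs))  # plain counts: mask as usual
```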

Eventually, we could possibly also use this for decoding into a true "categorical" dtype, but numpy doesn't have anything like that yet.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Variable of dtype int8 casted to float64 258500654
330271058 https://github.com/pydata/xarray/issues/1576#issuecomment-330271058 https://api.github.com/repos/pydata/xarray/issues/1576 MDEyOklzc3VlQ29tbWVudDMzMDI3MTA1OA== jhamman 2443309 2017-09-18T16:03:49Z 2017-09-18T16:03:49Z MEMBER

Right, since xarray uses np.nan as its fill value, any array with a _FillValue will be promoted to a float dtype.

Out of curiosity, what is the meaning of _NoFill = "true"?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Variable of dtype int8 casted to float64 258500654
330267397 https://github.com/pydata/xarray/issues/1576#issuecomment-330267397 https://api.github.com/repos/pydata/xarray/issues/1576 MDEyOklzc3VlQ29tbWVudDMzMDI2NzM5Nw== forman 206773 2017-09-18T15:52:55Z 2017-09-18T16:00:01Z NONE

I guess the problem is caused in xarray/conventions.py.

Note, when debugging into it, fill_value == np.array([0], dtype=np.int8) and fill_value.dtype.kind == 'i', and the latter kind is not dealt with. Therefore int8 is turned into float64.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Variable of dtype int8 casted to float64 258500654
330263190 https://github.com/pydata/xarray/issues/1576#issuecomment-330263190 https://api.github.com/repos/pydata/xarray/issues/1576 MDEyOklzc3VlQ29tbWVudDMzMDI2MzE5MA== fmaussion 10050469 2017-09-18T15:38:49Z 2017-09-18T15:38:49Z MEMBER

OK. I'll let @shoyer comment on the substance but indeed it seems that decode_cf could be cleverer here. It should be an easy fix.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Variable of dtype int8 casted to float64 258500654
330261323 https://github.com/pydata/xarray/issues/1576#issuecomment-330261323 https://api.github.com/repos/pydata/xarray/issues/1576 MDEyOklzc3VlQ29tbWVudDMzMDI2MTMyMw== forman 206773 2017-09-18T15:32:27Z 2017-09-18T15:32:27Z NONE

Here you are

$ ncdump -h -s
netcdf ESACCI-LC-L4-LCCS-Map-300m-P5Y-2005-v1.6.1 {
dimensions:
        lat = 64800 ;
        lon = 129600 ;
variables:
        byte lccs_class(lat, lon) ;
                lccs_class:long_name = "Land cover class defined in LCCS" ;
                lccs_class:standard_name = "land_cover_lccs" ;
                lccs_class:flag_values = 0b, 10b, 11b, 12b, 20b, 30b, 40b, 50b, 60b, 61b, 62b, 70b, 71b, 72b, 80b, 81b, 82b, 90b, 100b, 110b, 120b, 121b, 122b, -126b, -116b, -106b, -104b, -103b, -96b, -86b, -76b, -66b, -56b, -55b, -54b, -46b, -36b ;
                lccs_class:flag_meanings = "no_data cropland_rainfed cropland_rainfed_herbaceous_cover cropland_rainfed_tree_or_shrub_cover cropland_irrigated mosaic_cropland mosaic_natural_vegetation tree_broadleaved_evergreen_closed_to_open tree_broadleaved_deciduous_closed_to_open tree_broadleaved_deciduous_closed tree_broadleaved_deciduous_open tree_needleleaved_evergreen_closed_to_open tree_needleleaved_evergreen_closed tree_needleleaved_evergreen_open tree_needleleaved_deciduous_closed_to_open tree_needleleaved_deciduous_closed tree_needleleaved_deciduous_open tree_mixed mosaic_tree_and_shrub mosaic_herbaceous shrubland shrubland_evergreen shrubland_deciduous grassland lichens_and_mosses sparse_vegetation sparse_shrub sparse_herbaceous tree_cover_flooded_fresh_or_brakish_water tree_cover_flooded_saline_water shrub_or_herbaceous_cover_flooded urban bare_areas bare_areas_consolidated bare_areas_unconsolidated water snow_and_ice" ;
                lccs_class:valid_min = 1 ;
                lccs_class:valid_max = 220 ;
                lccs_class:_Unsigned = "true" ;
                lccs_class:_FillValue = 0b ;
                lccs_class:ancillary_variables = "processed_flag current_pixel_state observation_count algorithmic_confidence_level" ;
                lccs_class:_Storage = "chunked" ;
                lccs_class:_ChunkSizes = 2048, 2048 ;
                lccs_class:_DeflateLevel = 6 ;
                lccs_class:_NoFill = "true" ;
        byte processed_flag(lat, lon) ;
                processed_flag:standard_name = "land_cover_lccs status_flag" ;
                processed_flag:flag_values = 0b, 1b ;
                processed_flag:flag_meanings = "not_processed processed" ;
                processed_flag:valid_min = 0 ;
                processed_flag:valid_max = 1 ;
                processed_flag:_FillValue = -1b ;
                processed_flag:long_name = "LC map processed area flag" ;
                processed_flag:_Storage = "chunked" ;
                processed_flag:_ChunkSizes = 2048, 2048 ;
                processed_flag:_DeflateLevel = 6 ;
                processed_flag:_NoFill = "true" ;
        byte current_pixel_state(lat, lon) ;
                current_pixel_state:standard_name = "land_cover_lccs status_flag" ;
                current_pixel_state:flag_values = 0b, 1b, 2b, 3b, 4b, 5b ;
                current_pixel_state:flag_meanings = "invalid clear_land clear_water clear_snow_ice cloud cloud_shadow" ;
                current_pixel_state:valid_min = 0 ;
                current_pixel_state:valid_max = 5 ;
                current_pixel_state:_FillValue = -1b ;
                current_pixel_state:long_name = "LC pixel type mask" ;
                current_pixel_state:_Storage = "chunked" ;
                current_pixel_state:_ChunkSizes = 2048, 2048 ;
                current_pixel_state:_DeflateLevel = 6 ;
                current_pixel_state:_NoFill = "true" ;
        short observation_count(lat, lon) ;
                observation_count:standard_name = "land_cover_lccs number_of_observations" ;
                observation_count:valid_min = 0 ;
                observation_count:valid_max = 32767 ;
                observation_count:_FillValue = -1s ;
                observation_count:long_name = "number of valid observations" ;
                observation_count:_Storage = "chunked" ;
                observation_count:_ChunkSizes = 2048, 2048 ;
                observation_count:_DeflateLevel = 6 ;
                observation_count:_Endianness = "little" ;
                observation_count:_NoFill = "true" ;
        byte algorithmic_confidence_level(lat, lon) ;
                algorithmic_confidence_level:standard_name = "land_cover_lccs algorithmic_confidence" ;
                algorithmic_confidence_level:valid_min = 0 ;
                algorithmic_confidence_level:valid_max = 100 ;
                algorithmic_confidence_level:scale_factor = 0.01f ;
                algorithmic_confidence_level:_FillValue = -1b ;
                algorithmic_confidence_level:long_name = "LC map confidence level based on algorithm performance" ;
                algorithmic_confidence_level:_Storage = "chunked" ;
                algorithmic_confidence_level:_ChunkSizes = 2048, 2048 ;
                algorithmic_confidence_level:_DeflateLevel = 6 ;
                algorithmic_confidence_level:_NoFill = "true" ;
        float lat(lat) ;
                lat:long_name = "latitude" ;
                lat:standard_name = "latitude" ;
                lat:valid_min = -89.9986f ;
                lat:valid_max = 89.99861f ;
                lat:units = "degrees_north" ;
                lat:_Storage = "chunked" ;
                lat:_ChunkSizes = 64800 ;
                lat:_DeflateLevel = 6 ;
                lat:_Endianness = "little" ;
                lat:_NoFill = "true" ;
        float lon(lon) ;
                lon:long_name = "longitude" ;
                lon:standard_name = "longitude" ;
                lon:valid_min = -179.9986f ;
                lon:valid_max = 179.9986f ;
                lon:units = "degrees_east" ;
                lon:_Storage = "chunked" ;
                lon:_ChunkSizes = 129600 ;
                lon:_DeflateLevel = 6 ;
                lon:_Endianness = "little" ;
                lon:_NoFill = "true" ;
        int crs ;
                crs:i2m = "0.002777777701187,0.0,0.0,-0.002777777701187,-180.00000033927267,90.0" ;
                crs:wkt = "GEOGCS[\"WGS 84\", \r\n  DATUM[\"World Geodetic System 1984\", \r\n    SPHEROID[\"WGS 84\", 6378137.0, 298.257223563, AUTHORITY[\"EPSG\",\"7030\"]], \r\n    AUTHORITY[\"EPSG\",\"6326\"]], \r\n  PRIMEM[\"Greenwich\", 0.0, AUTHORITY[\"EPSG\",\"8901\"]], \r\n  UNIT[\"degree\", 0.017453292519943295], \r\n  AXIS[\"Geodetic longitude\", EAST], \r\n  AXIS[\"Geodetic latitude\", NORTH], \r\n  AUTHORITY[\"EPSG\",\"4326\"]]" ;
                crs:_Endianness = "little" ;
                crs:_NoFill = "true" ;

// global attributes:
                :title = "ESA CCI Land Cover Map" ;
                :summary = "This dataset contains the global ESA CCI land cover classification map derived from satellite data of one epoch." ;
                :type = "ESACCI-LC-L4-LCCS-Map-300m-P5Y" ;
                :id = "ESACCI-LC-L4-LCCS-Map-300m-P5Y-2005-v1.6.1" ;
                :project = "Climate Change Initiative - European Space Agency" ;
                :references = "http://www.esa-landcover-cci.org/" ;
                :institution = "Universite catholique de Louvain" ;
                :contact = "landcover-cci@uclouvain.be" ;
                :comment = "" ;
                :Conventions = "CF-1.6" ;
                :standard_name_vocabulary = "NetCDF Climate and Forecast (CF) Standard Names version 21" ;
                :keywords = "land cover classification,satellite,observation" ;
                :keywords_vocabulary = "NASA Global Change Master Directory (GCMD) Science Keywords" ;
                :license = "ESA CCI Data Policy: free and open access" ;
                :naming_authority = "org.esa-cci" ;
                :cdm_data_type = "grid" ;
                :TileSize = "2048:2048" ;
                :tracking_id = "00f7e0ee-3b0e-4ea3-9b9f-186e02fb4439" ;
                :product_version = "1.6.1" ;
                :date_created = "20151217T094622Z" ;
                :creator_name = "University catholique de Louvain" ;
                :creator_url = "http://www.uclouvain.be/" ;
                :creator_email = "landcover-cci@uclouvain.be" ;
                :source = "MERIS FR L1B version 5.05, MERIS RR L1B version 8.0, SPOT VGT P" ;
                :history = "amorgos-4,0, lc-sdr-1.0, lc-sr-1.0, lc-classification-1.0,lc-user-tools-3.10" ;
                :time_coverage_start = "20030101" ;
                :time_coverage_end = "20071231" ;
                :time_coverage_duration = "P5Y" ;
                :time_coverage_resolution = "P5Y" ;
                :geospatial_lat_min = "-89.99999" ;
                :geospatial_lat_max = "90.0" ;
                :geospatial_lon_min = "-180.0" ;
                :geospatial_lon_max = "179.99998" ;
                :spatial_resolution = "300m" ;
                :geospatial_lat_units = "degrees_north" ;
                :geospatial_lat_resolution = "0.002778" ;
                :geospatial_lon_units = "degrees_east" ;
                :geospatial_lon_resolution = "0.002778" ;
                :_SuperblockVersion = 2 ;
                :_IsNetcdf4 = 1 ;
                :_Format = "netCDF-4" ;
}
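The header above declares lccs_class as byte with _Unsigned = "true" and valid_max = 220, so the negative flag_values printed by ncdump are presumably meant to be read as unsigned bytes. A NumPy sketch of that reinterpretation, using a few of the listed flag values:

```python
import numpy as np

# A few flag_values as ncdump prints them (signed bytes).
stored = np.array([0, 10, 122, -126, -36], dtype=np.int8)

# Reinterpret the same bytes as unsigned; no data is copied or rescaled.
unsigned = stored.view(np.uint8)

print(unsigned.tolist())
```

Read this way, the largest flag value matches the declared valid_max of 220.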
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Variable of dtype int8 casted to float64 258500654
330243618 https://github.com/pydata/xarray/issues/1576#issuecomment-330243618 https://api.github.com/repos/pydata/xarray/issues/1576 MDEyOklzc3VlQ29tbWVudDMzMDI0MzYxOA== fmaussion 10050469 2017-09-18T14:37:06Z 2017-09-18T14:37:20Z MEMBER

Can you run ncdump -h -s on the file and report back?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Variable of dtype int8 casted to float64 258500654

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);