home / github

Menu
  • GraphQL API
  • Search all tables

issue_comments

Table actions
  • GraphQL API for issue_comments

5 rows where issue = 407746874 and user = 1217238 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: created_at (date), updated_at (date)

user 1

  • shoyer · 5 ✖

issue 1

  • Ability to force float64 instead of float32 issue #2304 · 5 ✖

author_association 1

  • MEMBER 5
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
480687178 https://github.com/pydata/xarray/pull/2751#issuecomment-480687178 https://api.github.com/repos/pydata/xarray/issues/2751 MDEyOklzc3VlQ29tbWVudDQ4MDY4NzE3OA== shoyer 1217238 2019-04-08T05:19:47Z 2019-04-08T06:00:03Z MEMBER

Correct me if i'm wrong but this is a bit hard to do without knowing max,min values to avoid overlapping Evaluating those could break performance if only used for encoding and decoding.

Right, we definitely don't want to inspect the data to verify if it fits.

~~I think we will will need explicitly checks for isinstance(x, (int, float)) (for scale_factor and add_offset) to identify these cases in the encode() method. Let me try to come up with a concrete suggestion...~~ EDIT: See my suggestion below. I don't think we want these explicit checks, after all.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Ability to force float64 instead of float32 issue #2304 407746874
473496064 https://github.com/pydata/xarray/pull/2751#issuecomment-473496064 https://api.github.com/repos/pydata/xarray/issues/2751 MDEyOklzc3VlQ29tbWVudDQ3MzQ5NjA2NA== shoyer 1217238 2019-03-16T03:50:00Z 2019-03-16T03:50:00Z MEMBER

@shoyer do you mean that we consider that by default when we deal with Python's int and float we cast them to np.int64 and np.float64 ?

Currently we cast to int32/float64 for netCDF3 files (which don't support int64) and int64/float64 for netCDF4 files.

That default behavior doesn't really make sense if the user provided builtin Python numbers for these attributes.

For example, if I write something like the following, the data for x should probably decode to float32: python ds = xarray.Dataset({'x': np.float32(0)}) encoding = {'x': {'scale_factor': 1, 'add_offset': 1, 'dtype': 'int8'}} ds.to_netcdf('test.nc', encoding=encoding) With the current implementation in this PR, I think the data will come back as float64.

To get something that will decode as float64 without the original data being float64, the user should need to intentionally choose dtypes for scale_factor and add_offsert, e.g., python ds = xarray.Dataset({'x': np.float32(0)}) encoding = {'x': {'scale_factor': np.float64(1), 'add_offset': np.float64(1), 'dtype': 'int8'}} ds.to_netcdf('test.nc', encoding=encoding)

Does this make sense to you? From my perspective, two goals for this change should be: 1. Adhere to CF conventions for deciding how to decode data from disk. 2. Provide a consistent interface for xarray users, so that by default data is restored in the same form that it was written.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Ability to force float64 instead of float32 issue #2304 407746874
473165754 https://github.com/pydata/xarray/pull/2751#issuecomment-473165754 https://api.github.com/repos/pydata/xarray/issues/2751 MDEyOklzc3VlQ29tbWVudDQ3MzE2NTc1NA== shoyer 1217238 2019-03-15T05:42:18Z 2019-03-15T05:42:18Z MEMBER

@daoudjahdou thanks for sticking with this.

I'm concerned about how lose the "roundtrip" invariant in some edge cases, where the user did not specifically indicate the desired dtype but rather used a Python type. Right now xarray seems to use inconsistent rules for casting attribute types, based on the netCDF backend: ``` In [8]: for engine in ['scipy', 'netcdf4', 'h5netcdf']: ...: ds = xarray.Dataset({'x': np.float32(0)}) ...: encoding = {'x': {'scale_factor': 1, 'add_offset': 1}} ...: ds.to_netcdf(f'test-{engine}.nc', engine=engine, encoding=encoding) ...:

In [9]: ! ncdump -h test-scipy.nc netcdf test-scipy { variables: float x ; x:add_offset = 1 ; x:scale_factor = 1 ; x:_FillValue = NaNf ; }

In [10]: ! ncdump -h test-netcdf4.nc netcdf test-netcdf4 { variables: float x ; x:_FillValue = NaNf ; x:add_offset = 1LL ; x:scale_factor = 1LL ; }

In [11]: ! ncdump -h test-h5netcdf.nc netcdf test-h5netcdf { variables: float x ; x:_FillValue = NaNf ; x:add_offset = 1LL ; x:scale_factor = 1LL ; } ```

At least for add_offset and scale_factor, I think we should be more careful when determining the attribute dtype. Python's int and float don't have any intrinsic dtype. If the attributes are specified as native Python types (not NumPy types) and can be safely cast into the dtype of the data, we should probably use that instead.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Ability to force float64 instead of float32 issue #2304 407746874
462876031 https://github.com/pydata/xarray/pull/2751#issuecomment-462876031 https://api.github.com/repos/pydata/xarray/issues/2751 MDEyOklzc3VlQ29tbWVudDQ2Mjg3NjAzMQ== shoyer 1217238 2019-02-12T18:24:51Z 2019-02-12T18:24:51Z MEMBER

@daoudjahdou what do you think of @magau's proposed solution over in https://github.com/pydata/xarray/issues/2304#issuecomment-462592638? Checking the dtypes of of add_offset and scale_factor might be a better alternative than requiring setting a custom option.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Ability to force float64 instead of float32 issue #2304 407746874
461856173 https://github.com/pydata/xarray/pull/2751#issuecomment-461856173 https://api.github.com/repos/pydata/xarray/issues/2751 MDEyOklzc3VlQ29tbWVudDQ2MTg1NjE3Mw== shoyer 1217238 2019-02-08T16:15:11Z 2019-02-08T16:15:11Z MEMBER

@daoudjahdou sorry I missed your comment over in #2304. Threading the new parameter down though other functions is definitely the preferred approach here -- mutating global variables makes it much harder to follow control flow. Though I would probably stop short of modifying maybe_promote and would put this logic inside CFMaskCoder and/or CFScaleOffsetCoder in xarray.coding.variables.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Ability to force float64 instead of float32 issue #2304 407746874

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 273.576ms · About: xarray-datasette