github: issue_comments: 9 rows where issue = 261403591 sorted by updated

9 rows where issue = 261403591 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at	author_association	body	reactions	issue
333224978	https://github.com/pydata/xarray/issues/1598#issuecomment-333224978	https://api.github.com/repos/pydata/xarray/issues/1598	MDEyOklzc3VlQ29tbWVudDMzMzIyNDk3OA==	shoyer 1217238	2017-09-29T20:01:43Z	2017-09-29T20:01:43Z	MEMBER	It sounds like we should control this in xarray to ensure consistent behavior.	{ "total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Need better user control of _FillValue attribute in NetCDF files 261403591
333175863	https://github.com/pydata/xarray/issues/1598#issuecomment-333175863	https://api.github.com/repos/pydata/xarray/issues/1598	MDEyOklzc3VlQ29tbWVudDMzMzE3NTg2Mw==	dnowacki-usgs 13837821	2017-09-29T16:37:07Z	2017-09-29T16:37:20Z	CONTRIBUTOR	@jhamman In brief, it's weird. Engine | encoding['_FillValue'] = False | Do nothing ----- | ----- | ----- netCDF4 | Filling off | Filling on scipy | Filling off | Filling off h5netcdf | Filling on | Filling off So, this is some peculiar behavior. Setting `_FillValue` to `False` works for netCDF4 (as we have seen), has no effect using the scipy engine, and seems to do the opposite of intended for h5netcdf. Code below: ``` import xarray as xr import numpy as np import pandas as pd ds = xr.Dataset({'foo': (('x', 'y'), np.random.rand(4, 5))}, coords={'x': [10, 20, 30, 40], 'y': pd.date_range('2000-01-01', periods=5), 'z': ('x', list('abcd'))}) ds.to_netcdf('notset_scipy.nc', engine='scipy') ds.to_netcdf('notset_netcdf4.nc', engine='netcdf4') ds.to_netcdf('notset_h5netcdf.nc', engine='h5netcdf') ds.y.encoding['_FillValue'] = False ds.to_netcdf('False_scipy.nc', engine='scipy') ds.to_netcdf('False_netcdf4.nc', engine='netcdf4') ds.to_netcdf('False_h5netcdf.nc', engine='h5netcdf') ``` netCDF4 `$ ncinfo -v y notset_netcdf4.nc <type 'netCDF4._netCDF4.Variable'> int64 y(y) units: days since 2000-01-01 00:00:00 calendar: proleptic_gregorian unlimited dimensions: current shape = (5,) filling on, default _FillValue of -9223372036854775806 used` `$ ncinfo -v y False_netcdf4.nc <type 'netCDF4._netCDF4.Variable'> int64 y(y) units: days since 2000-01-01 00:00:00 calendar: proleptic_gregorian unlimited dimensions: current shape = (5,) filling off` scipy `$ ncinfo -v y notset_scipy.nc <type 'netCDF4._netCDF4.Variable'> int32 y(y) units: days since 2000-01-01 00:00:00 calendar: proleptic_gregorian unlimited dimensions: current shape = (5,) filling off` `$ ncinfo -v y False_scipy.nc <type 'netCDF4._netCDF4.Variable'> int32 y(y) units: days since 2000-01-01 00:00:00 calendar: proleptic_gregorian _FillValue: 0 unlimited dimensions: current shape = (5,) filling off` h5netcdf `$ ncinfo -v y notset_h5netcdf.nc <type 'netCDF4._netCDF4.Variable'> int64 y(y) units: days since 2000-01-01 00:00:00 calendar: proleptic_gregorian unlimited dimensions: current shape = (5,) filling off` `$ ncinfo -v y False_h5netcdf.nc <type 'netCDF4._netCDF4.Variable'> int64 y(y) _FillValue: 0 units: days since 2000-01-01 00:00:00 calendar: proleptic_gregorian unlimited dimensions: current shape = (5,) filling on`	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Need better user control of _FillValue attribute in NetCDF files 261403591
333171129	https://github.com/pydata/xarray/issues/1598#issuecomment-333171129	https://api.github.com/repos/pydata/xarray/issues/1598	MDEyOklzc3VlQ29tbWVudDMzMzE3MTEyOQ==	jhamman 2443309	2017-09-29T16:17:32Z	2017-09-29T16:17:32Z	MEMBER	@dnowacki-usgs - you've made a good point. At least for the netCDF4 backend, this seems to work out of the box with None/False. Can someone check that this works for the scipy/h5netcdf backends? https://github.com/Unidata/netcdf4-python/blob/366debfff8b0bc53999c9e1ce9f4818bf7cf079a/netCDF4/_netCDF4.pyx#L3455-L3457	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Need better user control of _FillValue attribute in NetCDF files 261403591
333165596	https://github.com/pydata/xarray/issues/1598#issuecomment-333165596	https://api.github.com/repos/pydata/xarray/issues/1598	MDEyOklzc3VlQ29tbWVudDMzMzE2NTU5Ng==	dnowacki-usgs 13837821	2017-09-29T15:55:22Z	2017-09-29T15:55:22Z	CONTRIBUTOR	"Allowing {'_FillValue': False} to indicate that _FillValue should not be included would be a simple, easy fix, so we should probably do that regardless." Correct me if you're talking about something different, but xarray already supports setting `_FillValue` to `False` to turn off filling. (Is there any use case where filling remains on but without a valid `_FillValue`?) For example, I have a netCDF processing routine using xarray. In the code I have this line for the `lon` dimension: `ds.lon.encoding['_FillValue'] = False` For the relevant dimension, this yields in ncinfo: `$ ncinfo -v lon yesfalse.nc <type 'netCDF4._netCDF4.Variable'> float64 lon(lon) units: degrees_east long_name: Longitude epic_code: 502 unlimited dimensions: current shape = (1,) filling off` If I comment out that line in my processing routine, I get the following: `$ ncinfo -v lon nofalse.nc <type 'netCDF4._netCDF4.Variable'> float64 lon(lon) _FillValue: nan units: degrees_east long_name: Longitude epic_code: 502 unlimited dimensions: current shape = (1,) filling on` I agree that changing from `False` to `None` does make better semantic sense.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Need better user control of _FillValue attribute in NetCDF files 261403591
332950475	https://github.com/pydata/xarray/issues/1598#issuecomment-332950475	https://api.github.com/repos/pydata/xarray/issues/1598	MDEyOklzc3VlQ29tbWVudDMzMjk1MDQ3NQ==	shoyer 1217238	2017-09-28T20:12:05Z	2017-09-28T20:12:05Z	MEMBER	Agreed, None is probably better. There is no such thing as a "null" dtype.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Need better user control of _FillValue attribute in NetCDF files 261403591
332950001	https://github.com/pydata/xarray/issues/1598#issuecomment-332950001	https://api.github.com/repos/pydata/xarray/issues/1598	MDEyOklzc3VlQ29tbWVudDMzMjk1MDAwMQ==	jhamman 2443309	2017-09-28T20:10:13Z	2017-09-28T20:10:13Z	MEMBER	I actually think we should use `None` as the `_FillValue` sentinel value. We do (sort of) support boolean arrays (https://github.com/pydata/xarray/pull/849).	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Need better user control of _FillValue attribute in NetCDF files 261403591
332949221	https://github.com/pydata/xarray/issues/1598#issuecomment-332949221	https://api.github.com/repos/pydata/xarray/issues/1598	MDEyOklzc3VlQ29tbWVudDMzMjk0OTIyMQ==	shoyer 1217238	2017-09-28T20:07:15Z	2017-09-28T20:07:15Z	MEMBER	"There is also the philosophical problem of fill values for coordinate variables." Indeed, this is prohibited by CF conventions -- but xarray (like pandas) takes a more flexible approach here, allowing for missing values for all variables. You can already specify an explicit choice for `_FillValue`, e.g., `ds.to_netcdf(..., encoding={'my_variable': {'_FillValue': 1e35}})`. Allowing `{'_FillValue': False}` to indicate that `_FillValue` should not be included would be a simple, easy fix, so we should probably do that regardless. (There is no need to worry about `False` conflicting with a legitimate fill value since netCDF does not have a boolean dtype.)	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Need better user control of _FillValue attribute in NetCDF files 261403591
332942206	https://github.com/pydata/xarray/issues/1598#issuecomment-332942206	https://api.github.com/repos/pydata/xarray/issues/1598	MDEyOklzc3VlQ29tbWVudDMzMjk0MjIwNg==	mmartini-usgs 23199378	2017-09-28T19:38:42Z	2017-09-28T19:38:42Z	NONE	There is also the philosophical problem of fill values for coordinate variables. To be true to reality, one really would want to add an interpolated value that fills whatever gap or bad value exists. That seems to be out of the scope of xarray though. I'm fine with a flag that controls only the coordinate data. That said, for the rest of the variables, we avoid NaN in _FillValue. We use 1E35. So there you could give the user a choice in default fill value. It seems pythonic to give the user flexibility. And the minute you satisfy us, there will be another use case that comes along with conflicting requirements. So you could use a flag and make it the user's choice, and not xarray's concern. It also depends on where in the process one cleans up one's data - reduce first, then QA/QC, or QA/QC first, then reduce. We do both, it depends on the instrument.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Need better user control of _FillValue attribute in NetCDF files 261403591
332934061	https://github.com/pydata/xarray/issues/1598#issuecomment-332934061	https://api.github.com/repos/pydata/xarray/issues/1598	MDEyOklzc3VlQ29tbWVudDMzMjkzNDA2MQ==	shoyer 1217238	2017-09-28T19:05:46Z	2017-09-28T19:05:46Z	MEMBER	cc @thenaomig @laliberte There are at least two ways to fix this: 1. Support a flag of some sort in encoding (e.g., `_FillValue = False`) to indicate that fill value shouldn't be added. This would be easy to add, but is somewhat inelegant. 2. Check for the presence of NaNs before setting `_FillValue = NaN`. This would be easy to add for dimension coordinates because they are already guaranteed to be in memory, but could cause performance trouble if any inputs are loaded as dask arrays. I don't know a satisfactory way to handle dask arrays with our current design, since we don't want to add another pass over the data to check for NaNs. I suppose one option would be to refactor our backend classes to write data before writing attributes and then make some sort of dask array operation that checks for NaNs as the data is written. But I'm not even sure this would work with the standard dask task schedulers.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Need better user control of _FillValue attribute in NetCDF files 261403591
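
The two fixes discussed in the thread can be sketched in a few lines of Python. This is only an illustration, not xarray's implementation: the helper names (`explicit_fill_encoding`, `contains_nan`, `nan_aware_fill_encoding`) and the `NO_FILL` constant are hypothetical. The only xarray call assumed is `Dataset.to_netcdf(..., encoding=...)`, as quoted in the comments above. Whether the "no fill value" sentinel should be `False` (reported to work with the netCDF4 engine in the thread) or `None` (the value proposed there) depends on the xarray version and should be checked.

```python
import math

# Sentinel meaning "do not write a _FillValue attribute". The thread reports
# False working with the netCDF4 engine and proposes None instead; which one
# a given xarray version honors is an assumption to verify.
NO_FILL = None


def explicit_fill_encoding(names, fill_value=NO_FILL):
    """Option 1: build a per-variable encoding dict with an explicit choice."""
    return {name: {"_FillValue": fill_value} for name in names}


def contains_nan(values):
    """Return True if any element of an in-memory sequence is a float NaN."""
    return any(isinstance(v, float) and math.isnan(v) for v in values)


def nan_aware_fill_encoding(columns):
    """Option 2: request a NaN fill value only where NaNs are present.

    `columns` maps variable names to in-memory sequences of values, so this
    needs a full pass over the data (the cost the thread worries about for
    dask-backed arrays).
    """
    return {
        name: {"_FillValue": float("nan") if contains_nan(values) else NO_FILL}
        for name, values in columns.items()
    }


# Hypothetical usage with an xarray Dataset `ds` that has coordinates x and y:
#     ds.to_netcdf("out.nc", encoding=explicit_fill_encoding(["x", "y"]))
#     ds.to_netcdf("out.nc", encoding=explicit_fill_encoding(["foo"], 1e35))
```

Option 1 leaves the decision to the user, which matches the request for a flag. Option 2 only makes sense for data that is already in memory, such as dimension coordinates.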


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
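
The view at the top of this page (rows where issue = 261403591, sorted by updated_at descending) can be approximated against this schema with Python's standard `sqlite3` module. The sketch below is an assumption about how such a query might look, not the query the site actually runs. It uses an in-memory database, loads only two of the rows listed above with most columns left empty, and the helper name `comments_for_issue` is hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
"""


def comments_for_issue(conn, issue_id):
    """Return (id, user, updated_at) for one issue, newest update first."""
    cur = conn.execute(
        "SELECT id, user, updated_at FROM issue_comments "
        "WHERE issue = ? ORDER BY updated_at DESC",
        (issue_id,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Two rows copied from the table above, inserted oldest first so the
# ORDER BY clause visibly does the sorting. ISO 8601 timestamps stored as
# TEXT sort correctly as plain strings.
conn.executemany(
    "INSERT INTO issue_comments "
    "(id, user, created_at, updated_at, author_association, issue) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        (333175863, 13837821, "2017-09-29T16:37:07Z", "2017-09-29T16:37:20Z",
         "CONTRIBUTOR", 261403591),
        (333224978, 1217238, "2017-09-29T20:01:43Z", "2017-09-29T20:01:43Z",
         "MEMBER", 261403591),
    ],
)

rows = comments_for_issue(conn, 261403591)
for row in rows:
    print(row)
```

The `idx_issue_comments_issue` index is what keeps the `WHERE issue = ?` filter cheap on a large table. The foreign-key references to `users` and `issues` are not enforced here because SQLite leaves foreign-key enforcement off by default.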