issue_comments

9 rows where issue = 199188476 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
605697466 https://github.com/pydata/xarray/issues/1194#issuecomment-605697466 https://api.github.com/repos/pydata/xarray/issues/1194 MDEyOklzc3VlQ29tbWVudDYwNTY5NzQ2Ng== eric-czech 6130352 2020-03-29T20:37:29Z 2020-03-29T20:37:29Z NONE

I agree, I have this same issue with large genotyping data arrays often containing tiny integers and some degree of missingness in nearly 100% of raw datasets. Are there recommended workarounds now? I am thinking of constantly using Datasets instead of DataArrays with mask arrays to accompany every data array, but I'm not sure if that's the best interim solution.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use masked arrays while preserving int 199188476
605632224 https://github.com/pydata/xarray/issues/1194#issuecomment-605632224 https://api.github.com/repos/pydata/xarray/issues/1194 MDEyOklzc3VlQ29tbWVudDYwNTYzMjIyNA== Hoeze 1200058 2020-03-29T13:00:29Z 2020-03-29T13:03:46Z NONE

Currently I keep carrying a "<arrayname>_missing" mask with all of my unstacked arrays to solve this issue. It would be very desirable to have a clean solution for this, to keep arrays from being converted to float. Also, NaN does not necessarily mean NA, which has already caused me quite some head-scratching in the past. Further, it would be a very useful indicator of which values of a dense array should be converted into a sparse array.
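A minimal sketch of the companion-mask workaround described above, assuming xarray and NumPy are installed. The variable names ("calls", "calls_missing") and the data are hypothetical and only illustrate the "<arrayname>_missing" convention; this is not an xarray feature, just one way to carry the mask by hand.

```python
import numpy as np
import xarray as xr

# Hypothetical integer data kept as int8; the values are illustrative only.
calls = np.array([[0, 1, 2], [2, 0, 1]], dtype=np.int8)
# True marks a missing entry; the integer stored there is only a placeholder.
missing = np.array([[False, True, False], [False, False, True]])

ds = xr.Dataset(
    {
        "calls": (("sample", "variant"), calls),
        "calls_missing": (("sample", "variant"), missing),
    }
)

# Replace missing entries with 0 so the reduction stays in integer arithmetic.
valid_sum = ds["calls"].where(~ds["calls_missing"], 0).sum("variant")

# Without a replacement value, where() fills with NaN and upcasts to float.
as_float = ds["calls"].where(~ds["calls_missing"])
```

The trade-off is that every operation has to consult the mask explicitly; nothing stops a later step from using the placeholder values by accident.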

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use masked arrays while preserving int 199188476
580761178 https://github.com/pydata/xarray/issues/1194#issuecomment-580761178 https://api.github.com/repos/pydata/xarray/issues/1194 MDEyOklzc3VlQ29tbWVudDU4MDc2MTE3OA== gerritholl 500246 2020-01-31T14:42:36Z 2020-01-31T14:42:36Z CONTRIBUTOR

Pandas 1.0 uses pd.NA for integers, boolean, and string dtypes: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.0.0.html#experimental-na-scalar-to-denote-missing-values
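As a small illustration of the pandas behaviour mentioned above, assuming pandas 1.0 or later: a Series with the nullable "Int8" dtype keeps its integer dtype while holding a missing entry. The data is made up for the example, and the linked release notes describe pd.NA as experimental.

```python
import pandas as pd

# Nullable integer dtype (capital "I"); None is stored as pd.NA, not NaN.
s = pd.Series([1, None, 3], dtype="Int8")

n_missing = int(s.isna().sum())  # count of missing entries
total = int(s.sum())             # missing values are skipped by default
```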

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use masked arrays while preserving int 199188476
457220076 https://github.com/pydata/xarray/issues/1194#issuecomment-457220076 https://api.github.com/repos/pydata/xarray/issues/1194 MDEyOklzc3VlQ29tbWVudDQ1NzIyMDA3Ng== gerritholl 500246 2019-01-24T14:40:33Z 2019-01-24T14:40:33Z CONTRIBUTOR

@max-sixty Interesting! I wonder what it would take to make use of this "nullable integer data type" in xarray. It wouldn't work to convert it to a standard numpy array (da.values) retaining the dtype, but one could make a new .to_maskedarray() method returning a numpy masked array; that would probably be easier than to add full support for masked arrays.
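A rough sketch of what such a conversion could look like on the pandas side, assuming pandas 1.0 or later and NumPy. The helper name nullable_to_masked and its parameters are hypothetical; it is not an existing pandas or xarray method, only an illustration of going from a nullable integer Series to a NumPy masked array without passing through float.

```python
import numpy as np
import pandas as pd


def nullable_to_masked(series, dtype="int8", fill=0):
    """Hypothetical helper: nullable-integer Series -> numpy masked array."""
    # Boolean mask of the missing positions.
    mask = series.isna().to_numpy()
    # Materialise the integers, putting a placeholder where values are missing.
    data = series.to_numpy(dtype=dtype, na_value=fill)
    return np.ma.masked_array(data, mask=mask)


masked = nullable_to_masked(pd.Series([1, None, 3], dtype="Int8"))
```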

{
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use masked arrays while preserving int 199188476
457209272 https://github.com/pydata/xarray/issues/1194#issuecomment-457209272 https://api.github.com/repos/pydata/xarray/issues/1194 MDEyOklzc3VlQ29tbWVudDQ1NzIwOTI3Mg== max-sixty 5635139 2019-01-24T14:09:32Z 2019-01-24T14:09:32Z MEMBER

@gerritholl check out https://pandas-docs.github.io/pandas-docs-travis/whatsnew/v0.24.0.html#whatsnew-0240-enhancements-intna

I think that's the closest way of having int support; from my understanding, supporting masked arrays directly would be a decent lift.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use masked arrays while preserving int 199188476
457159560 https://github.com/pydata/xarray/issues/1194#issuecomment-457159560 https://api.github.com/repos/pydata/xarray/issues/1194 MDEyOklzc3VlQ29tbWVudDQ1NzE1OTU2MA== gerritholl 500246 2019-01-24T11:10:46Z 2019-01-24T11:10:46Z CONTRIBUTOR

I think this issue should remain open. I think it would still be highly desirable to implement support for true masked arrays, such that any value can be masked without throwing away the original value.
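For reference, this is the property of NumPy masked arrays being asked for, shown with plain NumPy (no xarray involved). The values are illustrative: masking an element hides it from reductions, but the underlying integer is still stored and the dtype is unchanged.

```python
import numpy as np

values = np.array([10, 20, 30], dtype=np.int8)
masked = np.ma.masked_array(values, mask=[False, True, False])

total = masked.sum()        # the masked element is skipped
original = masked.data      # underlying values, including the masked one
filled = masked.filled(-1)  # masked positions replaced on request
```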

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use masked arrays while preserving int 199188476
457158136 https://github.com/pydata/xarray/issues/1194#issuecomment-457158136 https://api.github.com/repos/pydata/xarray/issues/1194 MDEyOklzc3VlQ29tbWVudDQ1NzE1ODEzNg== stale[bot] 26384082 2019-01-24T11:05:22Z 2019-01-24T11:05:22Z NONE

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here; otherwise it will be marked as closed automatically.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use masked arrays while preserving int 199188476
271077863 https://github.com/pydata/xarray/issues/1194#issuecomment-271077863 https://api.github.com/repos/pydata/xarray/issues/1194 MDEyOklzc3VlQ29tbWVudDI3MTA3Nzg2Mw== gerritholl 500246 2017-01-07T11:24:49Z 2017-01-07T11:32:06Z CONTRIBUTOR

I don't see how an integer dtype could ever support missing values; float missing values are specifically defined by IEEE 754, but for ints, every sequence of bits corresponds to a valid value. OTOH, NetCDF does have a _FillValue attribute that works for any type including int. If we view xarray as "NetCDF in memory" that could be an approach to follow, but for numpy in general it would fairly heavily break existing code (see also http://www.numpy.org/NA-overview.html), in particular for 8-bit types. If I understand correctly, R uses INT_MAX which would be 127 for 'int8… Apparently, R ints are always 32 bits. I'm new to xarray so I don't have a good idea of how much work adding support for masked arrays would be, but I'll take your word that it's not straightforward.
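The point about bit patterns can be made concrete with a short NumPy sketch. The sentinel -128 below is an assumed, hypothetical _FillValue chosen for the example; a real file defines its own. The sketch shows that int8 has no spare value for "missing", that mixing NaN with integers forces a float dtype, and how a fill value is decoded into a mask or into NaN.

```python
import numpy as np

# Every one of the 256 bit patterns of int8 is an ordinary number.
info = np.iinfo(np.int8)
n_values = info.max - info.min + 1

# NaN only exists for floats, so mixing it with integers yields a float array.
mixed_dtype = np.array([1, np.nan]).dtype

# Hypothetical _FillValue convention: reserve one integer to mean "missing".
FILL = np.int8(-128)
raw = np.array([5, -128, 7], dtype=np.int8)
is_missing = raw == FILL

# Decoding to NaN requires leaving the integer dtype.
decoded = raw.astype(np.float64)
decoded[is_missing] = np.nan
```

Reserving a sentinel costs one representable value, which is why it hurts most for 8-bit types.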

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use masked arrays while preserving int 199188476
271058005 https://github.com/pydata/xarray/issues/1194#issuecomment-271058005 https://api.github.com/repos/pydata/xarray/issues/1194 MDEyOklzc3VlQ29tbWVudDI3MTA1ODAwNQ== shoyer 1217238 2017-01-07T02:54:54Z 2017-01-07T02:54:54Z MEMBER

I answered your question on StackOverflow.

I agree that this is unfortunate. The cleanest solution would be an integer dtype with missing value support in NumPy itself, but that isn't going to happen anytime soon.

I'm not entirely opposed to the idea of adding (limited) support for masked arrays in xarray (see also https://github.com/pydata/xarray/pull/1118), but this could be a lot of work for relatively limited return.

I definitely recommend trying dask for processing multi-gigabyte arrays. You might even find the performance boost compelling enough that you could forgive the limitation that it doesn't handle masked arrays, either.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use masked arrays while preserving int 199188476

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
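As a sketch of how the listing on this page ("rows where issue = 199188476 sorted by updated_at descending") maps onto the schema above, the following uses Python's built-in sqlite3 module with an in-memory database. Only three of the nine rows are loaded, and only a few columns are filled, purely for illustration; the exact SQL that Datasette generates is not shown on this page and is assumed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE [issue_comments] (
       [html_url] TEXT,
       [issue_url] TEXT,
       [id] INTEGER PRIMARY KEY,
       [node_id] TEXT,
       [user] INTEGER REFERENCES [users]([id]),
       [created_at] TEXT,
       [updated_at] TEXT,
       [author_association] TEXT,
       [body] TEXT,
       [reactions] TEXT,
       [performed_via_github_app] TEXT,
       [issue] INTEGER REFERENCES [issues]([id])
    );
    CREATE INDEX [idx_issue_comments_issue]
        ON [issue_comments] ([issue]);
    CREATE INDEX [idx_issue_comments_user]
        ON [issue_comments] ([user]);
    """
)

# A subset of the rows listed above: (id, user, created_at, updated_at, issue).
rows = [
    (271058005, 1217238, "2017-01-07T02:54:54Z", "2017-01-07T02:54:54Z", 199188476),
    (605697466, 6130352, "2020-03-29T20:37:29Z", "2020-03-29T20:37:29Z", 199188476),
    (605632224, 1200058, "2020-03-29T13:00:29Z", "2020-03-29T13:03:46Z", 199188476),
]
conn.executemany(
    "INSERT INTO issue_comments (id, user, created_at, updated_at, issue) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# ISO 8601 timestamps stored as TEXT sort chronologically as plain strings.
ordered_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM issue_comments WHERE issue = ? ORDER BY updated_at DESC",
        (199188476,),
    )
]
```

The index on [issue] is what lets the WHERE clause avoid a full table scan on a larger database.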