issue_comments


4 rows where author_association = "MEMBER" and issue = 1286995366 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1170142019 https://github.com/pydata/xarray/issues/6733#issuecomment-1170142019 https://api.github.com/repos/pydata/xarray/issues/6733 IC_kwDOAMm_X85FvvND dcherian 2448579 2022-06-29T15:41:45Z 2022-06-29T15:41:45Z MEMBER

> Where a cast is specified in encoding, could xarray not cast the data first to get that isolated copy and then set the fill on the cast array?

If you cast float to int you might lose information needed to accurately do the filling step.

> the only possible gotcha here for users is that data stored in a netcdf file as integer type data but with a _FillValue is loaded as a float using np.NaN because there is no np.nan equivalent for integer types.

Yes, but your data originated as floating point so this is correct.

> ncdump asserts that there has to be a missing data value, which is 65535 unless set otherwise, but xarray is using the presence of the _FillValue attribute to signal the presence of missing data.

You can specify missing_value and/or _FillValue attributes. So you could try that. Xarray does not follow the default "fill values" because it can be confusing; it is valid to store 65535 as u2 for example.
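
The masking behaviour discussed above can be sketched with plain NumPy. This is only an illustration, not xarray's actual decoding code: `mask_fill` is a hypothetical helper, and the choice of `float32` for the decoded array is an assumption made for the example.

```python
import numpy as np

def mask_fill(raw, fill_value=None):
    """Hypothetical decode step: mask only when a fill value is explicitly given."""
    if fill_value is None:
        # No _FillValue / missing_value attribute: every stored value is valid data.
        return raw
    # Integer dtypes have no NaN, so the decoded array has to be floating point.
    decoded = raw.astype("float32")  # dtype chosen for illustration only
    decoded[raw == fill_value] = np.nan
    return decoded

raw = np.array([1, 65535, 3], dtype="uint16")

masked = mask_fill(raw, fill_value=65535)  # 65535 is treated as missing
unmasked = mask_fill(raw)                  # 65535 is kept as a real u2 value

print(masked.dtype, masked)
print(unmasked.dtype, unmasked)
```

Without an explicit fill value the array stays `uint16` and 65535 survives as ordinary data, which is the case the comment describes as valid.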

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFMaskCoder creates unnecessary copy for `uint16` variables 1286995366
1169321303 https://github.com/pydata/xarray/issues/6733#issuecomment-1169321303 https://api.github.com/repos/pydata/xarray/issues/6733 IC_kwDOAMm_X85Fsm1X dcherian 2448579 2022-06-28T21:57:05Z 2022-06-28T21:57:05Z MEMBER

> So, I am using _FillValue=65535 in the encoding to to_netcdf.

encoding is really an instruction to Xarray to encode the data. But you've already done that. So specify _FillValue in attrs instead of encoding. This will get written to file and be interpreted on read. At least IIUC =)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFMaskCoder creates unnecessary copy for `uint16` variables 1286995366
1169150139 https://github.com/pydata/xarray/issues/6733#issuecomment-1169150139 https://api.github.com/repos/pydata/xarray/issues/6733 IC_kwDOAMm_X85Fr9C7 dcherian 2448579 2022-06-28T19:35:36Z 2022-06-28T19:35:36Z MEMBER

Try setting _FillValue in attrs? I haven't tried this...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFMaskCoder creates unnecessary copy for `uint16` variables 1286995366
1169014257 https://github.com/pydata/xarray/issues/6733#issuecomment-1169014257 https://api.github.com/repos/pydata/xarray/issues/6733 IC_kwDOAMm_X85Frb3x dcherian 2448579 2022-06-28T17:23:12Z 2022-06-28T17:26:17Z MEMBER

Yeah, I think the issue is that the "CFMaskCoder" tries to replace NaNs regardless of the dtype of the variable. Doing this creates a copy in this step: `where(notnull(data), data, other)`.

https://github.com/pydata/xarray/blob/787a96c15161c9025182291b672b3d3c5548a6c7/xarray/coding/variables.py#L149

You should set _FillValue to None after manually encoding to ints to skip the extra copy.

We should probably raise an error or at least a warning for integer dtypes and not-None _FillValue.


As for your initial question, we create a copy of the float array when replacing NaNs (does not happen in-place), then convert to int. So you'll need to account for 2x float array + 1x int array memory use.
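
A rough NumPy sketch of the memory accounting described above, under stated assumptions: `encode_with_copy` and `encode_in_place` are hypothetical helpers that mimic the fill-then-cast order, not xarray's implementation, and `np.where` stands in for the `where(notnull(data), data, other)` step.

```python
import numpy as np

FILL = 65535  # fill value used in the thread

def encode_with_copy(data):
    """Generic path: where() allocates a second float array, astype() the int array."""
    filled = np.where(np.isnan(data), FILL, data)  # new float array, input untouched
    encoded = filled.astype("uint16")              # new integer array
    return filled, encoded

def encode_in_place(data):
    """Manual path: overwrite NaNs in the caller's array, then cast once."""
    data[np.isnan(data)] = FILL  # only a temporary boolean mask is allocated
    return data.astype("uint16")

data = np.array([1.0, np.nan, 3.0])

filled, encoded = encode_with_copy(data)
# All three arrays are alive at once: 2x float + 1x int.
peak_bytes = data.nbytes + filled.nbytes + encoded.nbytes
print(peak_bytes)

manual = encode_in_place(data)  # mutates `data`; no second float array
print(manual)
```

After encoding by hand like this, the array handed to xarray is already integer, which is the situation where the comment suggests setting _FillValue to None so the masking step is skipped.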

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CFMaskCoder creates unnecessary copy for `uint16` variables 1286995366


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);