
Comment on pydata/xarray#6733 (2022-06-28):
https://github.com/pydata/xarray/issues/6733#issuecomment-1169175453

Thanks again for your help!

I think that is what I am doing. If I understand this correctly:

Using to_netcdf to handle the encoding

  • I'm passing a DataArray containing float32 data.
  • The np.ndarray containing the data can simply use np.nan to represent missing data.
  • Using to_netcdf with an encoding to uint16 has peak memory usage of roughly 2 × float32 + 1 × uint16: xarray replaces NaN with the fill value on a copy of the float data and then converts that copy to int (see the sketch after this list).
  • The other puzzling thing is that 35 GB × 2.5 (two float32 copies plus one half-size uint16 copy) is ~90 GB, but many of the processes are using much more than that.
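
A minimal sketch of this first approach, for concreteness. The variable name "data", the dims, the output file name, and the fill value of 65535 are illustrative assumptions, not taken from the original report:

```python
import numpy as np
import xarray as xr

# float32 data, with np.nan marking the missing values
arr = np.array([[1.0, 2.0], [np.nan, 4.0]], dtype=np.float32)
da = xr.DataArray(arr, dims=("y", "x"), name="data")

# to_netcdf handles the conversion: xarray copies the float data,
# writes _FillValue where the NaNs are, then casts the copy to
# uint16 -- hence the ~2 x float32 + 1 x uint16 peak memory usage
# described above.
da.to_netcdf(
    "encoded.nc",
    encoding={"data": {"dtype": "uint16", "_FillValue": 65535}},
)
```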

Manual encoding

  • I'm passing a DataArray containing uint16 data.
  • However, as far as I can see, a DataArray itself can't specify an alternative missing-data value: because np.nan is a float, there is no way to represent missing data within an integer DataArray?
  • So, I am using _FillValue=65535 in the encoding to to_netcdf.
  • But that still appears to trigger the encoding step: the traceback in my second comment came from a manually encoded uint16 DataArray (a sketch of this approach follows).
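
And a minimal sketch of the manual-encoding approach, under the same illustrative assumptions (variable name, dims, file name). Here the data is already uint16, with 65535 standing in for missing values:

```python
import numpy as np
import xarray as xr

# uint16 data; 65535 is used as the missing-data sentinel
arr = np.array([[1, 2], [65535, 4]], dtype=np.uint16)
da = xr.DataArray(arr, dims=("y", "x"), name="data")

# Passing _FillValue in the encoding tells netCDF which value means
# "missing", but as described above this still appears to route the
# already-integer data through xarray's encoding step rather than
# writing it as-is.
da.to_netcdf(
    "manual.nc",
    encoding={"data": {"_FillValue": 65535}},
)
```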