
Comment on pydata/xarray#6733 (2022-06-28):
https://github.com/pydata/xarray/issues/6733#issuecomment-1169175453

Thanks again for your help!

I think that is what I am doing. If I understand this correctly:

Using to_netcdf to handle the encoding

  • I'm passing a DataArray containing float32 data.
  • The np.ndarray containing the data can simply use np.nan to represent missing data.
  • Using to_netcdf with an encoding to uint16 has peak memory usage of roughly 2 × float32 + 1 × uint16: xarray replaces NaN with the fill value on a copy of the float data and then converts that copy to int (see the sketch after this list).
  • The other puzzling thing is that 35 GB × 2.5 (two float32 copies plus one half-size uint16 copy) is ~90 GB, but many of the processes are using much more than that.
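
A minimal sketch of this first approach, for concreteness. The variable name "data", the dims, the output file name, and the fill value of 65535 are illustrative assumptions, not taken from the original report:

```python
import numpy as np
import xarray as xr

# float32 data, with np.nan marking the missing values
arr = np.array([[1.0, 2.0], [np.nan, 4.0]], dtype=np.float32)
da = xr.DataArray(arr, dims=("y", "x"), name="data")

# to_netcdf handles the conversion: xarray copies the float data,
# writes _FillValue where the NaNs are, then casts the copy to
# uint16 -- hence the ~2 x float32 + 1 x uint16 peak memory usage
# described above.
da.to_netcdf(
    "encoded.nc",
    encoding={"data": {"dtype": "uint16", "_FillValue": 65535}},
)
```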

Manual encoding

  • I'm passing a DataArray containing uint16 data.
  • However, as far as I can see, a DataArray itself can't specify an alternative missing-data value: because np.nan is a float, there is no way to represent missing data within an integer DataArray?
  • So, I am using _FillValue=65535 in the encoding to to_netcdf.
  • But that still appears to trigger the encoding step: the traceback in my second comment came from a manually encoded uint16 DataArray (a sketch of this approach follows).
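
And a minimal sketch of the manual-encoding approach, under the same illustrative assumptions (variable name, dims, file name). Here the data is already uint16, with 65535 standing in for missing values:

```python
import numpy as np
import xarray as xr

# uint16 data; 65535 is used as the missing-data sentinel
arr = np.array([[1, 2], [65535, 4]], dtype=np.uint16)
da = xr.DataArray(arr, dims=("y", "x"), name="data")

# Passing _FillValue in the encoding tells netCDF which value means
# "missing", but as described above this still appears to route the
# already-integer data through xarray's encoding step rather than
# writing it as-is.
da.to_netcdf(
    "manual.nc",
    encoding={"data": {"_FillValue": 65535}},
)
```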