
issues: 2098882374


  • id: 2098882374
  • node_id: I_kwDOAMm_X859GmdG
  • number: 8660
  • title: dtype encoding ignored during IO?
  • user: 35968931
  • state: closed
  • locked: 0
  • comments: 3
  • created_at: 2024-01-24T18:50:47Z
  • updated_at: 2024-02-05T17:35:03Z
  • closed_at: 2024-02-05T17:35:02Z
  • author_association: MEMBER

What happened?

When I set the .encoding['dtype'] attribute before saving a dataset to disk, the on-disk representation appears to record the encoded dtype, but when I open the data back up in xarray I get the same dtype I had before, not the one specified in the encoding. Is that what's supposed to happen? How does this work? (This happens with both zarr and netCDF.)

What did you expect to happen?

I expected that setting .encoding['dtype'] would mean that once I open the data back up, it would be in the new dtype that I set in the encoding.

Minimal Complete Verifiable Example

```Python
import xarray as xr

air = xr.tutorial.open_dataset('air_temperature')

air['air'].dtype  # returns dtype('float32')

air['air'].encoding['dtype']  # returns dtype('int16'), which already seems weird

air.to_zarr('air.zarr')  # I would assume here that the encoding actually does something during IO

# now if I check the zarr .zarray metadata for the air variable it says
# "dtype": "<i2"

air2 = xr.open_dataset('air.zarr', engine='zarr')  # open it back up

air2['air'].dtype  # returns dtype('float32'), but I expected dtype('int16')

# (the same thing happens also with saving to netCDF instead of Zarr)
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

I know I didn't explicitly cast with .astype, but I'm still confused as to what the relation between the in-memory dtype and the dtype encoding is supposed to be here.

I am probably just misunderstanding how this is supposed to work, but then this is arguably a docs issue, because here it says "[the encoding dtype field] controls the type of the data written on disk", which I would have thought also affects the data you get back when you open it up again?
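One possible reading of the quoted sentence, sketched with the standard library only: the encoding dtype describes the packed values on disk, while decoding on read restores the in-memory floating-point type. This is an illustration under assumptions, not xarray's actual implementation. It assumes CF-style packing with a scale factor and offset; the names `encode`, `decode`, `SCALE_FACTOR` and `ADD_OFFSET` and the sample values are hypothetical.

```python
from array import array

# Hypothetical packing parameters, chosen for illustration only.
SCALE_FACTOR = 0.01
ADD_OFFSET = 0.0


def encode(values, scale_factor=SCALE_FACTOR, add_offset=ADD_OFFSET):
    """Pack floats into 16-bit integers (stand-in for the on-disk dtype)."""
    return array('h', [round((v - add_offset) / scale_factor) for v in values])


def decode(packed, scale_factor=SCALE_FACTOR, add_offset=ADD_OFFSET):
    """Unpack 16-bit integers into 32-bit floats (stand-in for the in-memory dtype)."""
    return array('f', [p * scale_factor + add_offset for p in packed])


in_memory = array('f', [241.2, 242.5, 243.79])  # made-up sample values
on_disk = encode(in_memory)                     # what would be written
round_tripped = decode(on_disk)                 # what a reader would see

print(on_disk.typecode, on_disk.itemsize)              # integer storage type
print(round_tripped.typecode, round_tripped.itemsize)  # float type after decoding
```

Under this reading, getting a float dtype back after a round trip would be consistent with the encoding having been applied on write; whether that matches the behavior reported above is for the maintainers to confirm.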

Environment

main branch of xarray

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8660/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  • state_reason: completed
  • repo: 13221727
  • type: issue
