issues: 1451961530


id: 1451961530
node_id: I_kwDOAMm_X85Wiyy6
number: 7292
title: `dtype` of `zarr` array unexpectedly changes when `fill_value` is specified
user: 8552
state: open
locked: 0
comments: 2
created_at: 2022-11-16T17:03:19Z
updated_at: 2022-12-01T12:46:16Z
author_association: NONE

What happened?

Opening a zarr group which contains an array of integer dtype with a fill_value results in an xarray dataset in which the array has floating-point dtype.

What did you expect to happen?

An xarray dataset in which the array has the original integer dtype.

Minimal Complete Verifiable Example

```Python
import zarr
import xarray

# Create zarr with integer dtype and fill_value
grp = zarr.open_group("test.zarr")
arr = grp.create(shape=(10,), name="array", dtype="int8", fill_value=-1)
arr.attrs['_ARRAY_DIMENSIONS'] = ['dim1']

# Open in xarray to see that the dtype is now float32
ds = xarray.open_zarr("test.zarr", consolidated=False)
ds['array'].dtype
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

This is a consequence of https://github.com/pydata/xarray/issues/5475: xarray's _FillValue has a different meaning from zarr's fill_value.
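To make the difference concrete, here is a toy model of the zarr side, assuming only what is stated in this issue: a chunk that was never written reads back as fill_value. `ToyChunkedArray` is a hypothetical class for illustration, not zarr's implementation. Once read, a -1 the user actually stored and a -1 produced by an unwritten chunk are indistinguishable.

```python
import numpy as np


class ToyChunkedArray:
    """Hypothetical 1-D chunked array (not zarr's code): chunks that were
    never written are materialised from `fill_value` when read."""

    def __init__(self, shape, chunk, dtype, fill_value):
        self.shape = shape
        self.chunk = chunk
        self.dtype = np.dtype(dtype)
        self.fill_value = fill_value
        self.chunks = {}  # chunk index -> ndarray; absent means "never written"

    def write_chunk(self, index, values):
        self.chunks[index] = np.asarray(values, dtype=self.dtype)

    def read(self):
        n_chunks = -(-self.shape // self.chunk)  # ceiling division
        parts = []
        for i in range(n_chunks):
            size = min(self.chunk, self.shape - i * self.chunk)
            if i in self.chunks:
                parts.append(self.chunks[i])
            else:
                # Unwritten chunk: filled with fill_value, dtype unchanged.
                parts.append(np.full(size, self.fill_value, dtype=self.dtype))
        return np.concatenate(parts)


arr = ToyChunkedArray(shape=10, chunk=5, dtype="int8", fill_value=-1)
arr.write_chunk(0, [-1, 0, 1, 2, 3])  # index 0 holds a real -1
data = arr.read()
# data stays int8; the -1 values at indices 5-9 come from the unwritten chunk.
print(data.dtype, data.tolist())
```

In this model fill_value only says what unwritten storage looks like; nothing in the data marks which -1 values are "missing", which is the ambiguity CF-style masking runs into.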

The dtype change happens at https://github.com/pydata/xarray/blob/3c98ec7d96cc4b46664850cc7a40af2bc184fea0/xarray/coding/variables.py#L204, where xarray tries to find a dtype in which fill_value can represent "missing" data. In zarr, by contrast, fill_value can be any data value: its purpose is to fill in missing chunks, not to represent missing data.
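A simplified sketch of why masking forces a float dtype: NaN cannot be stored in an integer array, so decoding must first promote to a float type. `promote_for_missing` and `mask_fill_value` are hypothetical helpers, not xarray's code; the rule "integers of at most 2 bytes become float32, larger ones float64" is an assumption consistent with the int8 → float32 result reported above.

```python
import numpy as np


def promote_for_missing(dtype):
    """Hypothetical sketch: return (dtype, sentinel) able to hold a
    "missing" marker. Floats are kept; integers are promoted to float."""
    dtype = np.dtype(dtype)
    if dtype.kind == "f":
        return dtype, np.nan
    if dtype.kind in "iu":
        # Assumed threshold: small integers fit exactly in float32.
        return np.dtype(np.float32 if dtype.itemsize <= 2 else np.float64), np.nan
    raise NotImplementedError("only numeric dtypes are sketched here")


def mask_fill_value(raw, fill_value):
    """CF-style masking: every element equal to fill_value becomes NaN."""
    dtype, missing = promote_for_missing(raw.dtype)
    out = raw.astype(dtype)
    out[raw == fill_value] = missing
    return out


raw = np.array([-1, 0, 1, 2], dtype="int8")  # -1 is zarr's fill_value
decoded = mask_fill_value(raw, fill_value=-1)
print(decoded.dtype)  # float32
```

Note that the -1 at index 0 is turned into NaN even if it was real data, because masking cannot tell it apart from a value produced by an unwritten chunk.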

I'm not sure how best to fix this. One option: if the zarr fill value is clearly a non-missing value for the dtype, xarray could act as if there were no fill value. I'm happy to work on a PR if that seems like a valid approach, although others may have thoughts on whether it would be a breaking change for some users.
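One possible reading of "clearly a non-missing value", sketched under loudly stated assumptions: for integer and bool dtypes every in-range value is legitimate data, and for floats only NaN unambiguously means "missing". `cf_fill_value_from_zarr` is a hypothetical helper, not an existing xarray or zarr function.

```python
import numpy as np


def cf_fill_value_from_zarr(dtype, fill_value):
    """Hypothetical heuristic: return the value to treat as a CF-style
    _FillValue, or None if zarr's fill_value should be treated as ordinary
    data (so no masking happens and the on-disk dtype is kept)."""
    dtype = np.dtype(dtype)
    if fill_value is None:
        return None
    if dtype.kind in "biu":
        # Any in-range integer (or bool) is a plausible real data value.
        return None
    if dtype.kind in "fc" and not np.isnan(fill_value):
        # A finite float such as 0.0 is also plausible real data.
        return None
    # NaN, or a dtype this sketch does not cover: keep masking as today.
    return fill_value


for dtype, fv in [("int8", -1), ("float32", 0.0), ("float64", float("nan"))]:
    print(dtype, fv, "->", cf_fill_value_from_zarr(dtype, fv))
```

With this heuristic the int8 array from the example would keep its dtype, but values from unwritten chunks would no longer be masked, which is exactly the behaviour change that could break users relying on the current masking.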

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-47-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: ('en_GB', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2022.11.0
pandas: 1.3.5
numpy: 1.21.6
scipy: 1.9.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.13.3
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.01.0
distributed: 2022.01.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.10.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 59.6.0
pip: 22.0.2
conda: None
pytest: None
IPython: None
sphinx: None
reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7292/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue
