issues: 789410367
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
789410367 | MDU6SXNzdWU3ODk0MTAzNjc= | 4826 | Reading and writing a zarr dataset multiple times casts bools to int8 | 463809 | closed | 0 | 10 | 2021-01-19T22:02:15Z | 2023-04-10T09:26:27Z | 2023-04-10T09:26:27Z | CONTRIBUTOR | What happened: Reading and writing a zarr dataset multiple times into different paths changes the dtype of boolean arrays to int8.
What you expected to happen: My array's dtype in numpy/dask should not change, even if certain storage backends store dtypes a certain way.
Minimal Complete Verifiable Example:
```python
import xarray as xr
import numpy as np

ds = xr.Dataset({
    "bool_field": xr.DataArray(
        np.random.randn(5) < 0.5,
        dims=('g'),
        coords={'g': np.arange(5)}
    )
})
# First write, then read back and inspect dtype and encoding
ds.to_zarr('test.zarr', mode="w")
d2 = xr.open_zarr('test.zarr')
print(d2.bool_field.dtype)
print(d2.bool_field.encoding)
# Second write to a different path, then read back again
d2.to_zarr("test2.zarr", mode="w")
d3 = xr.open_zarr('test2.zarr')
print(d3.bool_field.dtype)
```
The current workaround is to set the encodings explicitly. This fixes the problem:
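The snippet that originally followed is missing from this export. A minimal sketch of what such a workaround might look like, assuming the `encoding` argument of `to_zarr` is used to pin each boolean variable's stored dtype to `bool`. The helper names `bool_encoding` and `rewrite_zarr` are hypothetical and are not part of xarray or zarr:

```python
def bool_encoding(variables):
    """Build an `encoding` mapping that pins every boolean variable to bool.

    `variables` is any mapping of name -> object with a `.dtype` attribute,
    for example `ds.data_vars` of an xarray Dataset.
    Hypothetical helper for illustration; not an xarray API.
    """
    return {
        name: {"dtype": "bool"}
        for name, var in variables.items()
        if str(var.dtype) == "bool"
    }


def rewrite_zarr(src, dst):
    """Reopen the store at `src` and write it to `dst` with bools pinned.

    Assumption: an explicit `encoding` passed to `to_zarr` takes precedence
    over the dtype recorded in each variable's `.encoding` on read.
    """
    import xarray as xr  # imported lazily so bool_encoding has no dependencies

    ds = xr.open_zarr(src)
    ds.to_zarr(dst, mode="w", encoding=bool_encoding(ds.data_vars))
    return xr.open_zarr(dst)
```

With the example above, this would be called as `d3 = rewrite_zarr('test.zarr', 'test2.zarr')`, after which `d3.bool_field.dtype` can be checked. Whether this is the exact workaround from the original report is not recoverable from this export.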
Environment: Output of <tt>xr.show_versions()</tt>
```
# I'll update with the full output of xr.show_versions() soon.
In [4]: xr.__version__
Out[4]: '0.16.2'
In [2]: zarr.__version__
Out[2]: '2.6.1'
``` |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4826/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |