issues: 1639841581
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1639841581 | I_kwDOAMm_X85hvf8t | 7672 | to_zarr writes unexpected NaNs with chunks=-1 | 8249360 | open | 0 | 5 | 2023-03-24T18:15:06Z | 2023-11-09T06:10:21Z | NONE |
### What happened?
I'm running into some unexpected behavior with `to_zarr`.

### What did you expect to happen?
My data would be written the same regardless of whether I explicitly loaded the dataset.
The documentation for
I encountered this situation when operating on datasets that had been loaded from disk (…).

### Minimal Complete Verifiable Example

```Python
import pandas as pd
import xarray as xr
import numpy as np


def create_dataset(time, site):
    temperature = 15 + 8 * np.random.randn(1, 3)
    precipitation = 10 * np.random.rand(1, 3)
    ds = xr.Dataset(
        data_vars=dict(
            temperature=(["site", "time"], temperature),
            precipitation=(["site", "time"], precipitation),
        ),
        coords=dict(site=site, time=time),  # coords were elided in this export; reconstructed from later sel/concat usage
    )
    return ds


time_1 = pd.date_range("2014-09-06", periods=3)
time_2 = pd.date_range("2014-09-09", periods=3)

# create and save the first dataset as a zarr
ds_a = create_dataset(time_1, ["site_1"])
fname_a = '/tmp/ds_a.zarr'
ds_a.to_zarr(fname_a, mode='w')
ds_a_from_disk = xr.open_dataset(fname_a, engine='zarr', chunks={})

# create and save the second dataset as a zarr
ds_b = create_dataset(time_2, ["site_1"])
fname_b = '/tmp/ds_b.zarr'
ds_b.to_zarr(fname_b, mode='w')
ds_b_from_disk = xr.open_dataset(fname_b, engine='zarr', chunks={})

# concatenate the datasets
ds = xr.concat([ds_a_from_disk.sel(site="site_1"), ds_b_from_disk.sel(site="site_1")], dim='time')

# save all data in one chunk
encoding = {var: {'chunks': -1} for var in list(ds) + list(ds.coords)}
fname = '/tmp/concated.zarr'

# Uncomment the following line to fix this issue
# ds.load()

# save the dataset
ds.to_zarr(fname, mode='w', encoding=encoding)
ds_from_disk = xr.open_dataset(fname, engine='zarr')
print(ds_from_disk.to_dataframe())
```

### MVCE confirmation
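As a sketch of the suspected chunk mismatch, the example below builds a small lazy dataset with two Dask chunks (mimicking the concatenation of two on-disk files) and then rechunks it to a single block per dimension, so the Dask chunking agrees with a `{'chunks': -1}` encoding. This is an illustrative assumption, not a verified fix for this issue; the dataset here (`temperature` over a 6-step `time` axis) is invented for demonstration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# A small lazy dataset analogous to the concatenated one in the MVCE:
# two Dask chunks along time, as produced by concatenating two files.
time = pd.date_range("2014-09-06", periods=6)
ds = xr.Dataset(
    {"temperature": ("time", np.arange(6.0))},
    coords={"time": time},
).chunk({"time": 3})

print(dict(ds.chunks))  # {'time': (3, 3)} — misaligned with a chunks=-1 encoding

# Rechunk to one block so the Dask chunks match the `{'chunks': -1}`
# encoding before calling to_zarr, instead of materialising with ds.load().
ds_single = ds.chunk({"time": -1})
print(dict(ds_single.chunks))  # {'time': (6,)}
```

Writing `ds_single` with the same encoding would then hand `to_zarr` one Dask chunk per target Zarr chunk, avoiding the partial-write pattern suspected below.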
### Relevant log output
No response

### Anything else we need to know?
Example output without `ds.load()`: …

Example output with `ds.load()`: …

My hunch is that this has to do with a mismatch between the Dask chunks in the unloaded dataset and the chunks specified in `encoding`.

### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.16 (main, Dec 7 2022, 01:11:51)
[GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.19.0-35-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: None
xarray: 2023.3.0
pandas: 1.4.0
numpy: 1.22.4
scipy: 1.8.0
netCDF4: None
pydap: None
h5netcdf: None
...
pytest: 6.2.2
mypy: None
IPython: 8.3.0
sphinx: None
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7672/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
13221727 | issue |