issue_comments: 1094583214


html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/issues/6456#issuecomment-1094583214	https://api.github.com/repos/pydata/xarray/issues/6456	1094583214	IC_kwDOAMm_X85BPgOu	34276374	2022-04-11T06:01:44Z	2022-04-12T08:48:13Z	NONE	@max-sixty - I've tried to slim it down below (no loop, and only one save). From the print statements, it's clear that `ds3` is correct before the .zarr is overwritten, but once `ds3` is saved, the data corresponding to the initial save breaks (it is now all NaNs). I am guessing this is due to trying to read from and save over the same data, but I wouldn't have expected it to be a problem if it was loading the chunks into memory during the saving.

```
import pandas as pd
import numpy as np
import glob
import xarray as xr
from tqdm import tqdm

# Creating pkl files
[pd.DataFrame(np.random.randint(0,10, (1000,500))).astype(object).to_pickle('df{}.pkl'.format(i)) for i in range(4)]

fnames = glob.glob('*.pkl')

df1 = pd.read_pickle(fnames[0])
df1.columns = np.arange(0,500).astype(object) # the real pkl files contain all objects
df1.index = np.arange(0,1000).astype(object)
df1 = df1.astype(np.float32)

ds = xr.DataArray(df1.values, dims=['fname', 'res_dim'],
                  coords={'fname': df1.index.values, 'res_dim': df1.columns.values})
ds = ds.to_dataset(name='low_dim').chunk({'fname': 500, 'res_dim': 1})

ds.to_zarr('zarr_bug.zarr', mode='w')
ds1 = xr.open_zarr('zarr_bug.zarr', decode_coords="all")

df2 = pd.read_pickle(fnames[1])
df2.columns = np.arange(0,500).astype(object)
df2.index = np.arange(0,1000).astype(object)
df2 = df2.astype(np.float32)

ds2 = xr.DataArray(df2.values, dims=['fname', 'res_dim'],
                   coords={'fname': df2.index.values, 'res_dim': df2.columns.values})
ds2 = ds2.to_dataset(name='low_dim').chunk({'fname': 500, 'res_dim': 1})

ds3 = xr.concat([ds1, ds2], dim='fname')
ds3['fname'] = ds3.fname.astype(str)

print(ds3.low_dim.values)
ds3.to_zarr('zarr_bug.zarr', mode='w')
print(ds3.low_dim.values)
```

The output:

`[[7. 8. 4. ... 9. 6. 7.] [0. 4. 5. ... 9. 7. 6.] [3. 4. 3. ... 1. 6. 1.] ... [4. 0. 4. ... 5. 6. 9.] [5. 2. 5. ... 1. 7. 1.] [8. 9. 7. ... 4. 4. 1.]]
[[nan nan nan ... nan nan nan] [nan nan nan ... nan nan nan] [nan nan nan ... nan nan nan] ... [ 4. 0. 4. ... 5. 6. 9.] [ 5. 2. 5. ... 1. 7. 1.] [ 8. 9. 7. ... 4. 4. 1.]]`	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		1197117301
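
The guess in the comment (reading from and writing over the same store) can be illustrated with a toy model that uses only the standard library. This is a sketch under stated assumptions, not xarray's or zarr's actual implementation: `ToyStore`, `lazy_chunk`, and `demo` are hypothetical names, and the model assumes that overwrite mode wipes the store before any chunk is evaluated and that a missing chunk reads back as a NaN fill value.

```python
import math
import os
import shutil
import tempfile

FILL = math.nan  # assumed fill value returned for chunks missing from the store


class ToyStore:
    """Hypothetical directory-backed chunk store (NOT the zarr API)."""

    def __init__(self, path):
        self.path = path

    def write(self, chunks, mode="w"):
        if mode == "w":
            # Overwrite mode: existing chunks are deleted before anything is written.
            shutil.rmtree(self.path, ignore_errors=True)
        os.makedirs(self.path, exist_ok=True)
        for i, chunk in enumerate(chunks):
            values = chunk()  # each chunk is a callable, evaluated only at write time
            with open(os.path.join(self.path, f"{i}.txt"), "w") as f:
                f.write(" ".join(str(v) for v in values))

    def lazy_chunk(self, i, size):
        """Return a callable that reads chunk i from disk when called."""
        def read():
            try:
                with open(os.path.join(self.path, f"{i}.txt")) as f:
                    return [float(v) for v in f.read().split()]
            except FileNotFoundError:
                return [FILL] * size  # missing chunk falls back to the fill value
        return read


def demo(eager):
    root = tempfile.mkdtemp()
    store = ToyStore(os.path.join(root, "toy_store"))
    store.write([lambda: [1.0, 2.0]])      # initial save
    old = store.lazy_chunk(0, 2)           # lazy view of the saved data
    new = lambda: [3.0, 4.0]               # new in-memory data
    chunks = [old, new]                    # "concat" of lazy and in-memory chunks
    if eager:
        # Evaluate everything into memory BEFORE the store is wiped.
        loaded = [c() for c in chunks]
        chunks = [lambda v=v: v for v in loaded]
    store.write(chunks, mode="w")          # overwrite the same store
    result = [store.lazy_chunk(i, 2)() for i in range(2)]
    shutil.rmtree(root)
    return result


print(demo(eager=False))  # first chunk comes back as NaNs
print(demo(eager=True))   # first chunk keeps its original values
```

If the real behaviour matches this model, two workarounds suggest themselves: call `ds3.load()` before `ds3.to_zarr(...)` when the data fits in memory, or write to a different path than the one `ds1` was opened from.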
