issue_comments: 1094583214

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/6456#issuecomment-1094583214 https://api.github.com/repos/pydata/xarray/issues/6456 1094583214 IC_kwDOAMm_X85BPgOu 34276374 2022-04-11T06:01:44Z 2022-04-12T08:48:13Z NONE

@max-sixty - I've tried to slim it down below (no loop, and only one save). The print statements show that ds3 holds the correct values before the .zarr store is overwritten, but once ds3 is saved, the rows that came from the initial save are all NaN. I am guessing this is because I am reading from and saving over the same store, but I wouldn't have expected that to be a problem if the chunks were loaded into memory during the save.

```
import pandas as pd
import numpy as np
import glob
import xarray as xr
from tqdm import tqdm

# Creating pkl files
[pd.DataFrame(np.random.randint(0,10, (1000,500))).astype(object).to_pickle('df{}.pkl'.format(i)) for i in range(4)]

fnames = glob.glob('*.pkl')

df1 = pd.read_pickle(fnames[0])
df1.columns = np.arange(0,500).astype(object) # the real pkl files contain all objects
df1.index = np.arange(0,1000).astype(object)
df1 = df1.astype(np.float32)

ds = xr.DataArray(df1.values, dims=['fname', 'res_dim'], coords={'fname': df1.index.values, 'res_dim': df1.columns.values})
ds = ds.to_dataset(name='low_dim').chunk({'fname': 500, 'res_dim': 1})

ds.to_zarr('zarr_bug.zarr', mode='w')
ds1 = xr.open_zarr('zarr_bug.zarr', decode_coords="all")

df2 = pd.read_pickle(fnames[1])
df2.columns = np.arange(0,500).astype(object)
df2.index = np.arange(0,1000).astype(object)
df2 = df2.astype(np.float32)

ds2 = xr.DataArray(df2.values, dims=['fname', 'res_dim'], coords={'fname': df2.index.values, 'res_dim': df2.columns.values})
ds2 = ds2.to_dataset(name='low_dim').chunk({'fname': 500, 'res_dim': 1})

ds3 = xr.concat([ds1, ds2], dim='fname')
ds3['fname'] = ds3.fname.astype(str)

print(ds3.low_dim.values)

ds3.to_zarr('zarr_bug.zarr', mode='w')

print(ds3.low_dim.values)
```

The output:

    [[7. 8. 4. ... 9. 6. 7.]
     [0. 4. 5. ... 9. 7. 6.]
     [3. 4. 3. ... 1. 6. 1.]
     ...
     [4. 0. 4. ... 5. 6. 9.]
     [5. 2. 5. ... 1. 7. 1.]
     [8. 9. 7. ... 4. 4. 1.]]
    [[nan nan nan ... nan nan nan]
     [nan nan nan ... nan nan nan]
     [nan nan nan ... nan nan nan]
     ...
     [ 4. 0. 4. ... 5. 6. 9.]
     [ 5. 2. 5. ... 1. 7. 1.]
     [ 8. 9. 7. ... 4. 4. 1.]]

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  1197117301