

issues: 1474785646


  • id: 1474785646
  • node_id: I_kwDOAMm_X85X53Fu
  • number: 7354
  • title: 'open_mfdataset' zarr zip timestamp issue
  • user: 34686298
  • state: open
  • locked: 0
  • comments: 5
  • created_at: 2022-12-04T12:45:12Z
  • updated_at: 2022-12-16T00:29:54Z
  • author_association: NONE

What happened?

We have been collecting satellite data and we save each image as its own {time}.zarr.zip file. We then combine the images with xr.open_mfdataset and save them to a single large .zarr.zip file. When we load this file, all the timestamps are the same.

This bug does not appear in xarray 2022.3.0, but it does appear in 2022.6.0.

I tried to keep this as minimal as possible, but it is still a fairly long example. Hopefully the comments help.

Sorry if this has already been reported; I could not find it in the issue list.

What did you expect to happen?

I expected the timestamps to match those of the data that went in.

Minimal Complete Verifiable Example

```Python
import pandas as pd
import xarray as xr
import numpy as np
from datetime import datetime, timedelta
import zarr
import os
import glob

# ids and times
path = "tmp.zarr.zip"
ids = np.array(range(0, 10))
times = [datetime(2022, 9, 1) + timedelta(minutes=60 * i) for i in range(0, 10)]

# make 10 random zip files (create the output directory first so ZipStore can write into it)
os.makedirs("tmp_dir", exist_ok=True)
for time in times:
    dataset = xr.DataArray(
        np.random.uniform(size=(1, len(ids))),
        coords=(("time", [time]), ("id", ids)),
        name="data",
    ).to_dataset(name="data")

    file_name = f"tmp_dir/{time.isoformat()}.zarr.zip"

    if os.path.exists(file_name):
        os.remove(file_name)
    with zarr.ZipStore(file_name) as store:
        dataset.to_zarr(store)

# load them all together
files = list(glob.glob("tmp_dir/*.zarr.zip"))
dataset = xr.open_mfdataset(files, engine="zarr").sortby("time")

# this is fine!
assert pd.to_datetime(dataset.time.values[0]) == times[0]
assert pd.to_datetime(dataset.time.values[1]) == times[1]

# save to file
if os.path.exists(path):
    os.remove(path)
with zarr.ZipStore(path) as store:
    dataset.to_zarr(store)

# read the file
dataset_read = xr.open_dataset(path, engine="zarr")
print(dataset_read)

# this causes an error
assert pd.to_datetime(dataset_read.time.values[0]) == times[0]
assert pd.to_datetime(dataset_read.time.values[1]) == times[1]
```
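Because the example names each file after its own timestamp via time.isoformat(), the timestamps that should survive the round trip can be recovered from the file names alone, independently of anything stored in the zarr metadata. The sketch below uses a hypothetical helper, expected_times_from_filenames, that is not part of xarray or zarr; it assumes the "tmp_dir/{time.isoformat()}.zarr.zip" naming used above and relies only on the standard library.

```python
import os
from datetime import datetime


def expected_times_from_filenames(paths):
    """Hypothetical helper: recover timestamps from names like
    'tmp_dir/2022-09-01T00:00:00.zarr.zip' and return them sorted."""
    suffix = ".zarr.zip"
    stamps = []
    for p in paths:
        name = os.path.basename(p)
        if not name.endswith(suffix):
            raise ValueError(f"unexpected file name: {p}")
        # Strip the suffix and parse the remaining ISO 8601 timestamp.
        stamps.append(datetime.fromisoformat(name[: -len(suffix)]))
    return sorted(stamps)


paths = [
    "tmp_dir/2022-09-01T01:00:00.zarr.zip",
    "tmp_dir/2022-09-01T00:00:00.zarr.zip",
]
print(expected_times_from_filenames(paths))
```

In the example above, the result for the glob'd file list could then be compared element-wise with pd.to_datetime(dataset_read.time.values) to see exactly which timestamps were lost.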

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

    /Users/peterdudfield/Documents/Github/nwp/venv/lib/python3.8/site-packages/xarray/core/dataset.py:2060: SerializationWarning: saving variable None with floating point data as an integer dtype without any _FillValue to use for NaNs
      return to_zarr(  # type: ignore
    <xarray.Dataset>
    Dimensions:  (time: 10, id: 10)
    Coordinates:
      * id       (id) int64 0 1 2 3 4 5 6 7 8 9
      * time     (time) datetime64[ns] 2022-09-01 2022-09-01 ... 2022-09-01
    Data variables:
        data     (time, id) float64 ...
    Traceback (most recent call last):
      File "/Users/peterdudfield/Documents/Github/nwp/venv/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3251, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "<ipython-input-16-45f86e8a5977>", line 36, in <module>
        assert pd.to_datetime(dataset_read.time.values[1]) == times[1]
    AssertionError
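The SerializationWarning in the log ("saving variable None with floating point data as an integer dtype") hints at one possible mechanism for identical timestamps. The sketch below assumes, without confirmation from this report, that the combined time variable was written with coarse units such as "days since 2022-09-01" and an integer dtype. Under that assumption, hourly offsets become fractional days, the integer cast truncates them all to 0, and every value decodes to the reference date. It is a standard-library illustration of the arithmetic only and does not call xarray.

```python
from datetime import datetime, timedelta

# Assumption (not confirmed by this report): time is encoded as whole
# "days since 2022-09-01 00:00:00" in an integer dtype.
reference = datetime(2022, 9, 1)
times = [reference + timedelta(hours=i) for i in range(10)]


def encode_days_as_int(values, ref):
    """Encode datetimes as whole days since ``ref``, truncating like an int cast."""
    # Hourly offsets are fractional days (0, 1/24, 2/24, ...); int() drops the fraction.
    return [int((t - ref) / timedelta(days=1)) for t in values]


def decode_days(encoded, ref):
    """Decode integer day offsets back into datetimes."""
    return [ref + timedelta(days=n) for n in encoded]


encoded = encode_days_as_int(times, reference)
decoded = decode_days(encoded, reference)
print(encoded)  # every hourly offset truncates to 0
print(decoded[0], decoded[-1])  # both are 2022-09-01 00:00:00
```

This reproduces the pattern in the printed dataset above (every time value is 2022-09-01), but whether xarray 2022.6.0 actually chose these units and dtype would need to be checked against dataset.time.encoding before writing.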

Anything else we need to know?

No response

Environment

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.8.2 (default, Jun 8 2021, 11:59:35) [Clang 12.0.5 (clang-1205.0.22.11)]
    python-bits: 64
    OS: Darwin
    OS-release: 20.4.0
    machine: x86_64
    processor: i386
    byteorder: little
    LC_ALL: None
    LANG: None
    LOCALE: ('en_GB', 'UTF-8')
    libhdf5: 1.12.1
    libnetcdf: 4.7.4
    xarray: 2022.6.0
    pandas: 1.4.2
    numpy: 1.22.0
    scipy: 1.7.3
    netCDF4: 1.5.8
    pydap: None
    h5netcdf: 0.13.1
    h5py: 3.6.0
    Nio: None
    zarr: 2.10.3
    cftime: 1.6.0
    nc_time_axis: None
    PseudoNetCDF: None
    rasterio: 1.2.10
    cfgrib: 0.9.9.1
    iris: None
    bottleneck: 1.3.4
    dask: 2022.01.0
    distributed: None
    matplotlib: 3.5.1
    cartopy: None
    seaborn: None
    numbagg: None
    fsspec: 2022.11.0
    cupy: None
    pint: None
    sparse: None
    flox: None
    numpy_groupies: None
    setuptools: 57.0.0
    pip: 21.1.2
    conda: None
    pytest: 6.2.5
    IPython: 8.0.1
    sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7354/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  • repo: 13221727
  • type: issue

Links from other tables

  • 3 rows from issues_id in issues_labels
  • 5 rows from issue in issue_comments