
issues: 1050082137


id: 1050082137
node_id: I_kwDOAMm_X84-lvtZ
number: 5969
title: `to_zarr(append_dim="time")` appends incorrect datetimes
user: 460756
state: closed
locked: 0
comments: 3
created_at: 2021-11-10T17:00:53Z
updated_at: 2024-05-03T17:09:31Z
closed_at: 2024-05-03T17:09:30Z
author_association: NONE

Description

If you create a Zarr store with a single timestep and then append to its time dimension in subsequent writes, the appended timestamps are likely to be wrong. This only seems to happen when the time dimension is datetime64.

Minimal Complete Verifiable Example

Create a really simple Dataset:

```python
import pandas as pd
import xarray as xr

times = pd.date_range("2000-01-01 00:35", periods=8, freq="6H")
da = xr.DataArray(coords=[times], dims=["time"])
ds = da.to_dataset(name="foo")
```

Write just the first timestep to a new Zarr store:

```python
ZARR_PATH = "test.zarr"
ds.isel(time=[0]).to_zarr(ZARR_PATH, mode="w")
```

So far, so good!

Now things get weird... let's append the remainder of ds to the Zarr store:

```python
ds.isel(time=slice(1, None)).to_zarr(ZARR_PATH, append_dim="time")
```

This throws a warning, which is probably relevant:

```
/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/dataset.py:2037: SerializationWarning: saving variable None with floating point data as an integer dtype without any _FillValue to use for NaNs
  return to_zarr(
```

What happened

Let's load the Zarr store and print the contents of the time coord:

```python
ds_loaded = xr.open_dataset(ZARR_PATH, engine="zarr")
print(ds_loaded.time)
```

```
<xarray.DataArray 'time' (time: 8)>
array(['2000-01-01T00:35', '2000-01-01T00:35', '2000-01-01T00:35',
       '2000-01-02T00:35', '2000-01-02T00:35', '2000-01-02T00:35',
       '2000-01-03T00:35', '2000-01-03T00:35'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01T00:35:00 ... 2000-01-03T00:35:00
```

(I've removed the seconds and milliseconds to make it a bit easier to read)

The first and fifth time coords (2000-01-01T00:35 and 2000-01-02T00:35) are correct. None of the others are correct!
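These specific wrong values can be reproduced by rounding each offset to a whole number of days. The numpy-only sketch below rests on two assumptions. First, the time units fixed by the first write are whole days since the first timestamp, stored as int64; the encoding printed next is consistent with this. Second, the float offsets are rounded half-to-even, as `np.around` does, before the integer cast. Under those assumptions it yields exactly the loaded values.

```python
import numpy as np

# Timestamps from the example: 8 steps, 6 hours apart, starting 2000-01-01 00:35
start = np.datetime64("2000-01-01T00:35", "ns")
times = start + np.arange(8) * np.timedelta64(6, "h")

# Assumption: units are "days since 2000-01-01 00:35:00", stored as int64
one_day = np.timedelta64(1, "D")
offsets_days = (times - start) / one_day  # floats: 0.0, 0.25, 0.5, ..., 1.75

# Assumption: floats are rounded half-to-even, then cast to int64
encoded = np.around(offsets_days).astype(np.int64)
print(encoded)  # [0 0 0 1 1 1 2 2]

# Decoding multiplies the stored integers back by one day
decoded = start + encoded * one_day
```

Only the offsets 0.0 and 1.0 (the first and fifth timestamps) are whole days, which would explain why only those two survive.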

The encoding is not appropriate (see #3942). Notice that the units are `days since ...`, which, with an int64 dtype, clearly can't represent sub-day resolution:

```python
print(ds_loaded.time.encoding)
```

```
{'chunks': (1,), 'preferred_chunks': {'time': 1}, 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, 'units': 'days since 2000-01-01 00:35:00', 'calendar': 'proleptic_gregorian', 'dtype': dtype('int64')}
```
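Why `days`? One plausible explanation is that the units are inferred once, from the first write, and then reused when appending. The hypothetical `infer_units` helper below is a simplified sketch of that idea, not xarray's actual code. It assumes the inference picks the coarsest of days, hours, minutes and seconds that evenly divides every step between timestamps, with the first timestamp as the reference date. With a single timestamp there are no steps, and `np.all` over an empty array is `True`, so days always wins.

```python
import numpy as np

# Candidate units, coarsest first (hypothetical helper, not xarray's API)
UNITS = [("days", "D"), ("hours", "h"), ("minutes", "m"), ("seconds", "s")]

def infer_units(dates) -> str:
    """Pick the coarsest unit that divides every step between timestamps.

    Assumes a non-empty input; the first timestamp is the reference date.
    """
    dates = np.asarray(dates, dtype="datetime64[ns]")
    steps = np.unique(np.diff(dates))  # empty when there is only one timestamp
    # Format the reference like "2000-01-01 00:35:00" (drops sub-second parts)
    ref = str(dates[0].astype("datetime64[s]")).replace("T", " ")
    for name, code in UNITS:
        unit = np.timedelta64(1, code).astype("timedelta64[ns]")
        # np.all([]) is True, so a single timestamp always matches "days"
        if np.all(steps % unit == np.timedelta64(0, "ns")):
            return f"{name} since {ref}"
    return f"seconds since {ref}"

start = np.datetime64("2000-01-01T00:35", "ns")
times = start + np.arange(8) * np.timedelta64(6, "h")

print(infer_units(times[:1]))  # days since 2000-01-01 00:35:00
print(infer_units(times[:2]))  # hours since 2000-01-01 00:35:00
```

Under this sketch, writing two timesteps first yields `hours`, which can represent the 6-hourly data exactly.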

What you expected to happen

The correct time coords are:

```python
print(ds.time)
```

```
<xarray.DataArray 'time' (time: 8)>
array(['2000-01-01T00:35', '2000-01-01T06:35', '2000-01-01T12:35',
       '2000-01-01T18:35', '2000-01-02T00:35', '2000-01-02T06:35',
       '2000-01-02T12:35', '2000-01-02T18:35'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01T00:35:00 ... 2000-01-02T18:35:00
```

Anything else we need to know?

There are three workarounds that I'm aware of:

1) When first creating the Zarr store, write two or more timesteps into it. You can then append any number of timesteps and everything works fine.
2) Convert the time coords to Unix epoch time, represented as ints.
3) Manually set the encoding before the first write (as suggested in https://github.com/pydata/xarray/issues/3942#issuecomment-610444090). For example:

```python
ds.isel(time=[0]).to_zarr(
    ZARR_PATH,
    mode="w",
    encoding={'time': {'units': 'seconds since 1970-01-01'}},
)
```
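Workarounds 2 and 3 both rely on every timestamp being an exact integer in the chosen unit. The numpy-only sketch below illustrates workaround 2 for the example's times. It assumes whole-second timestamps, which it checks, and covers only the integer conversion. Attaching the integer coordinate back onto the Dataset is left to your own code.

```python
import numpy as np

start = np.datetime64("2000-01-01T00:35", "ns")
times = start + np.arange(8) * np.timedelta64(6, "h")

# Integer seconds since the Unix epoch (1970-01-01).
# astype("datetime64[s]") truncates sub-second parts, so check there are none.
assert np.all(times == times.astype("datetime64[s]"))
epoch_seconds = times.astype("datetime64[s]").astype(np.int64)

# Every 6-hour step is exactly 21600 s, so nothing needs rounding on disk
print(np.diff(epoch_seconds))  # [21600 21600 21600 21600 21600 21600 21600]

# Decoding the integers recovers the original timestamps exactly
decoded = epoch_seconds.astype("datetime64[s]").astype("datetime64[ns]")
print(bool(np.all(decoded == times)))  # True
```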

Related issues

It's possible that the root cause of this issue is #3942.

And I think #3379 is another symptom of this issue.

Environment

Output of `xr.show_versions()`:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.13.0-21-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: ('en_GB', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 0.20.1
pandas: 1.3.4
numpy: 1.21.4
scipy: 1.7.2
netCDF4: 1.5.8
pydap: None
h5netcdf: 0.11.0
h5py: 3.4.0
Nio: None
zarr: 2.10.1
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.8
cfgrib: 0.9.9.1
iris: None
bottleneck: 1.3.2
dask: 2021.10.0
distributed: None
matplotlib: 3.4.3
cartopy: None
seaborn: None
numbagg: None
fsspec: 2021.11.0
cupy: None
pint: None
sparse: None
setuptools: 58.5.3
pip: 21.3.1
conda: None
pytest: 6.2.5
IPython: 7.29.0
sphinx: None
```
reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5969/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
