id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1050082137,I_kwDOAMm_X84-lvtZ,5969,"`to_zarr(append_dim=""time"")` appends incorrect datetimes",460756,closed,0,,,3,2021-11-10T17:00:53Z,2024-05-03T17:09:31Z,2024-05-03T17:09:30Z,NONE,,,,"### Description

If you create a Zarr store with a single timestep and then append to the `time` dimension of that store in subsequent writes, the appended timestamps are likely to be wrong. This only seems to happen if the `time` dimension is `datetime64`.

### Minimal Complete Verifiable Example

Create a really simple `Dataset`:

```python
import pandas as pd
import xarray as xr

times = pd.date_range(""2000-01-01 00:35"", periods=8, freq=""6H"")
da = xr.DataArray(coords=[times], dims=[""time""])
ds = da.to_dataset(name=""foo"")
```

Write just the first timestep to a new Zarr store:

```python
ZARR_PATH = ""test.zarr""
ds.isel(time=[0]).to_zarr(ZARR_PATH, mode=""w"")
```

So far, so good! Now things get weird... let's append the remainder of `ds` to the Zarr store:

```python
ds.isel(time=slice(1, None)).to_zarr(ZARR_PATH, append_dim=""time"")
```

This throws a warning, which is probably relevant:

```
/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/dataset.py:2037: SerializationWarning: saving variable None with floating point data as an integer dtype without any _FillValue to use for NaNs
  return to_zarr(
```

### What happened

Let's load the Zarr store and print the contents of the `time` coord:

```python
ds_loaded = xr.open_dataset(ZARR_PATH, engine=""zarr"")
print(ds_loaded.time)
```

```
array(['2000-01-01T00:35', '2000-01-01T00:35', '2000-01-01T00:35',
       '2000-01-02T00:35', '2000-01-02T00:35', '2000-01-02T00:35',
       '2000-01-03T00:35', '2000-01-03T00:35'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01T00:35:00 ... 2000-01-03T00:35:00
```

(I've removed the seconds and milliseconds to make it a bit easier to read)

The first and fifth time coords (2000-01-01T00:35 and 2000-01-02T00:35) are correct. None of the others are correct!

The encoding is not appropriate (see #3942)... notice that `units` is `days since...`, which, stored with an integer `dtype`, clearly can't represent sub-day resolution:

```python
print(ds_loaded.time.encoding)
```

```
{'chunks': (1,),
 'preferred_chunks': {'time': 1},
 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0),
 'filters': None,
 'units': 'days since 2000-01-01 00:35:00',
 'calendar': 'proleptic_gregorian',
 'dtype': dtype('int64')}
```

### What you expected to happen

The correct `time` coords are:

```python
print(ds.time)
```

```
array(['2000-01-01T00:35', '2000-01-01T06:35', '2000-01-01T12:35',
       '2000-01-01T18:35', '2000-01-02T00:35', '2000-01-02T06:35',
       '2000-01-02T12:35', '2000-01-02T18:35'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01T00:35:00 ... 2000-01-02T18:35:00
```

### Anything else we need to know?

There are three workarounds that I'm aware of:

1) When first creating the Zarr store, write two or more timesteps into it. Then you can append any number of timesteps and everything works fine.
2) Convert the `time` coords to Unix epoch, represented as ints.
3) Manually set the encoding before the first write (as suggested in https://github.com/pydata/xarray/issues/3942#issuecomment-610444090). For example:

```python
ds.isel(time=[0]).to_zarr(
    ZARR_PATH,
    mode=""w"",
    encoding={
        'time': {
            'units': 'seconds since 1970-01-01'
        }
    }
)
```

### Related issues

It's possible that the root cause of this issue is #3942. And I think #3379 is another symptom of this issue.

### Environment
Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.13.0-21-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: ('en_GB', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.20.1
pandas: 1.3.4
numpy: 1.21.4
scipy: 1.7.2
netCDF4: 1.5.8
pydap: None
h5netcdf: 0.11.0
h5py: 3.4.0
Nio: None
zarr: 2.10.1
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.8
cfgrib: 0.9.9.1
iris: None
bottleneck: 1.3.2
dask: 2021.10.0
distributed: None
matplotlib: 3.4.3
cartopy: None
seaborn: None
numbagg: None
fsspec: 2021.11.0
cupy: None
pint: None
sparse: None
setuptools: 58.5.3
pip: 21.3.1
conda: None
pytest: 6.2.5
IPython: 7.29.0
sphinx: None
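
### Illustration of the precision loss

A minimal sketch of why an integer `days since ...` encoding would lose the sub-day offsets, using only the standard library. It assumes the fractional day count is rounded to the nearest integer (halves to even) before being stored; that is consistent with the loaded values shown under What happened, but I haven't confirmed it against the xarray source. `encode_days` and `decode_days` are hypothetical helpers, not xarray functions.

```python
from datetime import datetime, timedelta

# Reference time taken from the encoding printed in this issue:
# 'days since 2000-01-01 00:35:00'
REFERENCE = datetime(2000, 1, 1, 0, 35)
ONE_DAY = timedelta(days=1)


def encode_days(t):
    # Hypothetical helper (assumption): fractional days since REFERENCE,
    # rounded to an int. Python's round() sends halves to the even neighbour.
    return round((t - REFERENCE) / ONE_DAY)


def decode_days(n):
    # Hypothetical helper: integer day count back to a datetime.
    return REFERENCE + n * ONE_DAY


# Same eight timestamps as the example: 6-hourly, starting at 00:35.
times = [REFERENCE + i * timedelta(hours=6) for i in range(8)]
encoded = [encode_days(t) for t in times]
decoded = [decode_days(n) for n in encoded]

# Print each original timestamp next to its round-tripped value.
for original, roundtrip in zip(times, decoded):
    print(original.isoformat(timespec='minutes'), '->',
          roundtrip.isoformat(timespec='minutes'))
```

Under these assumptions the stored day counts are 0, 0, 0, 1, 1, 1, 2, 2, so only the first and fifth timestamps survive the round trip, which matches the loaded `time` coord. A finer unit such as `seconds since ...` (workaround 3) can represent every offset in this example exactly.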
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5969/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue