html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/7790#issuecomment-1532152709,https://api.github.com/repos/pydata/xarray/issues/7790,1532152709,IC_kwDOAMm_X85bUsuF,14983768,2023-05-02T21:07:27Z,2023-05-02T21:09:10Z,NONE,"@kmuehlbauer - genius! Yes. That pull request should fix this issue exactly! And it explains why I see this issue and you don't - with undefined behavior anything can happen. Since we are on different OSes, our systems behave differently.
I just double checked with pandas and this fix will do the right thing:
```python
import numpy as np
import pandas as pd
print(pd.to_timedelta([np.nan, 0],""ns"") + np.datetime64('1970-01-01'))
```
```
DatetimeIndex(['NaT', '1970-01-01'], dtype='datetime64[ns]', freq=None)
```
I see that the pull request with the fix has been sitting open since December of last year. Is there some way to get someone who can merge it to look at it?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1685803922
https://github.com/pydata/xarray/issues/7790#issuecomment-1530347592,https://api.github.com/repos/pydata/xarray/issues/7790,1530347592,IC_kwDOAMm_X85bN0BI,14983768,2023-05-01T21:43:08Z,2023-05-01T21:43:56Z,NONE,"Ah hah! Well, I don't know why this is working for you @kmuehlbauer, but I can see why it is not working for me. I've been debugging through the code and it looks like the problem is the `_decode_datetime_with_pandas` function. For me, it's converting a float NaN into an integer, which results in a zero value.
It all starts in the `open_zarr` function, which sets the `use_cftime` parameter to None by default:
https://github.com/pydata/xarray/blob/25d9a28e12141b9b5e4a79454eb76ddd2ee2bc4d/xarray/backends/zarr.py#L701-L817
There's a bunch of stuff that gets called, but eventually we get to the function `decode_cf_datetime`, which ironically (given the name) also takes this `use_cftime` parameter, which is still None. Because `use_cftime` is None, the function calls `_decode_datetime_with_pandas`:
https://github.com/pydata/xarray/blob/25d9a28e12141b9b5e4a79454eb76ddd2ee2bc4d/xarray/coding/times.py#L265-L289
and then, in `_decode_datetime_with_pandas`, the code casts a float NaN value to zero:
https://github.com/pydata/xarray/blob/979b99831f5d34d33120312a15dad3e6a0830f32/xarray/coding/times.py#L216-L262
In line 254, `flat_num_dates` is `array([ nan, 1.6726176e+18])`. After line 254, `flat_num_dates_ns_int` is `array([ 0, 1672617600000000000])`.
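Here is a minimal sketch of why that cast misbehaves and one way to make it NaN-safe, using only NumPy. This is not the code from xarray or from any pull request; the names `plain`, `nat_int`, `mask`, and `safe` are made up for illustration. Converting a float NaN to an integer is undefined behavior at the C level, so the first element of the plain cast can differ between platforms. Masking the NaNs before the cast and writing the NaT sentinel afterwards avoids relying on that:

```python
import numpy as np

flat_num_dates = np.array([np.nan, 1.6726176e18])

# Plain cast: the result for the NaN element is platform dependent
# (newer NumPy versions may also emit a RuntimeWarning here).
with np.errstate(invalid='ignore'):
    plain = flat_num_dates.astype(np.int64)

# NaN-safe cast: replace NaN with a placeholder, cast, then write the
# int64 value that datetime64 uses for NaT into the masked slots.
nat_int = np.iinfo(np.int64).min
mask = np.isnan(flat_num_dates)
safe = np.where(mask, 0.0, flat_num_dates).astype(np.int64)
safe[mask] = nat_int

# Reinterpret the integers as nanoseconds since 1970-01-01.
decoded = safe.view('M8[ns]')
print(decoded)
```

With the mask in place, the first element decodes to NaT no matter what the platform does with the raw NaN cast.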
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1685803922
https://github.com/pydata/xarray/issues/7790#issuecomment-1530186148,https://api.github.com/repos/pydata/xarray/issues/7790,1530186148,IC_kwDOAMm_X85bNMmk,14983768,2023-05-01T20:25:34Z,2023-05-01T20:25:34Z,NONE,"@kmuehlbauer - I ran https://github.com/pydata/xarray/issues/7790#issuecomment-1529894939 and I get an incorrect fill value:
```
******************
Created with fill value 1900-01-01
array([ 'NaT', '2023-01-02T00:00:00.000000000'],
dtype='datetime64[ns]')
Coordinates:
* time (time) datetime64[ns] NaT 2023-01-02
******************
Read back out of the zarr store with xarray
array(['1970-01-01T00:00:00.000000000', '2023-01-02T00:00:00.000000000'],
dtype='datetime64[ns]')
Coordinates:
* time (time) datetime64[ns] 1970-01-01 2023-01-02
{}
{'chunks': (2,), 'preferred_chunks': {'time': 2}, 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, '_FillValue': -2208988800000000000, 'units': 'nanoseconds since 1970-01-01', 'calendar': 'proleptic_gregorian', 'dtype': dtype('int64')}
******************
Read back out of the zarr store with zarr
[-2208988800000000000 1672617600000000000]
```
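As a sanity check on those raw integers, here is a small sketch (NumPy only, variable names are mine) that converts the two dates to nanoseconds since 1970-01-01 and back. If I am reading it right, the store really does hold the 1900-01-01 fill value in the first slot, and the 1970-01-01 only shows up when xarray decodes it:

```python
import numpy as np

# Nanoseconds since 1970-01-01 for the two dates in this thread.
fill_ns = int(np.datetime64('1900-01-01', 'ns').astype(np.int64))
data_ns = int(np.datetime64('2023-01-02', 'ns').astype(np.int64))
print(fill_ns, data_ns)

# Going the other way: interpret the stored integers as datetime64[ns].
stored = np.array([-2208988800000000000, 1672617600000000000], dtype=np.int64)
as_dates = stored.view('M8[ns]')
print(as_dates)
```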
and here is my show_versions, since it may have changed because I've added some new libraries. It looks like my ipython version is slightly different, but I can't see how that would affect things.
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.3 | packaged by conda-forge | (main, Apr 6 2023, 08:58:31) [Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 22.4.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2023.4.2
pandas: 2.0.1
numpy: 1.24.3
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.14.2
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.7.2
pip: 23.1.2
conda: None
pytest: None
mypy: None
IPython: 8.13.1
sphinx: None
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1685803922
https://github.com/pydata/xarray/issues/7790#issuecomment-1530056660,https://api.github.com/repos/pydata/xarray/issues/7790,1530056660,IC_kwDOAMm_X85bMs_U,14983768,2023-05-01T18:37:47Z,2023-05-01T18:39:21Z,NONE,"Oops! Yes. You are right. I had some cross-wording on the variable names. So I started a new notebook. Unfortunately, I think you may have also gotten some wires crossed? You set the time fill value to 1900-01-01, but then use NaT in the actual array?
Here is a fresh notebook with a stand-alone cell with everything that I think you were doing, but I'm not 100% sure. The fill value is still wrong when it gets read out, but it is at least different? The fill value now comes back as the reference date of the units (1970-01-01) for some reason. This seems like progress?
```python
import numpy as np
import xarray as xr
import zarr
# Create a time array with one fill value, NaT
time = np.array([np.datetime64(""NaT"", ""ns""), '2023-01-02 00:00:00.00000000'], dtype='M8[ns]')
# Create xarray with this fill value
xr_time_array = xr.DataArray(data=time,dims=['time'],name='time')
xr_ds = xr.Dataset(dict(time=xr_time_array))
print(""**********************"")
print(""xarray created with NaT fill value"")
print(""----------------------"")
print(xr_ds[""time""])
# Save as zarr
location_with_units = ""xarray_and_units.zarr""
encoding = {
""time"":{""_FillValue"":np.datetime64(""NaT"",""ns""),""dtype"":np.int64,""units"":""nanoseconds since 1970-01-01""}
}
xr_ds.to_zarr(location_with_units,mode=""w"",encoding=encoding)
# Read it back out again
xr_read = xr.open_zarr(location_with_units)
print(""**********************"")
print(""xarray created read with NaT fill value"")
print(""----------------------"")
print(xr_read[""time""])
print(xr_read[""time""].attrs)
print(xr_read[""time""].encoding)
```
```
**********************
xarray created with NaT fill value
----------------------
array([ 'NaT', '2023-01-02T00:00:00.000000000'],
dtype='datetime64[ns]')
Coordinates:
* time (time) datetime64[ns] NaT 2023-01-02
**********************
xarray created read with NaT fill value
----------------------
array(['1970-01-01T00:00:00.000000000', '2023-01-02T00:00:00.000000000'],
dtype='datetime64[ns]')
Coordinates:
* time (time) datetime64[ns] 1970-01-01 2023-01-02
{}
{'chunks': (2,), 'preferred_chunks': {'time': 2}, 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, '_FillValue': -9223372036854775808, 'units': 'nanoseconds since 1970-01-01', 'calendar': 'proleptic_gregorian', 'dtype': dtype('int64')}
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1685803922
https://github.com/pydata/xarray/issues/7790#issuecomment-1527948787,https://api.github.com/repos/pydata/xarray/issues/7790,1527948787,IC_kwDOAMm_X85bEqXz,14983768,2023-04-28T18:39:01Z,2023-04-28T18:39:01Z,NONE,Where in the code is the time array being _decoded_? That seems to be where a lot of the issue is?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1685803922
https://github.com/pydata/xarray/issues/7790#issuecomment-1527918654,https://api.github.com/repos/pydata/xarray/issues/7790,1527918654,IC_kwDOAMm_X85bEjA-,14983768,2023-04-28T18:08:16Z,2023-04-28T18:08:16Z,NONE,"The zarr store does indeed use an integer in this case according to the .zmetadata file:
```
{
""metadata"": {
"".zattrs"": {},
"".zgroup"": {
""zarr_format"": 2
},
""time/.zarray"": {
""chunks"": [
2
],
""compressor"": {
""blocksize"": 0,
""clevel"": 5,
""cname"": ""lz4"",
""id"": ""blosc"",
""shuffle"": 1
},
""dtype"": ""
array(['1900-01-01T00:00:00.000000000', '2023-01-02T00:00:00.000000000'],
dtype='datetime64[ns]')
Coordinates:
* time (time) datetime64[ns] 1900-01-01 2023-01-02
******************
array(['2023-01-02T00:00:00.000000000', '2023-01-02T00:00:00.000000000'],
dtype='datetime64[ns]')
Coordinates:
* time (time) datetime64[ns] 2023-01-02 2023-01-02
{'chunks': (2,), 'preferred_chunks': {'time': 2}, 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, '_FillValue': -9.223372036854776e+18, 'units': 'days since 2023-01-02 00:00:00', 'calendar': 'proleptic_gregorian', 'dtype': dtype('float64')}
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1685803922
https://github.com/pydata/xarray/issues/7790#issuecomment-1525774670,https://api.github.com/repos/pydata/xarray/issues/7790,1525774670,IC_kwDOAMm_X85a8XlO,14983768,2023-04-27T14:13:58Z,2023-04-27T14:13:58Z,NONE,"Interestingly, xarray is also perfectly happy to read a numpy.datetime64 array out of a zarr store as long as the xarray metadata is present. xarray even helpfully creates a '_FillValue' attribute for the array so there is no confusion:
```
# Create a zarr store directly with numpy.datetime64 type
location_zarr_direct = ""from_zarr.zarr""
root = zarr.open(location_zarr_direct,mode='w')
z_time_array = root.create_dataset(
""time"",data=time,shape=time.shape,chunks=time.shape,dtype=time.dtype,
fill_value=time_fill_value
)
# Add xarray metadata
z_time_array.attrs[""_ARRAY_DIMENSIONS""] = [""time""]
zarr.convenience.consolidate_metadata(location_zarr_direct)
# Use xarray to read this data out
xr_read_from_zarr = xr.open_zarr(location_zarr_direct)
print(xr_read_from_zarr[""time""])
```
```
array([ 'NaT', '2023-01-02T00:00:00.000000000'],
dtype='datetime64[ns]')
Coordinates:
* time (time) datetime64[ns] NaT 2023-01-02
Attributes:
_FillValue: NaT
```
So I am extremely confused as to why xarray encodes time arrays so strangely when it creates the zarr store itself! (Hence https://github.com/pydata/xarray/discussions/7776)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1685803922
https://github.com/pydata/xarray/issues/7790#issuecomment-1525766244,https://api.github.com/repos/pydata/xarray/issues/7790,1525766244,IC_kwDOAMm_X85a8Vhk,14983768,2023-04-27T14:08:37Z,2023-04-27T14:08:37Z,NONE,"Ah! Okay. I did not know about the `.encoding` option, which does indeed have the fill value. Thank you.
Interestingly, -9.223372036854776e+18 is just the float equivalent of numpy.datetime64('NaT'):
```python
float(np.datetime64('NaT').view('i8'))
```
```
-9.223372036854776e+18
```
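To spell that out a bit more (a small sketch, NumPy only, with a variable name I made up): NaT is stored as the smallest int64, and converting that integer to a float gives exactly the `_FillValue` that shows up in `.encoding`:

```python
import numpy as np

# The int64 bit pattern that datetime64 uses to represent NaT.
nat_as_int = int(np.datetime64('NaT', 'ns').view('i8'))
print(nat_as_int)

# It is the minimum int64, i.e. -2**63.
print(nat_as_int == np.iinfo(np.int64).min)

# -2**63 is exactly representable as a float64.
print(float(nat_as_int))
```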
And I know this isn't an issue with zarr and NaT because I can create the zarr store directly with the zarr library and it's perfectly happy:
```python
# Create a zarr store directly with numpy.datetime64 type
location_zarr_direct = ""from_zarr.zarr""
root = zarr.open(location_zarr_direct,mode='w')
z_time_array = root.create_dataset(
""time"",data=time,shape=time.shape,chunks=time.shape,dtype=time.dtype,
fill_value=time_fill_value
)
zarr.convenience.consolidate_metadata(location_zarr_direct)
# Read it back out again
read_zarr = zarr.open(location_zarr_direct,mode='r')
print(read_zarr[""time""][:])
```
```
[ 'NaT' '2023-01-02T00:00:00.000000000']
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1685803922