
issue_comments: 1529894939


html_url: https://github.com/pydata/xarray/issues/7790#issuecomment-1529894939
issue_url: https://api.github.com/repos/pydata/xarray/issues/7790
id: 1529894939
node_id: IC_kwDOAMm_X85bMFgb
user: 5821660
created_at: 2023-05-01T16:05:19Z
updated_at: 2023-05-01T16:05:19Z
author_association: MEMBER

So, after some debugging, I think I've found two issues with the current code.

First, we need to provide the fill value with a resolution that matches the data (here: ns). Second, there is a problem with inferring the units from the data when they are not given explicitly.
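To illustrate the first point, here is a minimal NumPy-only sketch (not part of the original workaround) of how the same date maps to very different integers depending on the resolution of the `np.datetime64` scalar. The variable names are made up for illustration.

```python
import numpy as np

# Without an explicit unit, NumPy infers day resolution from the string.
fill_day = np.datetime64("1900-01-01")
# With an explicit unit, the scalar counts nanoseconds since 1970-01-01.
fill_ns = np.datetime64("1900-01-01", "ns")

print(fill_day.dtype, fill_day.astype("int64"))  # datetime64[D] -25567
print(fill_ns.dtype, fill_ns.astype("int64"))    # datetime64[ns] -2208988800000000000

# Both scalars denote the same instant; only the integer representation differs.
print(fill_day == fill_ns)  # True
```

If the day-based integer ends up next to data encoded in nanoseconds, it no longer refers to 1900-01-01, which would be consistent with the wrong fill values described above.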

Here is some workaround code which (finally, :crossed_fingers:) should at least write and read correct data (see the comments in the code):

```python
import numpy as np
import xarray as xr
import zarr

# Create a numpy array of type np.datetime64 with one fill value and one date
# FIRST ISSUE WITH _FillValue
# we need to provide ns resolution here too, otherwise we get wrong fillvalues (day-reference)
time_fill_value = np.datetime64("1900-01-01 00:00:00.00000000", "ns")
time = np.array([np.datetime64("NaT", "ns"), "2023-01-02 00:00:00.00000000"], dtype="M8[ns]")

# Create a dataset with this one array
xr_time_array = xr.DataArray(data=time, dims=["time"], name="time")
xr_ds = xr.Dataset(dict(time=xr_time_array))

print("******")
print("Created with fill value 1900-01-01")
print(xr_ds["time"])

# Save the dataset to zarr
location_new_fill = "from_xarray_new_fill.zarr"

# SECOND ISSUE with inferring units from data
# We need to specify "dtype" and "units" which fit our data
# Note: as we provide a _FillValue with a reference to unix-epoch
# we need to provide a fitting units too
encoding = {
    "time": {
        "_FillValue": time_fill_value,
        "dtype": np.int64,
        "units": "nanoseconds since 1970-01-01",
    }
}
xr_ds.to_zarr(location_new_fill, mode="w", encoding=encoding)

xr_read = xr.open_zarr(location_new_fill)
print("******")
print("Read back out of the zarr store with xarray")
print(xr_read["time"])
print(xr_read["time"].attrs)
print(xr_read["time"].encoding)

z_new_fill = zarr.open(location_new_fill, mode="r")
print("******")
print("Read back out of the zarr store with zarr")

print(z_new_fill["time"])
print(z_new_fill["time"].attrs)
print(z_new_fill["time"][:])
```

```python
******
Created with fill value 1900-01-01
<xarray.DataArray 'time' (time: 2)>
array([                          'NaT', '2023-01-02T00:00:00.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] NaT 2023-01-02
******
Read back out of the zarr store with xarray
<xarray.DataArray 'time' (time: 2)>
array([                          'NaT', '2023-01-02T00:00:00.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] NaT 2023-01-02
{}
{'chunks': (2,), 'preferred_chunks': {'time': 2}, 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, '_FillValue': -2208988800000000000, 'units': 'nanoseconds since 1970-01-01', 'calendar': 'proleptic_gregorian', 'dtype': dtype('int64')}
******
Read back out of the zarr store with zarr
<zarr.core.Array '/time' (2,) int64 read-only>
<zarr.attrs.Attributes object at 0x7f086ab8e710>
[-2208988800000000000 1672617600000000000]
```
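The raw integers that zarr reports can be checked by hand. The following NumPy-only sketch reinterprets them as nanoseconds since 1970-01-01 and then masks the fill value. It is a simplified stand-in for the decoding step, assumed here for illustration, and not xarray's actual implementation.

```python
import numpy as np

# Raw int64 values as stored in the zarr array (copied from the output above).
raw = np.array([-2208988800000000000, 1672617600000000000], dtype="int64")

# Interpret them with units "nanoseconds since 1970-01-01".
decoded = raw.astype("datetime64[ns]")
print(decoded)

# Mask entries equal to the fill value (1900-01-01 at ns resolution) as NaT.
fill_value = np.datetime64("1900-01-01", "ns")
masked = np.where(decoded == fill_value, np.datetime64("NaT", "ns"), decoded)
print(masked)
```

The first stored value decodes to 1900-01-01, i.e. the fill value, and the second to 2023-01-02, which matches what xarray shows after reading the store.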

@christine-e-smit Please let me know if the above workaround gives you correct results in your workflow. If so, we can think about how to automatically align the fill value resolution with the data resolution, and about what needs to be done to deduce the units correctly.
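As a starting point for that discussion, aligning the resolutions could look roughly like the sketch below. `align_fill_value` is a hypothetical helper invented for this example, not an existing xarray function, and it only covers the plain `datetime64` case.

```python
import numpy as np


def align_fill_value(fill_value, data):
    """Hypothetical helper: cast a datetime fill value to the dtype of `data`."""
    data = np.asarray(data)
    if data.dtype.kind != "M":
        raise TypeError(f"expected datetime64 data, got {data.dtype}")
    # Casting changes the resolution but keeps the instant the value denotes.
    return np.datetime64(fill_value).astype(data.dtype)


time = np.array(["NaT", "2023-01-02"], dtype="M8[ns]")

# A day-resolution fill value is converted to the ns resolution of the data.
fill = align_fill_value(np.datetime64("1900-01-01"), time)
print(fill.dtype, fill.astype("int64"))
```

With something along these lines, a user-supplied `_FillValue` at day resolution would yield the same integer as the explicit ns version used in the workaround.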

issue: 1685803922