pydata/xarray issue #3942: Time dtype encoding defaulting to `int64` when writing netcdf or zarr

State: open · Locked: 0 · Comments: 8 · Created: 2020-04-06T23:36:37Z · Updated: 2021-11-11T12:32:06Z · Author association: CONTRIBUTOR · User ID: 7799184 · Issue ID: 595492608 · Node ID: MDU6SXNzdWU1OTU0OTI2MDg=

Time dtype encoding defaults to "int64" for datasets with only zero-hour times when writing to netcdf or zarr.

As a result, the precision of these datasets is constrained by how the time units are defined (in the example below, daily precision, because the units are defined as 'days since ...'). If, for instance, we create a zarr dataset from such data with this default encoding and subsequently append non-zero times to it, we lose the hour/minute/second information in the appended values.
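The truncation described above can be illustrated with plain `datetime` arithmetic. This is only a simplified sketch, not xarray's actual encoding routine: the helpers `encode_int_days` and `decode_days` are hypothetical names, and the reference date is assumed to match the 'days since 2012-01-01' units from the example below.

```python
import datetime

# Assumed reference date, matching units of "days since 2012-01-01"
REFERENCE = datetime.datetime(2012, 1, 1)
DAY = datetime.timedelta(days=1)


def encode_int_days(t):
    """Hypothetical helper: whole days since REFERENCE, stored as an integer."""
    return int((t - REFERENCE) / DAY)  # the fractional part of the day is dropped


def decode_days(value):
    """Hypothetical helper: turn a day count back into a datetime."""
    return REFERENCE + value * DAY


appended = datetime.datetime(2012, 1, 1, 3, 0, 0)  # a non-zero-hour time
encoded = encode_int_days(appended)
decoded = decode_days(encoded)

print(encoded)   # integer day count stored on disk
print(decoded)   # the 03:00 component is gone after decoding
```

Three hours is 0.125 days, so an integer day count cannot represent it and the decoded value falls back to midnight.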

MCVE Code Sample

```python
In [1]: import datetime
   ...: import xarray as xr
   ...:
   ...: ds = xr.DataArray(
   ...:     data=[0.5],
   ...:     coords={"time": [datetime.datetime(2012, 1, 1)]},
   ...:     dims=("time",),
   ...:     name="x",
   ...: ).to_dataset()

In [2]: ds
Out[2]:
<xarray.Dataset>
Dimensions:  (time: 1)
Coordinates:
  * time     (time) datetime64[ns] 2012-01-01
Data variables:
    x        (time) float64 0.5

In [3]: ds.to_zarr("/tmp/x.zarr")

In [4]: ds1 = xr.open_zarr("/tmp/x.zarr")

In [5]: ds1.time.encoding
Out[5]:
{'chunks': (1,),
 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0),
 'filters': None,
 'units': 'days since 2012-01-01 00:00:00',
 'calendar': 'proleptic_gregorian',
 'dtype': dtype('int64')}

In [6]: dsnew = xr.DataArray(
   ...:     data=[1.5],
   ...:     coords={"time": [datetime.datetime(2012, 1, 1, 3, 0, 0)]},
   ...:     dims=("time",),
   ...:     name="x",
   ...: ).to_dataset()

In [7]: dsnew.to_zarr("/tmp/x.zarr", append_dim="time")

In [8]: ds1 = xr.open_zarr("/tmp/x.zarr")

In [9]: ds1.time.values
Out[9]:
array(['2012-01-01T00:00:00.000000000', '2012-01-01T00:00:00.000000000'],
      dtype='datetime64[ns]')
```

Expected Output

In [9]: ds1.time.values
Out[9]: array(['2012-01-01T00:00:00.000000000', '2012-01-01T03:00:00.000000000'], dtype='datetime64[ns]')

Problem Description

Perhaps it would be useful to default the time dtype to "float64". Another option would be to use, by default, a finer time resolution than the one xarray derives automatically from the dataset times (for instance, if the units are automatically defined as "days since ...", use "seconds since ..." instead).
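To make the two suggestions concrete, here is a rough sketch in plain Python of why either one would preserve the appended 03:00 value. It does not call xarray; the `roundtrip` helper and the reference date are assumptions made only for illustration.

```python
import datetime

# Assumed reference date shared by both hypothetical encodings
REFERENCE = datetime.datetime(2012, 1, 1)
DAY = datetime.timedelta(days=1)
SECOND = datetime.timedelta(seconds=1)

times = [
    datetime.datetime(2012, 1, 1),           # zero-hour time written first
    datetime.datetime(2012, 1, 1, 3, 0, 0),  # non-zero-hour time appended later
]


def roundtrip(t, unit, cast):
    """Hypothetical helper: encode t as cast(count of `unit` since REFERENCE), then decode."""
    encoded = cast((t - REFERENCE) / unit)
    return REFERENCE + encoded * unit


# Option 1: keep "days since ..." but store the count as a float
float_days = [roundtrip(t, DAY, float) for t in times]

# Option 2: keep an integer dtype but use the finer "seconds since ..." units
int_seconds = [roundtrip(t, SECOND, int) for t in times]

print(float_days == times, int_seconds == times)
```

In this toy case both options round-trip the two timestamps; which one is the better default for real datasets is the open question raised in this issue.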


Versions

Output of `xr.show_versions()`

In [10]: xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.5 (default, Nov 20 2019, 09:21:52) [GCC 9.2.1 20191008]
python-bits: 64
OS: Linux
OS-release: 5.3.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_NZ.UTF-8
LOCALE: en_NZ.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.3

xarray: 0.15.0
pandas: 1.0.1
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.8.0
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.1.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.3
cfgrib: None
iris: None
bottleneck: None
dask: 2.14.0
distributed: 2.12.0
matplotlib: 3.2.0
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 45.3.0
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.13.0
sphinx: None
