
issues: 681279877


  • id: 681279877
  • node_id: MDU6SXNzdWU2ODEyNzk4Nzc=
  • number: 4347
  • title: to_zarr() failing after concatenating netcdfs with different time indexes
  • user: 23692810
  • state: open
  • locked: 0
  • comments: 5
  • created_at: 2020-08-18T19:30:30Z
  • updated_at: 2022-04-28T15:09:10Z
  • author_association: NONE

What happened: After concatenating two NetCDF Datasets with different cftime.DatetimeNoLeap time coordinates, writing the result to a Zarr store with ds.to_zarr() fails with an OutOfBoundsDatetime exception.

What you expected to happen: I expect to_zarr() to execute successfully.

Minimal Complete Verifiable Example:

```python
import xarray as xr
import cftime
import pandas as pd

# open a generic CESM dataset containing a time_bnds variable
url = 'http://adss.apcc21.org/opendap/CMIP5DB/cmip5_daily_BT/pr_day_CESM1-BGC_rcp85_r1i1p1_20760101-21001231.nc'
ds = xr.open_dataset(url)

# create two new Datasets with different, overlapping time indexes
ds2 = ds.sel(time=slice(None, cftime.DatetimeNoLeap(2076, 3, 1, 1, 0, 0, 0)))
ds3 = ds.sel(time=slice(None, cftime.DatetimeNoLeap(2076, 2, 1, 1, 0, 0, 0)))

# concatenate the two Datasets, using the default fill value
ds4 = xr.concat([ds2, ds3], dim=pd.Index(['ds2', 'ds3'], name='ds'))

# fails with an OutOfBoundsDatetime exception
zs = ds4.to_zarr('/tmp/my_zarr.zarr')
```

Anything else we need to know?: I believe the problem is related to the implicit NaN fill value that xr.concat uses when padding the time_bnds variable. I arrived at this code while trying to produce a minimal example of a similar error, in which concatenating multiple Datasets containing time_bnds variables failed with a SerializationError. Both in the example above and in my earlier troubleshooting of production code, removing time_bnds with ds = ds.drop('time_bnds') made the to_zarr() call succeed.
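To illustrate the suspected mechanism without needing the remote file, here is a minimal sketch on synthetic data. It is an assumption-laden stand-in, not a reproduction: the Dataset, its `pr` values, and the float-valued `time_bnds` are made up, and it does not trigger the OutOfBoundsDatetime exception itself. It only shows that concatenating along a new dimension with an outer join pads the shorter Dataset's `time_bnds` with NaN, and that either dropping the variable (here via `drop_vars`, the newer spelling of the `drop` call above) or using `join="inner"` leaves nothing to pad. The `join` and `data_vars` arguments are passed explicitly so the behavior does not depend on the installed xarray version's defaults.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the CESM file (hypothetical values): five time
# steps plus a (time, bnds) bounds variable stored as plain floats.
time = np.arange(5)
bounds = np.stack([time, time + 1], axis=1).astype("float64")
ds = xr.Dataset(
    {
        "pr": ("time", np.linspace(0.0, 1.0, 5)),
        "time_bnds": (("time", "bnds"), bounds),
    },
    coords={"time": time},
)

# Two overlapping selections with different time indexes, as in the report.
ds2 = ds.isel(time=slice(None, 5))
ds3 = ds.isel(time=slice(None, 3))
labels = pd.Index(["ds2", "ds3"], name="ds")

# Outer join: ds3 is reindexed onto the union of the time indexes and the
# two missing steps are filled with NaN, including in time_bnds.
ds4 = xr.concat([ds2, ds3], dim=labels, data_vars="all", join="outer")
n_missing = int(ds4["time_bnds"].sel(ds="ds3").isnull().sum())
print(n_missing)  # 4: two padded time steps x two bounds each

# Workaround from the report: drop time_bnds before concatenating.
ds4_dropped = xr.concat(
    [d.drop_vars("time_bnds") for d in (ds2, ds3)],
    dim=labels,
    data_vars="all",
    join="outer",
)

# Possible alternative (an assumption, not tested against the real data):
# an inner join keeps only the shared time steps, so nothing is padded.
ds4_inner = xr.concat([ds2, ds3], dim=labels, data_vars="all", join="inner")
```

Note that the inner join discards the time steps that only ds2 has, so it is only suitable when the overlap is all that is needed.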

This is a common use case, since CESM climate model output includes time_bnds variables.

Environment:

Output of <tt>xr.show_versions()</tt>

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.7 (default, Mar 23 2020, 22:36:06) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.3.0-1032-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1

xarray: 0.16.0
pandas: 1.0.5
numpy: 1.19.1
scipy: 1.5.0
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.8.0
h5py: 2.10.0
Nio: None
zarr: 2.3.2
cftime: 1.2.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.20.0
distributed: 2.20.0
matplotlib: 3.2.2
cartopy: 0.17.0
seaborn: None
numbagg: None
pint: None
setuptools: 49.2.0.post20200714
pip: 20.1.1
conda: None
pytest: None
IPython: 7.16.1
sphinx: None
/home/ubuntu/a
  • reactions: 0 total (https://api.github.com/repos/pydata/xarray/issues/4347/reactions)
  • repo: 13221727
  • type: issue
