pydata/xarray issue #6300 (id 1149364539)

Lazy saving to NetCDF4 fails randomly if an array is used multiple times

state: open · comments: 1 · created: 2022-02-24T14:35:48Z · updated: 2022-10-12T07:03:09Z · author association: NONE

What happened?

Saving an xr.Dataset lazily to NetCDF4 (dset.to_netcdf(..., compute=False)) fails seemingly at random if an array is used either as a coordinate of multiple variables, or is saved under different names as a standalone variable. The traceback I get is shown below in the log section.

What did you expect to happen?

The saving should work consistently between different runs.

Minimal Complete Verifiable Example

```Python
#!/usr/bin/env python

import datetime as dt

import numpy as np
import dask.array as da
import xarray as xr

COMPUTE = False
FNAME = "xr_test.nc"


def main():
    y = np.arange(1000, dtype=np.uint16)
    x = np.arange(2000, dtype=np.uint16)

    # Create a time array that is used as a Y-coordinate for the data
    now = dt.datetime.utcnow()
    time_arr = np.array([now + dt.timedelta(seconds=i) for i in range(y.size)],
                        dtype=np.datetime64)
    times = xr.DataArray(time_arr, coords={'y': y})

    # Write root
    root = xr.Dataset({}, attrs={'global': 'attribute'})
    written = [root.to_netcdf(FNAME, mode='w')]

    # Write first dataset
    data1 = xr.DataArray(da.random.random((y.size, x.size)), dims=['y', 'x'],
                         coords={'y': y, 'x': x, 'time': times})
    dset1 = xr.Dataset({'data1': data1})
    written.append(dset1.to_netcdf(FNAME, mode='a', compute=COMPUTE))

    # Write second dataset using the same time coordinates
    data2 = xr.DataArray(da.random.random((y.size, x.size)), dims=['y', 'x'],
                         coords={'y': y, 'x': x, 'time': times})
    dset2 = xr.Dataset({'data2': data2})
    written.append(dset2.to_netcdf(FNAME, mode='a', compute=COMPUTE))

    if not COMPUTE:
        da.compute(written)


if __name__ == "__main__":
    main()
```

Relevant log output

```Python
Traceback (most recent call last):
  File "/home/lahtinep/bin/test_lazy_netcdf_saving.py", line 43, in <module>
    main()
  File "/home/lahtinep/bin/test_lazy_netcdf_saving.py", line 39, in main
    da.compute(written)
  File "/home/lahtinep/mambaforge/envs/pytroll/lib/python3.9/site-packages/dask/base.py", line 571, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/lahtinep/mambaforge/envs/pytroll/lib/python3.9/site-packages/dask/threaded.py", line 79, in get
    results = get_async(
  File "/home/lahtinep/mambaforge/envs/pytroll/lib/python3.9/site-packages/dask/local.py", line 507, in get_async
    raise_exception(exc, tb)
  File "/home/lahtinep/mambaforge/envs/pytroll/lib/python3.9/site-packages/dask/local.py", line 315, in reraise
    raise exc
  File "/home/lahtinep/mambaforge/envs/pytroll/lib/python3.9/site-packages/dask/local.py", line 220, in execute_task
    result = _execute_task(task, data)
  File "/home/lahtinep/mambaforge/envs/pytroll/lib/python3.9/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/lahtinep/mambaforge/envs/pytroll/lib/python3.9/site-packages/dask/array/core.py", line 4099, in store_chunk
    return load_store_chunk(x, out, index, lock, return_stored, False)
  File "/home/lahtinep/mambaforge/envs/pytroll/lib/python3.9/site-packages/dask/array/core.py", line 4086, in load_store_chunk
    out[index] = x
  File "/home/lahtinep/mambaforge/envs/pytroll/lib/python3.9/site-packages/xarray/backends/netCDF4_.py", line 69, in __setitem__
    data[key] = value
  File "src/netCDF4/_netCDF4.pyx", line 4903, in netCDF4._netCDF4.Variable.__setitem__
  File "src/netCDF4/_netCDF4.pyx", line 4073, in netCDF4._netCDF4.Variable.shape.__get__
  File "src/netCDF4/_netCDF4.pyx", line 3462, in netCDF4._netCDF4.Dimension.__len__
  File "src/netCDF4/_netCDF4.pyx", line 1927, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: Not a valid ID
```

Anything else we need to know?

The above script fails randomly, so it should be run several times. Out of ten runs I got the traceback twice. If COMPUTE = True, the script works every time (in ~100 tries, at least).

The same behaviour is seen if the time coordinates are removed completely and data1 is also used in dset2 in place of data2.
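One possible mitigation (an assumption on my part, not verified against this exact race) is to compute each delayed write on its own instead of passing all of them to a single da.compute() call, so the writes to the shared file never interleave. A self-contained sketch, with arbitrary file name and shapes:

```Python
import dask
import dask.array as da
import xarray as xr

FNAME = "xr_workaround_test.nc"

# Root file carrying only global attributes
xr.Dataset({}, attrs={"global": "attribute"}).to_netcdf(FNAME, mode="w")

# Two variables to append, both backed by dask arrays
d1 = xr.Dataset({"data1": (("y", "x"), da.random.random((10, 20)))})
d2 = xr.Dataset({"data2": (("y", "x"), da.random.random((10, 20)))})

# Compute each delayed write separately: the file is opened, written,
# and closed once per dataset rather than shared between two writes
for dset in (d1, d2):
    delayed = dset.to_netcdf(FNAME, mode="a", compute=False)
    dask.compute(delayed)
```

This serializes the writes and so gives up any parallelism between them, which may or may not be acceptable depending on the use case.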

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:41:03) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.13.0-30-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.20.2
pandas: 1.3.5
numpy: 1.22.0
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: 0.13.0
h5py: 3.6.0
Nio: None
zarr: 2.10.3
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: None
dask: 2022.01.0
distributed: 2022.01.0
matplotlib: 3.5.1
cartopy: 0.20.2
seaborn: 0.11.2
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
setuptools: 59.8.0
pip: 21.3.1
conda: None
pytest: 6.2.5
IPython: 8.0.0
sphinx: 4.3.2

