issues: 1525802030

id: 1525802030
node_id: I_kwDOAMm_X85a8eQu
number: 7430
title: Missing Blocks when loading zarr file
user: 12512930
state: open
locked: 0
comments: 3
created_at: 2023-01-09T15:19:50Z
updated_at: 2023-01-13T02:24:24Z
author_association: NONE

What happened?

Under load, blocks of Zarr objects go missing. This happens both on our MinIO server (see example) and on the HPC file system. It only occurs when the storage is under load and becomes slow, so I guess there must be a timeout somewhere.
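One way to probe the timeout guess is to put a retrying layer between the reader and the storage, so that a slow read is either retried or raised loudly instead of possibly being treated as a missing block. The sketch below is hypothetical: `RetryingStore` and `FlakyStore` are illustrative names, not part of xarray, zarr, or fsspec, and the report does not establish that any of those libraries actually swallow timeouts. It only shows the idea on a plain dict-like store.

```python
import time
from collections.abc import Mapping


class RetryingStore(Mapping):
    """Read-only wrapper that retries transient read errors (hypothetical helper)."""

    def __init__(self, inner, retries=3, delay=0.1, transient=(OSError,)):
        self.inner = inner
        self.retries = retries
        self.delay = delay
        # TimeoutError is a subclass of OSError, so it is covered by default.
        self.transient = transient

    def __getitem__(self, key):
        for attempt in range(self.retries + 1):
            try:
                return self.inner[key]
            except KeyError:
                raise  # key is genuinely absent: do not retry
            except self.transient:
                if attempt == self.retries:
                    raise  # surface the failure instead of hiding it
                time.sleep(self.delay * 2 ** attempt)  # exponential backoff

    def __iter__(self):
        return iter(self.inner)

    def __len__(self):
        return len(self.inner)


class FlakyStore(dict):
    """Toy store whose first `failures` reads time out (simulates slow storage)."""

    def __init__(self, *args, failures=2, **kwargs):
        super().__init__(*args, **kwargs)
        self.failures = failures

    def __getitem__(self, key):
        if self.failures > 0:
            self.failures -= 1
            raise TimeoutError("simulated slow storage")
        return super().__getitem__(key)


store = RetryingStore(FlakyStore({"0.0": b"chunk"}, failures=2), delay=0.01)
print(store["0.0"])
```

The point of the design is to keep "key not found" and "read failed" apart: only the first should ever be interpreted as an empty block. How such a wrapper would be wired into the actual read path is left open here.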

What did you expect to happen?

A complete map.

Incomplete map:

Complete map, when the filesystem is not under load:

Minimal Complete Verifiable Example

```Python
# Calculate global temperature trends

from dask.distributed import Client
import xarray as xr
from scipy import stats
from datetime import datetime
import matplotlib.pyplot as plt


def slope(y):
    # Slope of a linear regression of y against its integer index.
    x = list(range(0, len(y)))
    fit = stats.linregress(x, y)
    return fit.slope


def main():
    print(datetime.now(), "startup", flush=True)
    print(datetime.now(), "starting dask workers", flush=True)
    Client(n_workers=1, threads_per_worker=32, memory_limit='64GB')

    print(datetime.now(), "opening esdc", flush=True)
    c = xr.open_zarr("http://data.rsc4earth.de:9000/earthsystemdatacube/v3.0.1/esdc-8d-0.25deg-256x128x128-3.0.1.zarr/")

    print(datetime.now(), "getting air temperature data", flush=True)
    ct = c.air_temperature_2m

    print(datetime.now(), "setup calculations", flush=True)
    # Apply slope() along the time dimension of every grid cell.
    cs = xr.apply_ufunc(
        slope,
        ct,
        input_core_dims=[['time']],
        vectorize=True,
        dask='parallelized',
        dask_gufunc_kwargs=dict(allow_rechunk=True))

    print(datetime.now(), "saving data", flush=True)
    csset = xr.Dataset(dict(tslope=cs))
    csset.to_zarr(store="temp_slopes.zarr", mode="w")

    print(datetime.now(), "plotting", flush=True)
    cssetcalc = xr.open_zarr("temp_slopes.zarr")
    cssetcalc.tslope.plot()
    plt.savefig("temperature_trends.png")
    print(datetime.now(), "done", flush=True)


if __name__ == '__main__':
    main()
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

This only seems to happen under load, so you will need to stress the server a bit to reproduce it.
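Since reproduction depends on load, it may help to check a store for absent chunks directly rather than judging from the plot. The following is a minimal sketch, assuming a Zarr v2 layout in which chunk keys are the chunk grid indices joined by a separator (`.` by default) under the array's path. The helper names and the toy store are illustrative, not part of zarr or xarray.

```python
from itertools import product
from math import ceil


def expected_chunk_keys(shape, chunks, prefix="", sep="."):
    """Yield the chunk keys a fully written Zarr v2 array would have."""
    grid = [range(ceil(s / c)) for s, c in zip(shape, chunks)]
    for index in product(*grid):
        yield prefix + sep.join(str(i) for i in index)


def missing_chunks(store, shape, chunks, prefix="", sep="."):
    """Return the expected chunk keys that are absent from a dict-like store."""
    return [
        key
        for key in expected_chunk_keys(shape, chunks, prefix, sep)
        if key not in store
    ]


# Toy store standing in for an array named "tslope": a 4x4 array in 2x2
# chunks, with chunk (1, 0) deliberately left out.
toy_store = {"tslope/0.0": b"", "tslope/0.1": b"", "tslope/1.1": b""}
print(missing_chunks(toy_store, shape=(4, 4), chunks=(2, 2), prefix="tslope/"))
```

In Zarr v2 an absent chunk key is read back as the fill value, so a non-empty result is not necessarily an error, but it does show where blocks are physically missing. The check cannot detect chunks that exist but were computed from incompletely read input.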

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.15 (default, Nov 24 2022, 15:19:38) [GCC 11.2.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-372.26.1.el8_6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2022.11.0
pandas: 1.5.2
numpy: 1.23.5
scipy: 1.9.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.13.3
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.02.1
distributed: 2022.2.1
matplotlib: 3.6.2
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.11.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.5.0
pip: 22.3.1
conda: None
pytest: None
IPython: None
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7430/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue
