home / github / issues

Menu
  • GraphQL API
  • Search all tables

issues: 1870484988

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1870484988 I_kwDOAMm_X85vfVX8 8120 `open_mfdataset` exits while sending a "Segmentation fault" error 50383939 closed 0     4 2023-08-28T20:51:23Z 2023-09-01T15:43:08Z 2023-09-01T15:43:08Z NONE      

What is your issue?

I try to open about ~10 files, each 5MB as a test case, using xarray's open_mfdataset method with the parallel=True option, however, it throws a "Segmentation fault" error as the following:

```python $ ipython Python 3.10.2 (main, Feb 4 2022, 19:10:35) [GCC 9.3.0] Type 'copyright', 'credits' or 'license' for more information IPython 8.10.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import xarray as xr

In [2]: ds = xr.open_mfdataset('./ab_models_198001*.nc', chunks={'time':10})

In [3]: ds Out[3]: <xarray.Dataset> Dimensions: (time: 744, rlat: 140, rlon: 105) Coordinates: * time (time) datetime64[ns] 1980-01-01T13:00:00 ... 1980-0... lon (rlat, rlon) float32 dask.array<chunksize=(140, 105), meta=np.ndarray> lat (rlat, rlon) float32 dask.array<chunksize=(140, 105), meta=np.ndarray> * rlon (rlon) float64 342.1 342.2 342.2 ... 351.2 351.3 351.4 * rlat (rlat) float64 -7.83 -7.74 -7.65 ... 4.5 4.59 4.68 Data variables: rotated_pole (time) int32 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 RDRS_v2.1_P_UVC_10m (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray> RDRS_v2.1_P_FI_SFC (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray> RDRS_v2.1_P_FB_SFC (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray> RDRS_v2.1_A_PR0_SFC (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray> RDRS_v2.1_P_P0_SFC (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray> RDRS_v2.1_P_TT_1.5m (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray> RDRS_v2.1_P_HU_1.5m (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray> Attributes: CDI: Climate Data Interface version 2.0.4 (https://mpimet.mpg.de... Conventions: CF-1.6 product: RDRS_v2.1 Remarks: Variable names are following the convention <Product>_<Type... License: These data are provided by the Canadian Surface Prediction ... history: Mon Aug 28 13:44:02 2023: cdo -z zip -s -L -sellonlatbox,-1... NCO: netCDF Operators version 5.0.6 (Homepage = http://nco.sf.ne... CDO: Climate Data Operators version 2.0.4 (https://mpimet.mpg.de...

In [4]: type(ds) Out[4]: xarray.core.dataset.Dataset

In [5]: ds = xr.open_mfdataset('./ab_models_198001*.nc', chunks={'time':10}, parallel=True) [gra-login3:25527:0:6913] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x8) [gra-login3:25527] *** Process received signal *** [gra-login3:25527] Signal: Segmentation fault (11) [gra-login3:25527] Signal code: (128) [gra-login3:25527] Failing at address: (nil) Segmentation fault

```

Here is the version of xarray:

```python In [5]: xr.show_versions() /home/user/virtual-envs/scienv/lib/python3.10/site-packages/_distutils_hack/init.py:36: UserWarning: Setuptools is replacing distutils. warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit: None python: 3.10.2 (main, Feb 4 2022, 19:10:35) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.88.1.el7.x86_64 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_CA.UTF-8 LOCALE: ('en_CA', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.9.0

xarray: 2023.7.0 pandas: 1.4.0 numpy: 1.21.2 scipy: 1.8.0 netCDF4: 1.6.4 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: None dask: 2023.8.0 distributed: 2023.8.0 matplotlib: 3.5.1 cartopy: None seaborn: None numbagg: None fsspec: 2023.6.0 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 60.2.0 pip: 23.2.1 conda: None pytest: 7.4.0 mypy: None IPython: 8.10.0 sphinx: None ```

I'm working on an HPC, so if a list "modules" I have loaded helps, here it is: ```console $ module list

Currently Loaded Modules: 1) CCconfig 5) gcccore/.9.3.0 (H) 9) libfabric/1.10.1 13) ipykernel/2023a 17) sqlite/3.38.5 21) postgresql/12.4 (t) 25) gdal/3.5.1 (geo) 29) udunits/2.2.28 (t) 33) cdo/2.2.1 (geo) 2) gentoo/2020 (S) 6) imkl/2020.1.217 (math) 10) openmpi/4.0.3 (m) 14) scipy-stack/2023a (math) 18) jasper/2.0.16 (vis) 22) freexl/1.0.5 (t) 26) geos/3.10.2 (geo) 30) libaec/1.0.6 34) mpi4py/3.1.3 (t) 3) StdEnv/2020 (S) 7) gcc/9.3.0 (t) 11) libffi/3.3 15) hdf5/1.10.6 (io) 19) libgeotiff-proj901/1.7.1 23) librttopo-proj9/1.1.0 27) proj/9.0.1 (geo) 31) eccodes/2.25.0 (geo) 35) netcdf-fortran/4.5.2 (io) 4) mii/1.1.2 8) ucx/1.8.0 12) python/3.10.2 (t) 16) netcdf/4.7.4 (io) 20) cfitsio/4.1.0 (vis) 24) libspatialite-proj901/5.0.1 28) expat/2.4.1 (t) 32) yaxt/0.9.0 (t) 36) libspatialindex/1.8.5 (phys)

Where: S: Module is Sticky, requires --force to unload or purge m: MPI implementations / Implémentations MPI math: Mathematical libraries / Bibliothèques mathématiques io: Input/output software / Logiciel d'écriture/lecture t: Tools for development / Outils de développement vis: Visualisation software / Logiciels de visualisation geo: Geography libraries/apps / Logiciels de géographie phys: Physics libraries/apps / Logiciels de physique H: Hidden Module ```

Thanks.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8120/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed 13221727 issue

Links from other tables

  • 1 row from issues_id in issues_labels
  • 0 rows from issue in issue_comments
Powered by Datasette · Queries took 84.493ms · About: xarray-datasette