issues: 970619131
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
970619131 | MDU6SXNzdWU5NzA2MTkxMzE= | 5706 | Loading datasets of numpy string arrays leads to error and/or segfault | 8291800 | closed | 0 | 8 | 2021-08-13T18:17:33Z | 2023-05-12T08:06:07Z | 2023-05-12T08:06:06Z | CONTRIBUTOR | What happened: NumPy arrays of strings saved with h5py cause errors and segfaults, and the result is not always the same.

What you expected to happen: This works fine with but will consistently have a segfault with

I'm assuming this is a netcdf backend issue, but thought I'd raise it here since xarray was how I discovered it.

Minimal Complete Verifiable Example:
```python
import h5py
import numpy as np
import xarray as xr

# Write a 2x2 array of fixed-width byte strings to an HDF5 file.
with h5py.File("test_str_list.h5", "w") as hf:
    hf["pairs"] = np.array([["20200101", "20200201"], ["20200101", "20200301"]]).astype("S")

# Loading the file crashes the interpreter with the output below.
ds = xr.load_dataset("test_str_list.h5")
*** Error in `/home/scott/miniconda3/envs/mapping/bin/python': munmap_chunk(): invalid pointer: 0x0000559c40956070 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7f7c4)[0x7f4a9a6bb7c4]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5MM_xfree+0xf)[0x7f4a7a93c3ef]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5C__untag_entry+0xc6)[0x7f4a7a854836]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5C__flush_single_entry+0x275)[0x7f4a7a846085]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(+0x80de3)[0x7f4a7a846de3]
... (few thousand line backtrace)
```
Anything else we need to know?: Even stranger, it doesn't seem to be deterministic.
After the crash, I tried the same load_dataset:
```python
In [2]: ds = xr.load_dataset("test_str_list.h5")
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-2-475169bc9c75> in <module>
----> 1 ds = xr.load_dataset("test_str_list.h5")

~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/backends/api.py in load_dataset(filename_or_obj, kwargs)
    242
    243     with open_dataset(filename_or_obj, kwargs) as ds:
--> 244         return ds.load()
    245
    246

~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/dataset.py in load(self, **kwargs)
    871         for k, v in self.variables.items():
    872             if k not in lazy_data:
--> 873                 v.load()
    874
    875         return self

~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/variable.py in load(self, kwargs)
    449             self._data = as_compatible_data(self._data.compute(kwargs))
    450         elif not is_duck_array(self._data):
--> 451             self._data = np.asarray(self._data)
    452         return self
    453

~/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84
     85

~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/indexing.py in array(self, dtype)
    546
    547     def array(self, dtype=None):
--> 548         self._ensure_cached()
    549         return np.asarray(self.array, dtype=dtype)
    550

~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/indexing.py in _ensure_cached(self)
    543     def _ensure_cached(self):
    544         if not isinstance(self.array, NumpyIndexingAdapter):
--> 545             self.array = NumpyIndexingAdapter(np.asarray(self.array))
    546
    547     def array(self, dtype=None):

~/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84
     85

~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/indexing.py in array(self, dtype)
    516
    517     def array(self, dtype=None):
--> 518         return np.asarray(self.array, dtype=dtype)
    519
    520     def getitem(self, key):

~/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84
     85

~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/indexing.py in array(self, dtype)
    417     def array(self, dtype=None):
    418         array = as_indexable(self.array)
--> 419         return np.asarray(array[self.key], dtype=None)
    420
    421     def transpose(self, order):

~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/backends/netCDF4_.py in getitem(self, key)
     89
     90     def getitem(self, key):
---> 91         return indexing.explicit_indexing_adapter(
     92             key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
     93         )

~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/indexing.py in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
    708     """
    709     raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
--> 710     result = raw_indexing_method(raw_key.tuple)
    711     if numpy_indices.tuple:
    712         # index the loaded np.ndarray

~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/backends/netCDF4_.py in _getitem(self, key)
    102             with self.datastore.lock:
    103                 original_array = self.get_array(needs_lock=False)
--> 104                 array = getitem(original_array, key)
    105         except IndexError:
    106             # Catch IndexError in netCDF4 and return a more informative

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.getitem()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._get()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd8 in position 0: invalid continuation byte
```
But then immediately after, another segfault
Beginning of segfault stack trace, but goes on
```
======= Backtrace: =========
/lib64/libc.so.6(+0x7f7c4)[0x7f1ba11a87c4]
/lib64/libc.so.6(+0x818bb)[0x7f1ba11aa8bb]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5MM_xfree+0xf)[0x7f1b8142d3ef]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5S_close+0x84)[0x7f1b814a69a4]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5I_dec_ref+0x77)[0x7f1b8141a407]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5I_dec_app_ref+0x29)[0x7f1b8141a4d9]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5Sclose+0x73)[0x7f1b814a7023]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/rasterio/../../.././libnetcdf.so.18(NC4_get_vars+0x5ad)[0x7f1b7bbc46ad]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/rasterio/../../.././libnetcdf.so.18(NC4_get_vara+0x12)[0x7f1b7bbc4e62]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/rasterio/../../.././libnetcdf.so.18(NC_get_vara+0x6f)[0x7f1b7bb6b5df]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/rasterio/../../.././libnetcdf.so.18(nc_get_vara+0x8b)[0x7f1b7bb6c35b]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/netCDF4/_netCDF4.cpython-38-x86_64-linux-gnu.so(+0xccf21)[0x7f1b4d0daf21]
/home/scott/miniconda3/envs/mapping/bin/python(+0x13a77e)[0x55f97aeca77e]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/netCDF4/_netCDF4.cpython-38-x86_64-linux-gnu.so(+0x224fd)[0x7f1b4d0304fd]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/netCDF4/_netCDF4.cpython-38-x86_64-linux-gnu.so(+0x559d9)[0x7f1b4d0639d9]
/home/scott/miniconda3/envs/mapping/bin/python(PyObject_GetItem+0x48)[0x55f97af10aa8]
/home/scott/miniconda3/envs/mapping/bin/python(+0x139acd)[0x55f97aec9acd]
```
Environment:
Output of <tt>xr.show_versions()</tt>
In [1]: xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 | packaged by conda-forge | (default, Aug 29 2020, 01:22:49) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1062.4.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.19.0
pandas: 1.1.0
numpy: 1.19.2
scipy: 1.5.3
netCDF4: 1.5.4
pydap: None
h5netcdf: 0.11.0
h5py: 3.2.1
Nio: None
zarr: 2.8.3
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.5
cfgrib: 0.9.8.5
iris: None
bottleneck: 1.3.2
dask: 2021.01.0
distributed: 2.20.0
matplotlib: 3.3.1
cartopy: 0.17.0
seaborn: None
numbagg: None
pint: 0.17
setuptools: 50.3.2
pip: 21.1.3
conda: 4.8.4
pytest: None
IPython: 7.18.1
sphinx: 4.0.2 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5706/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |
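
A note on the `UnicodeDecodeError` reported in the issue body: the file was written with ASCII date strings only, so a `0xd8` byte cannot be part of the stored data. That suggests the reader decoded bytes from somewhere other than the stored strings, which would be consistent with the invalid-pointer crashes. The standard-library sketch below only illustrates that reasoning. The `garbage` value is a hypothetical stand-in, not bytes recovered from the crash, and the sketch assumes `astype("S")` yields the plain ASCII encoding of each string.

```python
# The four date strings written to "pairs" in the reproduction.
stored = ["20200101", "20200201", "20200101", "20200301"]

# Assumption: astype("S") stores fixed-width byte strings, which for
# ASCII text are the same bytes that str.encode("ascii") produces.
raw = [s.encode("ascii") for s in stored]

# Every stored byte is below 0x80, so UTF-8 decoding cannot fail on them.
all_ascii = all(b < 0x80 for chunk in raw for b in chunk)
decoded = [chunk.decode("utf-8") for chunk in raw]

# Hypothetical stand-in for memory that does not hold the stored strings:
# 0xd8 opens a two-byte UTF-8 sequence, but "2" is not a continuation byte.
garbage = b"\xd8" + b"2020010"

error_reason = None
error_position = None
try:
    garbage.decode("utf-8")
except UnicodeDecodeError as exc:
    error_reason = exc.reason
    error_position = exc.start

print(all_ascii, decoded == stored, error_reason, error_position)
```

The stored bytes decode cleanly, while the stand-in fails with the same reason and position as the traceback in the issue. This does not identify the root cause; it only shows that the error message is incompatible with the data that was written.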