id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
2037869483,I_kwDOAMm_X855d2ur,8544,Reading netcdf file with string coordinates makes IPython kernel crash (netcdf4 engine),36678697,closed,0,,,14,2023-12-12T14:26:42Z,2024-01-04T21:49:34Z,2023-12-22T09:29:45Z,NONE,,,,"### What happened?

Opening a netCDF file that has strings as coordinates crashes the notebook kernel. This only happens with `engine=netcdf4`, not with `engine=h5netcdf`. The bug occurs in IPython, in Jupyter in the web browser, and in VSCode notebooks at least.

The bug can be reproduced consistently by reading the same file twice in the same cell, or by running the cell twice.

### What did you expect to happen?

`engine=netcdf4` should behave the same as `engine=h5netcdf`, i.e. it should not crash the kernel.

### Minimal Complete Verifiable Example

```Python
# %%
import numpy as np
import xarray as xr

# %%
# Write a small file whose ""label"" coordinate holds strings.
fpath = ""test.nc""
da = xr.DataArray(
    data=np.random.randn(3, 10),
    dims=[""label"", ""values""],
    coords=dict(
        label=[""a"", ""b"", ""c""],
    ),
)
da.to_netcdf(fpath)

# %%
# Opening the file twice with the netcdf4 engine triggers the crash.
# engine = ""h5netcdf""
engine = ""netcdf4""
xr.open_dataarray(fpath, engine=engine)
xr.open_dataarray(fpath, engine=engine)
```

### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

### Relevant log output

IPython crashes with: `Segmentation fault (core dumped)`

Jupyter Notebook logs:
```
[I 2023-12-12 15:20:00.474 ServerApp] Kernel restarted: 054a63c4-4f46-4dc2-b58f-4dcd4ce9951c
[I 2023-12-12 15:20:00.482 ServerApp] Starting buffering for 054a63c4-4f46-4dc2-b58f-4dcd4ce9951c:0bd5dcd6-faa7-413a-b6c5-080b1c774933
[I 2023-12-12 15:20:00.494 ServerApp] Connecting to kernel 054a63c4-4f46-4dc2-b58f-4dcd4ce9951c.
[I 2023-12-12 15:20:00.494 ServerApp] Restoring connection for 054a63c4-4f46-4dc2-b58f-4dcd4ce9951c:0bd5dcd6-faa7-413a-b6c5-080b1c774933
0.00s - Debugger warning: It seems that frozen modules are being used, which may
0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
0.00s - to python to disable frozen modules.
0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.
[IPKernelApp] WARNING | Unknown error in handling startup files:
[I 2023-12-12 15:20:09.463 ServerApp] AsyncIOLoopKernelRestarter: restarting kernel (1/5), keep random ports
[W 2023-12-12 15:20:09.463 ServerApp] kernel 054a63c4-4f46-4dc2-b58f-4dcd4ce9951c restarted
[I 2023-12-12 15:20:09.470 ServerApp] Starting buffering for 054a63c4-4f46-4dc2-b58f-4dcd4ce9951c:0bd5dcd6-faa7-413a-b6c5-080b1c774933
[I 2023-12-12 15:20:09.504 ServerApp] Connecting to kernel 054a63c4-4f46-4dc2-b58f-4dcd4ce9951c.
[I 2023-12-12 15:20:09.505 ServerApp] Restoring connection for 054a63c4-4f46-4dc2-b58f-4dcd4ce9951c:0bd5dcd6-faa7-413a-b6c5-080b1c774933
0.00s - Debugger warning: It seems that frozen modules are being used, which may
0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
0.00s - to python to disable frozen modules.
0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.
[IPKernelApp] WARNING | Unknown error in handling startup files:
```

VSCode notebook Jupyter logs:
```
15:23:03.501 [info] Restart requested ~/Desktop/bug_xarray_notebook/bug.ipynb
15:23:03.502 [info] Dispose Kernel process 2763594.
15:23:03.589 [info] Process Execution: ~/miniconda3/bin/python -c ""import ipykernel; print(ipykernel.__version__); print(""5dc3a68c-e34e-4080-9c3e-2a532b2ccb4d""); print(ipykernel.__file__)""
15:23:03.671 [info] Process Execution: ~/miniconda3/bin/python -m ipykernel_launcher --f=~/.local/share/jupyter/runtime/kernel-v2-2727807dzOm3m1LEA5V.json
    > cwd: ~/Desktop/bug_xarray_notebook
15:23:04.149 [warn] StdErr from Kernel Process [IPKernelApp] WARNING | Unknown error in handling startup files:
15:23:04.454 [info] Restarted bd04fd87-98e7-486d-a6c6-7308101edcdf
15:23:08.046 [info] Handle Execution of Cells 0 for ~/Desktop/bug_xarray_notebook/bug.ipynb
15:23:08.055 [info] Kernel acknowledged execution of cell 0 @ 1702390988054
15:23:08.412 [info] End cell 0 execution after 0.358s, completed @ 1702390988412, started @ 1702390988054
15:23:09.260 [info] Handle Execution of Cells 1 for ~/Desktop/bug_xarray_notebook/bug.ipynb
15:23:09.269 [info] Kernel acknowledged execution of cell 1 @ 1702390989268
15:23:09.305 [info] End cell 1 execution after 0.036s, completed @ 1702390989304, started @ 1702390989268
15:23:10.893 [info] Handle Execution of Cells 2 for ~/Desktop/bug_xarray_notebook/bug.ipynb
15:23:10.907 [info] Kernel acknowledged execution of cell 2 @ 1702390990907
15:23:10.971 [info] End cell 2 execution after 0.064s, completed @ 1702390990971, started @ 1702390990907
15:23:12.255 [info] Handle Execution of Cells 2 for ~/Desktop/bug_xarray_notebook/bug.ipynb
15:23:12.262 [info] Kernel acknowledged execution of cell 2 @ 1702390992262
15:23:12.504 [error] Disposing session as kernel process died ExitCode: undefined, Reason: [IPKernelApp] WARNING | Unknown error in handling startup files:
15:23:12.505 [info] Dispose Kernel process 2764104.
15:23:12.518 [info] End cell 2 execution after -1702390992.262s, completed @ undefined, started @ 1702390992262
```

### Anything else we need to know?

_No response_

### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.12.0 | packaged by conda-forge | (main, Oct 3 2023, 08:43:22) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-91-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2023.12.0
pandas: 2.1.4
numpy: 1.26.2
scipy: None
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.10.0
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: 8.18.1
sphinx: None
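
A possible way to narrow this down without losing the notebook kernel is to run the reproducer in a child interpreter with `faulthandler` enabled. The sketch below uses only the standard library; `run_isolated` and `REPRODUCER` are hypothetical names introduced here, and the snippet assumes `test.nc` was already written as in the example in this report. On POSIX systems a negative return code means the child was killed by a signal (for example `-11` for SIGSEGV), and `faulthandler` writes the Python traceback of the crashing thread to stderr.

```python
import subprocess
import sys

# Hypothetical reproducer: opens the same file twice with the netcdf4 engine.
# Assumes test.nc already exists (written as in the example in this report).
REPRODUCER = '''
import xarray as xr
xr.open_dataarray('test.nc', engine='netcdf4')
xr.open_dataarray('test.nc', engine='netcdf4')
'''


def run_isolated(snippet, timeout=120):
    # Run the snippet in a separate Python process so that a crash there
    # cannot take down the calling kernel. -X faulthandler makes the child
    # dump a traceback to stderr on fatal signals such as SIGSEGV.
    proc = subprocess.run(
        [sys.executable, '-X', 'faulthandler', '-c', snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == '__main__':
    code, out, err = run_isolated(REPRODUCER)
    print('return code:', code)
    print(err)
```

Running the child with `engine='h5netcdf'` instead gives a baseline to compare the return codes against.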
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8544/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
627735640,MDU6SXNzdWU2Mjc3MzU2NDA=,4113,xarray.DataArray.stack loads data into memory,36678697,closed,0,,,6,2020-05-30T13:45:38Z,2022-04-19T16:10:26Z,2022-04-19T16:10:25Z,NONE,,,,"Stacking loads the data into memory, which is unexpected, or at least undocumented, as far as I know.

#### MCVE Code Sample

```python
import os

import psutil
import numpy as np
import xarray as xr


def main():
    # Write a ~800 MB array to disk, then reopen it lazily.
    xr.DataArray(
        np.random.randn(1024, 1024, 100),
        dims=(""x"", ""y"", ""z""),
    ).to_netcdf(""da.nc"")

    da = xr.open_dataarray(""da.nc"")
    print(f"" da: {mb(da.nbytes)} MB"")
    print_ram_state()

    # Compare the resident memory before and after stacking.
    mda = da.stack(px=(""x"", ""y""))
    print_ram_state()


def print_ram_state():
    # https://stackoverflow.com/a/21632554
    process = psutil.Process(os.getpid())
    ram_state = process.memory_info().rss
    print(f""RAM: {mb(ram_state):.2f} MB"")


def mb(nbytes):
    # Convert a byte count to mebibytes.
    return nbytes / (1024 * 1024)


if __name__ == ""__main__"":
    main()
```

#### Problem Description

Using the [`xarray.DataArray.stack`](http://xarray.pydata.org/en/stable/generated/xarray.DataArray.stack.html) method loads the data into memory, which is unexpected behavior, or at least undocumented as far as I know.

#### Versions
Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Mar 23 2020, 23:03:20) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.3.0-53-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: None

xarray: 0.15.1
pandas: 1.0.3
numpy: 1.17.5
scipy: 1.4.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.16.0
distributed: 2.16.0
matplotlib: 3.2.1
cartopy: None
seaborn: 0.10.1
numbagg: None
setuptools: 46.4.0.post20200518
pip: 20.1.1
conda: 4.8.3
pytest: 5.4.2
IPython: 7.14.0
sphinx: 3.0.4
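
Since RSS also includes memory held by the allocator and by C libraries, a complementary check is to measure what Python itself allocates during the call. The sketch below uses the standard-library `tracemalloc` module; `peak_allocation` is a hypothetical helper, and it assumes the array library reports its buffers to `tracemalloc` (NumPy does, as far as I know), so allocations made only inside C libraries would not show up.

```python
import tracemalloc


def peak_allocation(func, *args, **kwargs):
    # Call func(*args, **kwargs) while tracemalloc is tracing and return
    # (result, peak), where peak is the largest number of bytes that were
    # traced as allocated at any one time during the call.
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == '__main__':
    # Stand-in workload: allocate a 10 MiB buffer.
    _, peak = peak_allocation(bytearray, 10 * 1024 * 1024)
    print(f'peak: {peak / (1024 * 1024):.2f} MB')
```

With the code sample from this report, `mda, peak = peak_allocation(da.stack, px=('x', 'y'))` would give a rough figure for how much is allocated while stacking.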
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4113/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue