html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2954#issuecomment-509134252,https://api.github.com/repos/pydata/xarray/issues/2954,509134252,MDEyOklzc3VlQ29tbWVudDUwOTEzNDI1Mg==,500246,2019-07-08T08:39:01Z,2019-07-08T08:39:01Z,CONTRIBUTOR,And I can confirm that the problem I reported originally on May 10 is also gone with #3082.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,442617907
https://github.com/pydata/xarray/issues/2954#issuecomment-509132581,https://api.github.com/repos/pydata/xarray/issues/2954,509132581,MDEyOklzc3VlQ29tbWVudDUwOTEzMjU4MQ==,500246,2019-07-08T08:34:11Z,2019-07-08T08:34:38Z,CONTRIBUTOR,"@shoyer I checked out your branch and the latter test example runs successfully - no segmentation fault and no files left open. I will test the former test example now.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,442617907
https://github.com/pydata/xarray/issues/2954#issuecomment-508900470,https://api.github.com/repos/pydata/xarray/issues/2954,508900470,MDEyOklzc3VlQ29tbWVudDUwODkwMDQ3MA==,500246,2019-07-06T06:09:04Z,2019-07-06T06:09:04Z,CONTRIBUTOR,"There are some files triggering the problem at ftp://ftp.eumetsat.int/pub/OPS/out/test-data/Test-data-for-External-Users/MTG_FCI_Test-Data/FCI_L1C_24hr_Test_Data_for_Users/1.0/UNCOMPRESSED/ I will test the PR later (on Monday at the latest).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,442617907
https://github.com/pydata/xarray/issues/2954#issuecomment-508873420,https://api.github.com/repos/pydata/xarray/issues/2954,508873420,MDEyOklzc3VlQ29tbWVudDUwODg3MzQyMA==,1217238,2019-07-05T22:29:01Z,2019-07-05T22:29:01Z,MEMBER,"OK, I have a tentative
fix up in https://github.com/pydata/xarray/pull/3082. @gerritholl I have not been able to directly reproduce this issue, so it would be great if you could test my pull request before we merge it to verify whether or not the fix works.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,442617907
https://github.com/pydata/xarray/issues/2954#issuecomment-508857913,https://api.github.com/repos/pydata/xarray/issues/2954,508857913,MDEyOklzc3VlQ29tbWVudDUwODg1NzkxMw==,1217238,2019-07-05T20:39:56Z,2019-07-05T20:39:56Z,MEMBER,"Thinking about this a little more, I suspect the issue might be related to how xarray opens a file multiple times to read different groups. It is very likely that libraries like netCDF-C don't handle this properly. Instead, we should probably open files once, and reuse them for reading from different groups.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,442617907
https://github.com/pydata/xarray/issues/2954#issuecomment-508853908,https://api.github.com/repos/pydata/xarray/issues/2954,508853908,MDEyOklzc3VlQ29tbWVudDUwODg1MzkwOA==,1217238,2019-07-05T20:17:22Z,2019-07-05T20:17:22Z,MEMBER,"> But there's something with the specific netcdf file going on, for when I create artificial groups, it does not segfault.
Can you share a netCDF file that causes this issue?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,442617907
https://github.com/pydata/xarray/issues/2954#issuecomment-508772044,https://api.github.com/repos/pydata/xarray/issues/2954,508772044,MDEyOklzc3VlQ29tbWVudDUwODc3MjA0NA==,500246,2019-07-05T14:13:20Z,2019-07-05T14:14:20Z,CONTRIBUTOR,"This triggers a segmentation fault (in the `.persist()` call) on my system, which may be related:
```python
import xarray
import os
import subprocess

xarray.set_options(file_cache_maxsize=1)
f = ""/path/to/netcdf/file.nc""
ds1 = xarray.open_dataset(f, ""/group1"", chunks=1024)
ds2 = xarray.open_dataset(f, ""/group2"", chunks=1024)
ds_cat = xarray.concat([ds1, ds2])
ds_cat.persist()
subprocess.run(fr""lsof | grep {os.getpid():d} | grep '\.nc$'"", shell=True)
```
But there's something with the specific netcdf file going on, for when I create artificial groups, it does not segfault.
```
Fatal Python error: Segmentation fault

Thread 0x00007f542bfff700 (most recent call first):
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py"", line 470 in _handle_results
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 865 in run
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 917 in _bootstrap_inner
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 885 in _bootstrap

Thread 0x00007f5448ff9700 (most recent call first):
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py"", line 422 in _handle_tasks
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 865 in run
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 917 in _bootstrap_inner
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 885 in _bootstrap

Thread 0x00007f54497fa700 (most recent call first):
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py"", line 413 in _handle_workers
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 865 in run
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 917 in _bootstrap_inner
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 885 in _bootstrap

Thread 0x00007f5449ffb700 (most recent call first):
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py"", line 110 in worker
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 865 in run
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 917 in _bootstrap_inner
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 885 in _bootstrap

Thread 0x00007f544a7fc700 (most recent call first):
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py"", line 110 in worker
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 865 in run
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 917 in _bootstrap_inner
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 885 in _bootstrap

Thread 0x00007f544affd700 (most recent call first):
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py"", line 110 in worker
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 865 in run
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 917 in _bootstrap_inner
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 885 in _bootstrap

Thread 0x00007f544b7fe700 (most recent call first):
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py"", line 110 in worker
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 865 in run
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 917 in _bootstrap_inner
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 885 in _bootstrap

Thread 0x00007f544bfff700 (most recent call first):
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py"", line 110 in worker
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 865 in run
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 917 in _bootstrap_inner
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 885 in _bootstrap

Thread 0x00007f5458a75700 (most recent call first):
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py"", line 110 in worker
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 865 in run
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 917 in _bootstrap_inner
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 885 in _bootstrap

Thread 0x00007f5459276700 (most recent call first):
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py"", line 110 in worker
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 865 in run
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 917 in _bootstrap_inner
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 885 in _bootstrap

Thread 0x00007f5459a77700 (most recent call first):
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py"", line 110 in worker
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 865 in run
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 917 in _bootstrap_inner
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py"", line 885 in _bootstrap

Current thread 0x00007f54731236c0 (most recent call first):
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/netCDF4_.py"", line 244 in _open_netcdf4_group
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/file_manager.py"", line 173 in acquire
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/netCDF4_.py"", line 56 in get_array
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/netCDF4_.py"", line 74 in _getitem
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/indexing.py"", line 778 in explicit_indexing_adapter
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/netCDF4_.py"", line 64 in __getitem__
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/indexing.py"", line 510 in __array__
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/numpy/core/numeric.py"", line 538 in asarray
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/indexing.py"", line 604 in __array__
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/numpy/core/numeric.py"", line 538 in asarray
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/variable.py"", line 213 in _as_array_or_item
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/variable.py"", line 392 in values
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/variable.py"", line 297 in data
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/variable.py"", line 1204 in set_dims
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/combine.py"", line 298 in ensure_common_dims
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/variable.py"", line 2085 in concat
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/combine.py"", line 305 in _dataset_concat
  File ""/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/combine.py"", line 120 in concat
  File ""mwe13.py"", line 19 in
Segmentation fault (core dumped)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,442617907
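[Editorial note: the repro above pins `xarray.set_options(file_cache_maxsize=1)`, so xarray's least-recently-used cache of open file handles can hold only one entry at a time. The sketch below is a pure-Python illustration of that eviction behaviour, not xarray's actual file-manager code; `ToyFileCache` and its methods are invented names.]

```python
# A toy LRU cache of open file handles: at most `maxsize` files stay open,
# and evicting an entry closes its handle. With maxsize=1, acquiring a
# handle for a second (file, group) key closes the first handle even
# though the caller may still be using it.
import os
import tempfile
from collections import OrderedDict


class ToyFileCache:
    """Keep at most `maxsize` files open; eviction closes the handle."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self._cache = OrderedDict()

    def acquire(self, path, key):
        # One cache entry per (file, group)-like key, mimicking how each
        # group ends up being opened as a separate dataset.
        if key in self._cache:
            self._cache.move_to_end(key)
        else:
            self._cache[key] = open(path, "rb")
            while len(self._cache) > self.maxsize:
                _, old = self._cache.popitem(last=False)
                old.close()  # closed behind the caller's back
        return self._cache[key]


with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"data")
    path = f.name

cache = ToyFileCache(maxsize=1)
h1 = cache.acquire(path, ("file.nc", "/group1"))
h2 = cache.acquire(path, ("file.nc", "/group2"))  # evicts /group1's handle

h1_closed_after_eviction = h1.closed  # True: evicted and closed
h2_closed_after_eviction = h2.closed  # False: still cached
print(h1_closed_after_eviction, h2_closed_after_eviction)  # True False

h2.close()
os.remove(path)
```

If the underlying library is then asked to read through the stale first handle, the result is undefined behaviour at the C level, which is consistent with the segfault reported above.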
https://github.com/pydata/xarray/issues/2954#issuecomment-508728959,https://api.github.com/repos/pydata/xarray/issues/2954,508728959,MDEyOklzc3VlQ29tbWVudDUwODcyODk1OQ==,500246,2019-07-05T11:29:50Z,2019-07-05T11:29:50Z,CONTRIBUTOR,"This can also be triggered by a `.persist(...)` call, although I don't yet understand the precise circumstances.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,442617907
https://github.com/pydata/xarray/issues/2954#issuecomment-492837199,https://api.github.com/repos/pydata/xarray/issues/2954,492837199,MDEyOklzc3VlQ29tbWVudDQ5MjgzNzE5OQ==,1828519,2019-05-15T21:51:26Z,2019-05-15T21:51:39Z,CONTRIBUTOR,"> Would it be better if we raised an error in these cases, when you later try to access data from a file that was explicitly closed?
I would prefer if it stayed the way it is. I can use the context manager to access specific variables but still hold on to the DataArray objects with dask arrays underneath and use them later. In the non-dask case, I'm not sure.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,442617907
https://github.com/pydata/xarray/issues/2954#issuecomment-492732052,https://api.github.com/repos/pydata/xarray/issues/2954,492732052,MDEyOklzc3VlQ29tbWVudDQ5MjczMjA1Mg==,1217238,2019-05-15T16:43:17Z,2019-05-15T16:43:17Z,MEMBER,"> is not closing the file after it has been opened for retrieving a ""lazy"" file by design, or might this be considered a wart/bug?
You can achieve this behavior (nearly) by setting `xarray.set_options(file_cache_maxsize=1)`. Note that the default for `file_cache_maxsize` is 128, which is suspiciously similar to the number of files/groups at which you encounter issues. In theory we use appropriate locks for automatically closing files when the cache size is exceeded, but this may not be working properly.
If you can make a test case with synthetic data (e.g., including a script to make files) I can see if I can reproduce/fix this. But to clarify the intent here: we don't close files around every access to data because it can cause a severe loss in performance, e.g., if you're using dask to read a bunch of chunks out of the same file. I agree that it's unintuitive how we ignore the explicit context manager. Would it be better if we raised an error in these cases, when you later try to access data from a file that was explicitly closed? It's not immediately obvious to me how to refactor the code to achieve this, but this does seem like it would make for a better user experience.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,442617907
https://github.com/pydata/xarray/issues/2954#issuecomment-492509798,https://api.github.com/repos/pydata/xarray/issues/2954,492509798,MDEyOklzc3VlQ29tbWVudDQ5MjUwOTc5OA==,1217238,2019-05-15T05:32:16Z,2019-05-15T05:32:16Z,MEMBER,"Nevermind, I think we do properly use the right locks. But perhaps there is an issue with re-using open files when using netCDF4/HDF5 groups. Does this same issue appear if you use `engine='h5netcdf'`? That would be an interesting data point.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,442617907
https://github.com/pydata/xarray/issues/2954#issuecomment-492507869,https://api.github.com/repos/pydata/xarray/issues/2954,492507869,MDEyOklzc3VlQ29tbWVudDQ5MjUwNzg2OQ==,1217238,2019-05-15T05:22:24Z,2019-05-15T05:22:24Z,MEMBER,"Looking through the code for `open_dataset()` it appears that we have a bug: by default we don't use file locks! (We do use these by default for `open_mfdataset()`.)
This should really be fixed; I will try to make a pull request shortly.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,442617907
https://github.com/pydata/xarray/issues/2954#issuecomment-491866549,https://api.github.com/repos/pydata/xarray/issues/2954,491866549,MDEyOklzc3VlQ29tbWVudDQ5MTg2NjU0OQ==,500246,2019-05-13T15:18:33Z,2019-05-13T15:18:33Z,CONTRIBUTOR,"In our code, this problem gets triggered because of xarray's lazy handling. If we have
```
with xr.open_dataset('file.nc') as ds:
    val = ds[""field""]
    return val
```
then when a caller tries to use `val`, xarray reopens the dataset and does not close it again. This means the context manager is actually useless: we're using the context manager to close the file as soon as we have accessed the value, but later the file gets opened again anyway. This is against the intention of the code. We can avoid this by calling `val.load()` from within the context manager, as the linked satpy PR above does. But what is the intention of xarray's design here? Should lazy reading close the file after opening and reading the value?
I would say it probably should do something like
```
if file_was_not_open:
    open file
    get value
    close file  # this step currently omitted
    return value
else:
    get value
    return value
```
Is not closing the file after it has been opened for retrieving a ""lazy"" file by design, or might this be considered a wart/bug?","{""total_count"": 2, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 2}",,442617907
https://github.com/pydata/xarray/issues/2954#issuecomment-491221266,https://api.github.com/repos/pydata/xarray/issues/2954,491221266,MDEyOklzc3VlQ29tbWVudDQ5MTIyMTI2Ng==,500246,2019-05-10T09:18:28Z,2019-05-10T09:18:28Z,CONTRIBUTOR,"Note that if I close every file neatly, there is no segmentation fault.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,442617907
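[Editorial note: the "open it, read it, and close it only if this read opened it" pseudocode proposed in the thread can be sketched in plain Python. `read_value` and the plain-file handling below are invented for illustration; this is not xarray API.]

```python
# Sketch of the proposed behaviour: after a lazy read, close the file
# only if the read itself had to open it; a handle the caller already
# holds open is left alone.
import os
import tempfile


def read_value(path, handle=None):
    """Read the data; close the file afterwards only if we opened it here."""
    we_opened = handle is None
    if we_opened:
        handle = open(path, "rb")  # file was not open: open it ourselves
    try:
        return handle.read()       # get value
    finally:
        if we_opened:
            handle.close()         # the step the pseudocode says is omitted


with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"lazy value")
    path = f.name

val = read_value(path)             # opened, read, and closed internally
print(val)

h = open(path, "rb")
val2 = read_value(path, handle=h)  # caller's own handle is left open
still_open = not h.closed
h.close()
os.remove(path)
```

This mirrors the two branches of the pseudocode: the first call leaves no file open afterwards, while the second leaves the caller's explicitly opened handle untouched.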