issue_comments

14 rows where issue = 442617907 sorted by updated_at descending

user 3

  • gerritholl 7
  • shoyer 6
  • djhoese 1

author_association 2

  • CONTRIBUTOR 8
  • MEMBER 6

issue 1

  • Segmentation fault reading many groups from many files · 14
Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sorted descending), author_association, body, reactions, performed_via_github_app, issue
509134252 https://github.com/pydata/xarray/issues/2954#issuecomment-509134252 https://api.github.com/repos/pydata/xarray/issues/2954 MDEyOklzc3VlQ29tbWVudDUwOTEzNDI1Mg== gerritholl 500246 2019-07-08T08:39:01Z 2019-07-08T08:39:01Z CONTRIBUTOR

And I can confirm that the problem I reported originally on May 10 is also gone with #3082.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segmentation fault reading many groups from many files 442617907
509132581 https://github.com/pydata/xarray/issues/2954#issuecomment-509132581 https://api.github.com/repos/pydata/xarray/issues/2954 MDEyOklzc3VlQ29tbWVudDUwOTEzMjU4MQ== gerritholl 500246 2019-07-08T08:34:11Z 2019-07-08T08:34:38Z CONTRIBUTOR

@shoyer I checked out your branch and the latter test example runs successfully - no segmentation fault and no files left open.

I will test the former test example now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segmentation fault reading many groups from many files 442617907
508900470 https://github.com/pydata/xarray/issues/2954#issuecomment-508900470 https://api.github.com/repos/pydata/xarray/issues/2954 MDEyOklzc3VlQ29tbWVudDUwODkwMDQ3MA== gerritholl 500246 2019-07-06T06:09:04Z 2019-07-06T06:09:04Z CONTRIBUTOR

There are some files triggering the problem at ftp://ftp.eumetsat.int/pub/OPS/out/test-data/Test-data-for-External-Users/MTG_FCI_Test-Data/FCI_L1C_24hr_Test_Data_for_Users/1.0/UNCOMPRESSED/. I will test the PR later (on Monday at the latest).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segmentation fault reading many groups from many files 442617907
508873420 https://github.com/pydata/xarray/issues/2954#issuecomment-508873420 https://api.github.com/repos/pydata/xarray/issues/2954 MDEyOklzc3VlQ29tbWVudDUwODg3MzQyMA== shoyer 1217238 2019-07-05T22:29:01Z 2019-07-05T22:29:01Z MEMBER

OK, I have a tentative fix up in https://github.com/pydata/xarray/pull/3082.

@gerritholl I have not been able to directly reproduce this issue, so it would be great if you could test my pull request before we merge it to verify whether or not the fix works.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segmentation fault reading many groups from many files 442617907
508857913 https://github.com/pydata/xarray/issues/2954#issuecomment-508857913 https://api.github.com/repos/pydata/xarray/issues/2954 MDEyOklzc3VlQ29tbWVudDUwODg1NzkxMw== shoyer 1217238 2019-07-05T20:39:56Z 2019-07-05T20:39:56Z MEMBER

Thinking about this a little more, I suspect the issue might be related to how xarray opens a file multiple times to read different groups. It is very likely that libraries like netCDF-C don't handle this properly. Instead, we should probably open files once, and reuse them for reading from different groups.
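
As a rough sketch of that "open once, reuse for groups" idea (not the actual fix in xarray; the path, group names, and variable handling below are placeholders), the netCDF4 library can hold a single file handle and read several groups through it:

```python
import netCDF4

# Open the file a single time and read from several groups through the
# same handle, instead of re-opening the file once per group.
with netCDF4.Dataset("/path/to/netcdf/file.nc") as nc:
    for group_name in ("group1", "group2"):
        grp = nc.groups[group_name]
        for var_name, var in grp.variables.items():
            data = var[:]  # reads the whole variable into memory
            print(group_name, var_name, data.shape)
```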

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segmentation fault reading many groups from many files 442617907
508853908 https://github.com/pydata/xarray/issues/2954#issuecomment-508853908 https://api.github.com/repos/pydata/xarray/issues/2954 MDEyOklzc3VlQ29tbWVudDUwODg1MzkwOA== shoyer 1217238 2019-07-05T20:17:22Z 2019-07-05T20:17:22Z MEMBER

> But there's something going on with the specific netCDF file, because when I create artificial groups, it does not segfault.

Can you share a netCDF file that causes this issue?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segmentation fault reading many groups from many files 442617907
508772044 https://github.com/pydata/xarray/issues/2954#issuecomment-508772044 https://api.github.com/repos/pydata/xarray/issues/2954 MDEyOklzc3VlQ29tbWVudDUwODc3MjA0NA== gerritholl 500246 2019-07-05T14:13:20Z 2019-07-05T14:14:20Z CONTRIBUTOR

This triggers a segmentation fault (in the .persist() call) on my system, which may be related:

```python
import xarray
import os
import subprocess

xarray.set_options(file_cache_maxsize=1)
f = "/path/to/netcdf/file.nc"
ds1 = xarray.open_dataset(f, "/group1", chunks=1024)
ds2 = xarray.open_dataset(f, "/group2", chunks=1024)
ds_cat = xarray.concat([ds1, ds2])
ds_cat.persist()
subprocess.run(fr"lsof | grep {os.getpid():d} | grep '\.nc$'", shell=True)
```

But there's something going on with the specific netCDF file, because when I create artificial groups, it does not segfault.

```
Fatal Python error: Segmentation fault

Thread 0x00007f542bfff700 (most recent call first):
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py", line 470 in _handle_results
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 865 in run
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 917 in _bootstrap_inner
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 885 in _bootstrap

Thread 0x00007f5448ff9700 (most recent call first):
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py", line 422 in _handle_tasks
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 865 in run
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 917 in _bootstrap_inner
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 885 in _bootstrap

Thread 0x00007f54497fa700 (most recent call first):
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py", line 413 in _handle_workers
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 865 in run
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 917 in _bootstrap_inner
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 885 in _bootstrap

Thread 0x00007f5449ffb700 (most recent call first):
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py", line 110 in worker
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 865 in run
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 917 in _bootstrap_inner
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 885 in _bootstrap

Thread 0x00007f544a7fc700 (most recent call first):
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py", line 110 in worker
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 865 in run
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 917 in _bootstrap_inner
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 885 in _bootstrap

Thread 0x00007f544affd700 (most recent call first):
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py", line 110 in worker
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 865 in run
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 917 in _bootstrap_inner
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 885 in _bootstrap

Thread 0x00007f544b7fe700 (most recent call first):
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py", line 110 in worker
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 865 in run
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 917 in _bootstrap_inner
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 885 in _bootstrap

Thread 0x00007f544bfff700 (most recent call first):
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py", line 110 in worker
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 865 in run
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 917 in _bootstrap_inner
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 885 in _bootstrap

Thread 0x00007f5458a75700 (most recent call first):
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py", line 110 in worker
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 865 in run
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 917 in _bootstrap_inner
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 885 in _bootstrap

Thread 0x00007f5459276700 (most recent call first):
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py", line 110 in worker
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 865 in run
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 917 in _bootstrap_inner
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 885 in _bootstrap

Thread 0x00007f5459a77700 (most recent call first):
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/multiprocessing/pool.py", line 110 in worker
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 865 in run
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 917 in _bootstrap_inner
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/threading.py", line 885 in _bootstrap

Current thread 0x00007f54731236c0 (most recent call first):
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 244 in open_netcdf4_group
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 173 in acquire
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/netCDF4.py", line 56 in get_array
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 74 in getitem
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/indexing.py", line 778 in explicit_indexing_adapter
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/backends/netCDF4.py", line 64 in getitem
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/indexing.py", line 510 in array
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/numpy/core/numeric.py", line 538 in asarray
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/indexing.py", line 604 in array
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/numpy/core/numeric.py", line 538 in asarray
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/variable.py", line 213 in _as_array_or_item
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/variable.py", line 392 in values
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/variable.py", line 297 in data
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/variable.py", line 1204 in set_dims
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/combine.py", line 298 in ensure_common_dims
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/variable.py", line 2085 in concat
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/combine.py", line 305 in _dataset_concat
  File "/media/nas/x21324/miniconda3/envs/py37d/lib/python3.7/site-packages/xarray/core/combine.py", line 120 in concat
  File "mwe13.py", line 19 in <module>

Segmentation fault (core dumped)
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segmentation fault reading many groups from many files 442617907
508728959 https://github.com/pydata/xarray/issues/2954#issuecomment-508728959 https://api.github.com/repos/pydata/xarray/issues/2954 MDEyOklzc3VlQ29tbWVudDUwODcyODk1OQ== gerritholl 500246 2019-07-05T11:29:50Z 2019-07-05T11:29:50Z CONTRIBUTOR

This can also be triggered by a .persist(...) call, although I don't yet understand the precise circumstances.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segmentation fault reading many groups from many files 442617907
492837199 https://github.com/pydata/xarray/issues/2954#issuecomment-492837199 https://api.github.com/repos/pydata/xarray/issues/2954 MDEyOklzc3VlQ29tbWVudDQ5MjgzNzE5OQ== djhoese 1828519 2019-05-15T21:51:26Z 2019-05-15T21:51:39Z CONTRIBUTOR

> Would it be better if we raised an error in these cases, when you later try to access data from a file that was explicitly closed?

I would prefer if it stayed the way it is. I can use the context manager to access specific variables but still hold on to the DataArray objects with dask arrays underneath and use them later. In the non-dask case, I'm not sure.
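
A minimal sketch of that pattern, assuming a placeholder file path and variable name: the dataset is opened with dask chunks, only the lazy DataArray escapes the context manager, and the actual read happens later.

```python
import xarray as xr

def get_field(path, name):
    # The context manager closes the file, but the returned DataArray is
    # dask-backed, so the data are only read when .compute() is called
    # (at which point xarray acquires the file again from its cache).
    with xr.open_dataset(path, chunks=1024) as ds:
        return ds[name]

field = get_field("/path/to/file.nc", "field")
values = field.compute()  # the actual I/O happens here
```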

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segmentation fault reading many groups from many files 442617907
492732052 https://github.com/pydata/xarray/issues/2954#issuecomment-492732052 https://api.github.com/repos/pydata/xarray/issues/2954 MDEyOklzc3VlQ29tbWVudDQ5MjczMjA1Mg== shoyer 1217238 2019-05-15T16:43:17Z 2019-05-15T16:43:17Z MEMBER

> Is it by design that the file is not closed after it has been opened for retrieving a "lazy" value, or might this be considered a wart/bug?

You can achieve this behavior (nearly) by setting xarray.set_options(file_cache_maxsize=1).

Note that the default for file_cache_maxsize is 128, which is suspiciously similar to the number of files/groups at which you encounter issues. In theory we use appropriate locks for automatically closing files when the cache size is exceeded, but this may not be working properly. If you can make a test case with synthetic data (e.g., including a script to make files) I can see if I can reproduce/fix this.

But to clarify the intent here: we don't close files around every access to data because doing so can cause a severe loss in performance, e.g., if you're using dask to read a bunch of chunks out of the same file.

I agree that it's unintuitive how we ignore the explicit context manager. Would it be better if we raised an error in these cases, when you later try to access data from a file that was explicitly closed? It's not immediately obvious to me how to refactor the code to achieve this, but this does seem like it would make for a better user experience.
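
A synthetic test case along the lines requested here might look like the sketch below; the file count, group names, and variable contents are arbitrary assumptions rather than details from the report.

```python
import numpy as np
import xarray as xr

# Write many small files, each containing two netCDF groups.
paths = []
for i in range(150):  # more files than file_cache_maxsize's default of 128
    path = f"synthetic_{i:03d}.nc"
    for j, group in enumerate(("group1", "group2")):
        ds = xr.Dataset({"field": ("x", np.arange(10.0) + i + j)})
        mode = "w" if j == 0 else "a"  # first group creates the file, later ones append
        ds.to_netcdf(path, group=group, mode=mode)
    paths.append(path)

# Read them back lazily with an aggressively small file cache,
# mimicking the access pattern from the report.
xr.set_options(file_cache_maxsize=1)
datasets = [xr.open_dataset(p, group="group1", chunks=1024) for p in paths]
combined = xr.concat(datasets, dim="x")
combined.load()
```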

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segmentation fault reading many groups from many files 442617907
492509798 https://github.com/pydata/xarray/issues/2954#issuecomment-492509798 https://api.github.com/repos/pydata/xarray/issues/2954 MDEyOklzc3VlQ29tbWVudDQ5MjUwOTc5OA== shoyer 1217238 2019-05-15T05:32:16Z 2019-05-15T05:32:16Z MEMBER

Nevermind, I think we do properly use the right locks. But perhaps there is an issue with re-using open files when using netCDF4/HDF5 groups.

Does this same issue appear if you use engine='h5netcdf'? That would be an interesting data point.
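
For reference, switching backends for that experiment is a single keyword change (the path and group below are placeholders, and the h5netcdf package has to be installed):

```python
import xarray as xr

# Same read, but through h5py/h5netcdf instead of the netCDF4-C library.
ds = xr.open_dataset("/path/to/file.nc", group="/group1",
                     engine="h5netcdf", chunks=1024)
```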

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segmentation fault reading many groups from many files 442617907
492507869 https://github.com/pydata/xarray/issues/2954#issuecomment-492507869 https://api.github.com/repos/pydata/xarray/issues/2954 MDEyOklzc3VlQ29tbWVudDQ5MjUwNzg2OQ== shoyer 1217238 2019-05-15T05:22:24Z 2019-05-15T05:22:24Z MEMBER

Looking through the code for open_dataset() it appears that we have a bug: by default we don't use file locks! (We do use these by default for open_mfdataset().) This should really be fixed; I will try to make a pull request shortly.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segmentation fault reading many groups from many files 442617907
491866549 https://github.com/pydata/xarray/issues/2954#issuecomment-491866549 https://api.github.com/repos/pydata/xarray/issues/2954 MDEyOklzc3VlQ29tbWVudDQ5MTg2NjU0OQ== gerritholl 500246 2019-05-13T15:18:33Z 2019-05-13T15:18:33Z CONTRIBUTOR

In our code, this problem gets triggered because of xarray's lazy handling. If we have

```python
with xr.open_dataset('file.nc') as ds:
    val = ds["field"]
    return val
```

then when a caller tries to use val, xarray reopens the dataset and does not close it again. This means the context manager is actually useless: we're using the context manager to close the file as soon as we have accessed the value, but later the file gets opened again anyway. This is against the intention of the code.

We can avoid this by calling val.load() from within the context manager, as the linked satpy PR above does. But what is the intention of xarray's design here? Should lazy reading close the file after opening and reading the value? I would say it probably should do something like:

```
if file_was_not_open:
    open file
    get value
    close file  # this step currently omitted
    return value
else:
    get value
    return value
```

Is it by design that the file is not closed after it has been opened for retrieving a "lazy" value, or might this be considered a wart/bug?
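
A minimal sketch of the workaround mentioned above (the file and variable names are the placeholders from this comment): calling .load() while the file is still open forces the read to happen inside the context manager, so nothing needs to reopen the file afterwards.

```python
import xarray as xr

with xr.open_dataset("file.nc") as ds:
    # Force the data into memory before the context manager closes the file.
    val = ds["field"].load()

# val can now be used without xarray reopening file.nc behind the scenes.
```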

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 2
}
  Segmentation fault reading many groups from many files 442617907
491221266 https://github.com/pydata/xarray/issues/2954#issuecomment-491221266 https://api.github.com/repos/pydata/xarray/issues/2954 MDEyOklzc3VlQ29tbWVudDQ5MTIyMTI2Ng== gerritholl 500246 2019-05-10T09:18:28Z 2019-05-10T09:18:28Z CONTRIBUTOR

Note that if I close every file neatly, there is no segmentation fault.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segmentation fault reading many groups from many files 442617907

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);