issue_comments

6 rows where author_association = "MEMBER", issue = 442617907 and user = 1217238 sorted by updated_at descending

Comment 508873420 · shoyer (user 1217238) · MEMBER · 2019-07-05T22:29:01Z
https://github.com/pydata/xarray/issues/2954#issuecomment-508873420

OK, I have a tentative fix up in https://github.com/pydata/xarray/pull/3082.

@gerritholl I have not been able to directly reproduce this issue, so it would be great if you could test my pull request before we merge it to verify whether or not the fix works.

Comment 508857913 · shoyer (user 1217238) · MEMBER · 2019-07-05T20:39:56Z
https://github.com/pydata/xarray/issues/2954#issuecomment-508857913

Thinking about this a little more, I suspect the issue might be related to how xarray opens a file multiple times to read different groups. It is very likely that libraries like netCDF-C don't handle this properly. Instead, we should probably open files once, and reuse them for reading from different groups.
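
A minimal sketch of that "open once, read every group" approach, using the netCDF4 library directly (the file name is a hypothetical example):

import netCDF4

# Open the file a single time and reuse that one handle for every
# group, instead of reopening the file once per group.
with netCDF4.Dataset("many_groups.nc", mode="r") as nc:
    for name, group in nc.groups.items():
        for var in group.variables.values():
            data = var[:]  # read through the shared handle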

Comment 508853908 · shoyer (user 1217238) · MEMBER · 2019-07-05T20:17:22Z
https://github.com/pydata/xarray/issues/2954#issuecomment-508853908

> But there's something with the specific netcdf file going on, for when I create artificial groups, it does not segfault.

Can you share a netCDF file that causes this issue?
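
Failing that, a synthetic stand-in with many groups could be generated along these lines (file and group names are hypothetical):

import numpy as np
import xarray as xr

# Write one small variable into each of 200 groups of a single file.
for i in range(200):
    ds = xr.Dataset({"x": ("dim", np.arange(10))})
    ds.to_netcdf("groups.nc",
                 mode="w" if i == 0 else "a",
                 group=f"group_{i:03d}",
                 engine="netcdf4")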

Comment 492732052 · shoyer (user 1217238) · MEMBER · 2019-05-15T16:43:17Z
https://github.com/pydata/xarray/issues/2954#issuecomment-492732052

> is not closing the file after it has been opened for retrieving a "lazy" file by design, or might this be considered a wart/bug?

You can achieve this behavior (nearly) by setting xarray.set_options(file_cache_maxsize=1).
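
That is:

import xarray as xr

# Keep at most one file open at a time; opening the next file evicts
# (and closes) the previous one from xarray's internal file cache.
xr.set_options(file_cache_maxsize=1)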

Note that the default for file_cache_maxsize is 128, which is suspiciously similar to the number of files/groups at which you encounter issues. In theory we use appropriate locks for automatically closing files when the cache size is exceeded, but this may not be working properly. If you can make a test case with synthetic data (e.g., including a script to make files) I can see if I can reproduce/fix this.
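
A sketch of the kind of synthetic test case meant here, assuming files named file_000.nc through file_199.nc exist, each containing a group group_000 with a variable x (all names hypothetical):

import xarray as xr

# Open more files than the default file_cache_maxsize of 128 before
# touching any data; this is roughly where the reported crashes start.
datasets = [
    xr.open_dataset(f"file_{i:03d}.nc", group="group_000")
    for i in range(200)
]
values = [ds["x"].values for ds in datasets]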

But to clarify the intent here: we don't close files around every access to data because that can cause a severe loss in performance, e.g., if you're using dask to read a bunch of chunks out of the same file.

I agree that it's unintuitive how we ignore the explicit context manager. Would it be better if we raised an error in these cases, when you later try to access data from a file that was explicitly closed? It's not immediately obvious to me how to refactor the code to achieve this, but this does seem like it would make for a better user experience.
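
To illustrate the difference (file and variable names hypothetical):

import xarray as xr

with xr.open_dataset("file.nc") as ds:
    lazy = ds["some_var"]  # lazy variable; no data has been read yet

# Today this silently reopens the file through the internal file cache,
# ignoring the explicit close above; the proposal is to raise instead.
values = lazy.values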

Comment 492509798 · shoyer (user 1217238) · MEMBER · 2019-05-15T05:32:16Z
https://github.com/pydata/xarray/issues/2954#issuecomment-492509798

Nevermind, I think we do properly use the right locks. But perhaps there is an issue with re-using open files when using netCDF4/HDF5 groups.

Does this same issue appear if you use engine='h5netcdf'? That would be an interesting data point.
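
Something along these lines, reusing the hypothetical multi-group file from above:

import xarray as xr

# Run the same read with both backends; if only engine="netcdf4"
# segfaults, that points at netCDF-C rather than xarray itself.
for engine in ("netcdf4", "h5netcdf"):
    with xr.open_dataset("groups.nc", group="group_000",
                         engine=engine) as ds:
        print(engine, float(ds["x"].sum()))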

Comment 492507869 · shoyer (user 1217238) · MEMBER · 2019-05-15T05:22:24Z
https://github.com/pydata/xarray/issues/2954#issuecomment-492507869

Looking through the code for open_dataset() it appears that we have a bug: by default we don't use file locks! (We do use these by default for open_mfdataset().) This should really be fixed; I will try to make a pull request shortly.
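
In the meantime, a possible user-side workaround (a sketch in plain Python, not an xarray API) is to serialize reads through one explicit lock:

import threading
import xarray as xr

# One process-wide lock guarding every open/read, mimicking what
# default file locks would provide.
_read_lock = threading.Lock()

def read_group(path, group):
    with _read_lock:
        with xr.open_dataset(path, group=group) as ds:
            return ds.load()  # force the read while the lock is held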


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
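
Assuming a local copy of the underlying SQLite database (the file name github.db is hypothetical), the row selection shown on this page corresponds to:

import sqlite3

conn = sqlite3.connect("github.db")
rows = conn.execute(
    "SELECT id, created_at, body FROM issue_comments "
    "WHERE author_association = 'MEMBER' AND issue = 442617907 "
    "AND [user] = 1217238 ORDER BY updated_at DESC"
).fetchall()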