issue_comments

6 rows where issue = 626042217 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1102540862 https://github.com/pydata/xarray/issues/4100#issuecomment-1102540862 https://api.github.com/repos/pydata/xarray/issues/4100 IC_kwDOAMm_X85Bt3A- stale[bot] 26384082 2022-04-19T11:43:48Z 2022-04-19T11:43:48Z NONE

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_dataset is not thread-safe 626042217
635586295 https://github.com/pydata/xarray/issues/4100#issuecomment-635586295 https://api.github.com/repos/pydata/xarray/issues/4100 MDEyOklzc3VlQ29tbWVudDYzNTU4NjI5NQ== shoyer 1217238 2020-05-28T20:21:44Z 2020-05-28T20:21:44Z MEMBER

Take a look here: https://portal.hdfgroup.org/display/knowledge/Questions+about+thread-safety+and+concurrent+access

I haven't actually tried compiling in thread-safe mode myself.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_dataset is not thread-safe 626042217
635564640 https://github.com/pydata/xarray/issues/4100#issuecomment-635564640 https://api.github.com/repos/pydata/xarray/issues/4100 MDEyOklzc3VlQ29tbWVudDYzNTU2NDY0MA== paulkernfeld 433803 2020-05-28T20:00:00Z 2020-05-28T20:00:00Z NONE

@shoyer could you tell me more about what it means to compile HDF5 in "thread safe" mode?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_dataset is not thread-safe 626042217
635560237 https://github.com/pydata/xarray/issues/4100#issuecomment-635560237 https://api.github.com/repos/pydata/xarray/issues/4100 MDEyOklzc3VlQ29tbWVudDYzNTU2MDIzNw== paulkernfeld 433803 2020-05-28T19:50:56Z 2020-05-28T19:50:56Z NONE

Hey @shoyer, thanks very much for the quick response and for suggesting possible workarounds!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_dataset is not thread-safe 626042217
635102475 https://github.com/pydata/xarray/issues/4100#issuecomment-635102475 https://api.github.com/repos/pydata/xarray/issues/4100 MDEyOklzc3VlQ29tbWVudDYzNTEwMjQ3NQ== shoyer 1217238 2020-05-28T05:03:30Z 2020-05-28T05:03:30Z MEMBER

There are also a few work-arounds you might consider in the meantime here:

  1. If you're reading netCDF4 files, HDF5 can be compiled in "thread safe" mode (which just adds its own global lock).
  2. If you're reading netCDF3 files, the "scipy" backend is thread safe.
  3. Other file formats like "zarr" don't have this issue at all, and more gracefully scale to very large datasets.
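
As a rough illustration of choosing between the first two workarounds, the sketch below guesses which engine applies by inspecting a file's leading bytes. The helper name `pick_engine` is hypothetical and not part of xarray. It assumes the file signature sits at offset 0, which holds for netCDF3 classic and 64-bit offset files and for typical HDF5-based netCDF4 files without a user block.

```python
# Hypothetical helper, not part of xarray: map a file's magic bytes to an
# engine name that could be passed to xr.open_dataset(..., engine=...).
NETCDF3_MAGIC = (b"CDF\x01", b"CDF\x02")  # classic and 64-bit offset formats
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"         # HDF5 signature used by netCDF4 files


def pick_engine(path):
    """Return "scipy" for netCDF3 files and "netcdf4" for HDF5-based files."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(NETCDF3_MAGIC):
        return "scipy"
    if head.startswith(HDF5_MAGIC):
        return "netcdf4"
    raise ValueError(f"unrecognized file signature in {path!r}")
```

With xarray imported as `xr` and SciPy installed, the result could be used as `xr.open_dataset(path, engine=pick_engine(path))`. A netCDF4 file would still need one of the other workarounds, since this helper only selects the engine.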
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_dataset is not thread-safe 626042217
635099870 https://github.com/pydata/xarray/issues/4100#issuecomment-635099870 https://api.github.com/repos/pydata/xarray/issues/4100 MDEyOklzc3VlQ29tbWVudDYzNTA5OTg3MA== shoyer 1217238 2020-05-28T04:55:01Z 2020-05-28T04:55:01Z MEMBER

Thanks for the clear report!

I know we use backend-specific locks by default when opening netCDF files, so I was initially puzzled by this. But now that I've looked back over the implementation, this makes sense.

We currently only guarantee thread safety when reading data after files have been opened. For example, you could write something like:

    dataset = xr.open_dataset(SAVED_FILE_NAME, engine="netcdf4")
    threads = [
        threading.Thread(target=lambda: do_something_with_xarray(dataset))
        for _ in range(N_THREADS)
    ]

For many use-cases (e.g., in dask), this is a sufficient form of parallelism, because xarray's file opening is lazy and only needs to read metadata, not array values.

It would indeed be nice if open_dataset() itself were thread safe. Mostly I think this could be achieved by making use of the existing lock attribute found on NetCDF4DataStore and most other DataStore classes.
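
Until something like that exists inside xarray, a caller can get a similar effect by serializing the open calls with a lock of their own. The sketch below is a user-side workaround under that assumption, not xarray's implementation: `open_serialized`, `open_many`, and `_OPEN_LOCK` are hypothetical names, and the opener is passed in as a parameter so the pattern does not depend on any particular backend.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical module-level lock shared by every thread that opens files.
_OPEN_LOCK = threading.Lock()


def open_serialized(opener, *args, **kwargs):
    """Call opener(*args, **kwargs) while holding the global open lock."""
    with _OPEN_LOCK:
        return opener(*args, **kwargs)


def open_many(opener, paths, max_workers=4):
    """Open each path from a thread pool, one open call at a time.

    Results are returned in the same order as `paths`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: open_serialized(opener, p), paths))
```

With xarray imported as `xr`, this could be called as `open_many(xr.open_dataset, paths)`. Only the open step is serialized; reads from the returned datasets rely on the thread-safety guarantee described above.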

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_dataset is not thread-safe 626042217

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);