issue_comments

3 rows where author_association = "MEMBER" and issue = 626042217 sorted by updated_at descending


user 1

  • shoyer 3

issue 1

  • open_dataset is not thread-safe · 3

author_association 1

  • MEMBER · 3
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
635586295 https://github.com/pydata/xarray/issues/4100#issuecomment-635586295 https://api.github.com/repos/pydata/xarray/issues/4100 MDEyOklzc3VlQ29tbWVudDYzNTU4NjI5NQ== shoyer 1217238 2020-05-28T20:21:44Z 2020-05-28T20:21:44Z MEMBER

Take a look here: https://portal.hdfgroup.org/display/knowledge/Questions+about+thread-safety+and+concurrent+access

I haven't actually tried compiling HDF5 in thread-safe mode myself.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_dataset is not thread-safe 626042217
635102475 https://github.com/pydata/xarray/issues/4100#issuecomment-635102475 https://api.github.com/repos/pydata/xarray/issues/4100 MDEyOklzc3VlQ29tbWVudDYzNTEwMjQ3NQ== shoyer 1217238 2020-05-28T05:03:30Z 2020-05-28T05:03:30Z MEMBER

There are also a few workarounds you might consider in the meantime:

  1. If you're reading netCDF4 files, HDF5 can be compiled in "thread safe" mode (which just adds its own global lock).
  2. If you're reading netCDF3 files, the "scipy" backend is thread safe.
  3. Other file formats like "zarr" don't have this issue at all, and more gracefully scale to very large datasets.
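
As a rough illustration of choosing between the first two workarounds, the sketch below inspects a file's leading bytes to tell classic netCDF3 apart from HDF5-based netCDF4. The helper name `guess_engine` is hypothetical and not part of xarray; it also assumes the HDF5 signature sits at offset 0, which is the usual case but not guaranteed.

```python
def guess_engine(path):
    """Suggest an engine name from the file's magic bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        magic = f.read(8)
    # HDF5 signature: netCDF4 files are HDF5 files, so the HDF5 caveat applies.
    if magic.startswith(b"\x89HDF\r\n\x1a\n"):
        return "netcdf4"
    # netCDF3 classic (CDF\x01) and 64-bit offset (CDF\x02) formats.
    if magic[:4] in (b"CDF\x01", b"CDF\x02"):
        return "scipy"
    # Unknown format: return None and let the caller decide.
    return None
```

The result could then be passed along as `xr.open_dataset(path, engine=guess_engine(path))`; with `None`, xarray falls back to its own engine selection.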
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_dataset is not thread-safe 626042217
635099870 https://github.com/pydata/xarray/issues/4100#issuecomment-635099870 https://api.github.com/repos/pydata/xarray/issues/4100 MDEyOklzc3VlQ29tbWVudDYzNTA5OTg3MA== shoyer 1217238 2020-05-28T04:55:01Z 2020-05-28T04:55:01Z MEMBER

Thanks for the clear report!

I know we use backend-specific locks by default when opening netCDF files, so I was initially puzzled by this. But now that I've looked back over the implementation, this makes sense.

We currently only guarantee thread safety when reading data after files have been opened. For example, you could write something like:

    dataset = xr.open_dataset(SAVED_FILE_NAME, engine="netcdf4")
    threads = [
        threading.Thread(target=lambda: do_something_with_xarray(dataset))
        for _ in range(N_THREADS)
    ]

For many use-cases (e.g., in dask), this is a sufficient form of parallelism, because xarray's file opening is lazy and only needs to read metadata, not array values.
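
The example above only constructs the threads. A minimal sketch of the full pattern, where the object is opened once and then shared, might look like the following. `run_in_threads` is a hypothetical helper, not an xarray function, and it accepts any object in place of the dataset so that it can be tried without a netCDF file.

```python
import threading


def run_in_threads(dataset, work, n_threads):
    """Call work(dataset) from n_threads threads; return results in thread order."""
    results = [None] * n_threads

    def worker(index):
        # Every thread reads from the same, already opened object.
        results[index] = work(dataset)

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(n_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

With xarray, this would presumably be called as `run_in_threads(xr.open_dataset(SAVED_FILE_NAME, engine="netcdf4"), do_something_with_xarray, N_THREADS)`, so that the open happens once in the main thread. Note that an exception raised inside a worker is not re-raised here; the corresponding slot simply stays `None`.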

It would indeed be nice if open_dataset() itself were thread safe. Mostly I think this could be achieved by making use of the existing lock attribute found on NetCDF4DataStore and most other DataStore classes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_dataset is not thread-safe 626042217

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 4160.254ms · About: xarray-datasette