issue_comments: 635099870


html_url: https://github.com/pydata/xarray/issues/4100#issuecomment-635099870
issue_url: https://api.github.com/repos/pydata/xarray/issues/4100
id: 635099870
node_id: MDEyOklzc3VlQ29tbWVudDYzNTA5OTg3MA==
user: 1217238
created_at: 2020-05-28T04:55:01Z
updated_at: 2020-05-28T04:55:01Z
author_association: MEMBER

Thanks for the clear report!

I know we use backend-specific locks by default when opening netCDF files, so I was initially puzzled by this. But now that I've looked back over the implementation, this makes sense.

We currently only guarantee thread safety when reading data *after* files have been opened. For example, you could write something like:

```python
import threading
import xarray as xr

dataset = xr.open_dataset(SAVED_FILE_NAME, engine="netcdf4")
threads = [
    threading.Thread(target=lambda: do_something_with_xarray(dataset))
    for _ in range(N_THREADS)
]
```

For many use-cases (e.g., in dask), this is a sufficient form of parallelism, because xarray's file opening is lazy and only needs to read metadata, not array values.

It would indeed be nice if `open_dataset()` itself were thread safe. Mostly I think this could be achieved by making use of the existing `lock` attribute found on `NetCDF4DataStore` and most other `DataStore` classes.
