issue_comments: 196924992

html_url: https://github.com/pydata/xarray/issues/793#issuecomment-196924992
issue_url: https://api.github.com/repos/pydata/xarray/issues/793
id: 196924992
node_id: MDEyOklzc3VlQ29tbWVudDE5NjkyNDk5Mg==
user: 1217238
created_at: 2016-03-15T17:04:57Z
updated_at: 2016-03-15T17:27:29Z
author_association: MEMBER
body:

I did a little digging into this, and I'm pretty sure the issue here is that HDF5 cannot do multi-threading at all. Moreover, many HDF5 builds are not thread-safe.

Right now, xarray uses a single shared lock for all reads, but for writes we rely on dask.array.store, which uses a separate lock for each array it writes. Because @pwolfram's HDF5 file includes multiple variables, each variable is written under its own lock, which means we end up writing to the same file simultaneously from multiple threads.
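To make the failure mode concrete, here is a minimal stdlib-only sketch. `FakeH5File` and `store_per_array_locks` are hypothetical stand-ins, not xarray or dask code: the "file" just counts how many threads are inside `write()` at once, and a `threading.Barrier` forces the two writer threads to overlap so the result is deterministic. Because every variable has its own lock, the locks never contend, and both threads end up in the same file at the same time.

```python
import threading


class FakeH5File:
    """Hypothetical stand-in for an HDF5 file handle.

    Records the maximum number of threads inside write() simultaneously.
    """

    def __init__(self, sync=None):
        self._count_lock = threading.Lock()  # protects the counters only
        self.active = 0
        self.max_active = 0
        self.data = {}
        self._sync = sync  # optional Barrier that forces overlap in this demo

    def write(self, name, values):
        with self._count_lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        if self._sync is not None:
            self._sync.wait(timeout=5)  # all writers rendezvous inside write()
        self.data[name] = list(values)
        with self._count_lock:
            self.active -= 1


def store_per_array_locks(f, variables):
    # One lock per variable (hypothetical helper mimicking per-array locking):
    # no two writers ever share a lock, so nothing serializes file access.
    locks = {name: threading.Lock() for name in variables}

    def worker(name, values):
        with locks[name]:
            f.write(name, values)

    threads = [threading.Thread(target=worker, args=item) for item in variables.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


variables = {"temperature": [1.0, 2.0], "salinity": [3.0, 4.0]}
f = FakeH5File(sync=threading.Barrier(len(variables)))
store_per_array_locks(f, variables)
print(f.max_active)  # 2: both threads were inside write() at once
```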

So what we could really use here is a lock argument to dask.array.store (like the one dask.array.from_array already has) that lets us insist on using a shared lock when we're writing HDF5 files. We may also need to share that same lock between reading and writing data; I'm not 100% sure. But at the very least, we definitely need a lock that stops HDF5 from doing multi-threaded writes, whether to the same file or to different files.
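A minimal sketch of the proposed fix, again using only the standard library: every writer acquires one process-wide lock before touching any HDF5 file. `FakeH5File`, `HDF5_LOCK`, and `store_with_shared_lock` are hypothetical names for illustration, not xarray or dask APIs. With a single shared lock, at most one thread is ever inside `write()`, regardless of how many variables or files are involved.

```python
import threading
import time


class FakeH5File:
    """Hypothetical stand-in for an HDF5 file; tracks concurrent writers."""

    def __init__(self):
        self._count_lock = threading.Lock()  # protects the counters only
        self.active = 0
        self.max_active = 0
        self.data = {}

    def write(self, name, values):
        with self._count_lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.01)  # widen the window in which an overlap could occur
        self.data[name] = list(values)
        with self._count_lock:
            self.active -= 1


# One lock shared by all HDF5 writes in the process (hypothetical name).
HDF5_LOCK = threading.Lock()


def store_with_shared_lock(f, variables, lock=HDF5_LOCK):
    # Every writer contends on the same lock, so writes are serialized.
    def worker(name, values):
        with lock:
            f.write(name, values)

    threads = [threading.Thread(target=worker, args=item) for item in variables.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


variables = {"temperature": [1.0, 2.0], "salinity": [3.0, 4.0], "velocity": [5.0]}
f = FakeH5File()
store_with_shared_lock(f, variables)
print(f.max_active)  # 1: writes never overlapped
```

If reads also need protecting (the open question above), the conservative choice is to have the reading code acquire the same `HDF5_LOCK`, at the cost of serializing reads against writes.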

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: 140291221