html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/444#issuecomment-120447670,https://api.github.com/repos/pydata/xarray/issues/444,120447670,MDEyOklzc3VlQ29tbWVudDEyMDQ0NzY3MA==,1217238,2015-07-10T16:11:19Z,2015-07-10T16:11:19Z,MEMBER,"@razvanc87 I've gotten a few other reports of issues with multithreading (not just you), so I think we definitely do need to add our own lock when accessing these files. Misconfigured HDF5 installs may not be so uncommon.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-118435615,https://api.github.com/repos/pydata/xarray/issues/444,118435615,MDEyOklzc3VlQ29tbWVudDExODQzNTYxNQ==,1217238,2015-07-03T22:43:41Z,2015-07-03T22:43:41Z,MEMBER,"@razvanc87 netcdf4 and h5py use the same HDF5 libraries, but have different bindings from Python. H5py likely does a more careful job of using its own locks to ensure thread safety, which likely explains the difference you are seeing (the attribute encoding is a separate issue).
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-118435484,https://api.github.com/repos/pydata/xarray/issues/444,118435484,MDEyOklzc3VlQ29tbWVudDExODQzNTQ4NA==,1217238,2015-07-03T22:40:57Z,2015-07-03T22:40:57Z,MEMBER,"> The library itself is not threadsafe? What about on a per-file basis?
@andrewcollette could you comment on this for h5py/hdf5?
@mrocklin based on my reading of Andrew's comment in the h5py issue, this is indeed the case.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-118195247,https://api.github.com/repos/pydata/xarray/issues/444,118195247,MDEyOklzc3VlQ29tbWVudDExODE5NTI0Nw==,1217238,2015-07-02T23:45:01Z,2015-07-02T23:45:01Z,MEMBER,"Ah, I think I know why the seg faults are still occurring. By default, `dask.array.from_array` uses a thread lock that is specific to each array variable. We need a global thread lock, because the HDF5 library is not thread safe.
@mrocklin maybe `da.from_array` should use a global thread lock if `lock=True`? Alternatively, I could just change this in xray -- but I suspect that other dask users who want a lock also probably want a global lock.
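A minimal sketch of the global-lock idea (hypothetical; it just passes one shared `threading.Lock` as the `lock=` argument of `da.from_array` for every variable, with in-memory arrays standing in for file-backed ones):

```python
import threading

import dask.array as da
import numpy as np

# Hypothetical sketch: one lock shared by every on-disk variable,
# so no two threads touch the HDF5 library at the same time.
GLOBAL_HDF5_LOCK = threading.Lock()

x = da.from_array(np.arange(10), chunks=5, lock=GLOBAL_HDF5_LOCK)
y = da.from_array(np.arange(10), chunks=5, lock=GLOBAL_HDF5_LOCK)
total = (x + y).sum().compute()
```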
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-118090209,https://api.github.com/repos/pydata/xarray/issues/444,118090209,MDEyOklzc3VlQ29tbWVudDExODA5MDIwOQ==,1217238,2015-07-02T16:46:57Z,2015-07-02T16:46:57Z,MEMBER,"Thanks for your help debugging!
I made a new issue for ascii attributes handling: https://github.com/xray/xray/issues/451
This is one case where Python 3's insistence that bytes and strings are different is annoying. I'll probably have to decode all bytes type attributes read from h5netcdf.
How do you trigger the seg-fault with netcdf4-python? Just using `open_mfdataset` as before? I'm a little surprised that still happens with the thread lock.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-116787098,https://api.github.com/repos/pydata/xarray/issues/444,116787098,MDEyOklzc3VlQ29tbWVudDExNjc4NzA5OA==,1217238,2015-06-29T18:30:48Z,2015-06-29T18:30:48Z,MEMBER,"@razvanc87 What version of h5py were you using with h5netcdf? @andrewcollette suggests (https://github.com/h5py/h5py/issues/591#issuecomment-116785660) that h5py should already have the lock that fixes this issue if you were using h5py 2.4.0 or later.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-116779716,https://api.github.com/repos/pydata/xarray/issues/444,116779716,MDEyOklzc3VlQ29tbWVudDExNjc3OTcxNg==,1217238,2015-06-29T18:07:52Z,2015-06-29T18:07:52Z,MEMBER,"Just merged the fix to master.
@razvanc87 if you could try installing the development version, I would love to hear if this resolves your issues.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-116189535,https://api.github.com/repos/pydata/xarray/issues/444,116189535,MDEyOklzc3VlQ29tbWVudDExNjE4OTUzNQ==,1217238,2015-06-28T03:34:30Z,2015-06-28T03:34:30Z,MEMBER,"I have a tentative fix (adding the threading lock) in https://github.com/xray/xray/pull/446
Still wondering why multi-threading can't use more than one CPU -- hopefully my h5py issue (referenced above) will get us some answers.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-116165986,https://api.github.com/repos/pydata/xarray/issues/444,116165986,MDEyOklzc3VlQ29tbWVudDExNjE2NTk4Ng==,1217238,2015-06-27T23:40:29Z,2015-06-27T23:40:29Z,MEMBER,"Of course, concurrent access to HDF5 files works fine on my laptop, using Anaconda's build of HDF5 (version 1.8.14). I have no idea what special flags they invoked when building it :).
That said, I have been unable to produce any benchmarks that show improved performance when simply doing multithreaded _reads_ without doing any computation (e.g., `%time xray.open_dataset(..., chunks=...).load()`). Even when I'm reading multiple independent chunks compressed on disk, CPU seems to be pegged at 100%, when using either netCDF4-python or h5py (via h5netcdf) to read the data. For non-compressed data, reads seem to be limited by disk speed, so CPU is also not relevant.
Given these considerations, it seems like we should use a lock when reading data into xray with dask. @mrocklin we could just use `lock=True` with `da.from_array`, right? If we can find use cases for multi-threaded reads, we could also add an optional `lock` argument to `open_dataset`/`open_mfdataset`.
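For reference, a sketch of what the `lock=True` form looks like on the dask side (note the lock here is per-array, which is exactly the limitation under discussion; in-memory data stands in for a file-backed store):

```python
import dask.array as da
import numpy as np

data = np.arange(100).reshape(10, 10)
# lock=True makes dask acquire a lock around each chunk access,
# serializing reads from the underlying (non-thread-safe) store.
d = da.from_array(data, chunks=(5, 5), lock=True)
result = d.sum().compute()
```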
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-115925776,https://api.github.com/repos/pydata/xarray/issues/444,115925776,MDEyOklzc3VlQ29tbWVudDExNTkyNTc3Ng==,1217238,2015-06-27T00:49:19Z,2015-06-27T00:49:19Z,MEMBER,"Do you have an example file? This might also be your HDF5 install....
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-115902800,https://api.github.com/repos/pydata/xarray/issues/444,115902800,MDEyOklzc3VlQ29tbWVudDExNTkwMjgwMA==,1217238,2015-06-26T22:01:41Z,2015-06-26T22:01:41Z,MEMBER,"Another backend to try would be `engine='h5netcdf'`: https://github.com/shoyer/h5netcdf
That might help us identify if this is a netCDF4-python bug.
I am also baffled by how inserting `isnull(arr1 & arr2)` avoids the seg fault. This is a lazy computation created with dask that is immediately thrown away without accessing any of the values.
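For context, a small illustration of why that expression should be inert: dask only builds a task graph, and no chunk is read until compute is called (a sketch with plain in-memory arrays standing in for the file-backed ones, and `&` in place of the full `isnull(...)` expression):

```python
import dask.array as da
import numpy as np

arr1 = da.from_array(np.array([1, 0, 1], dtype=bool), chunks=3)
arr2 = da.from_array(np.array([1, 1, 0], dtype=bool), chunks=3)

lazy = arr1 & arr2             # graph only; no values are accessed yet
materialized = lazy.compute()  # data is actually read only here
```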
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-115887568,https://api.github.com/repos/pydata/xarray/issues/444,115887568,MDEyOklzc3VlQ29tbWVudDExNTg4NzU2OA==,1217238,2015-06-26T21:25:50Z,2015-06-26T21:25:50Z,MEMBER,"Oh my, that's bad!
Can you experiment with the `engine` argument to `open_mfdataset` and see if that changes things? For example, try `engine='scipy'` (if these are netCDF3 files) and `engine='netcdf4'`.
It would also be helpful to report the dtypes of the arrays that trigger failure in `array_equiv`.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107