issue_comments: 116165986


Comment: https://github.com/pydata/xarray/issues/444#issuecomment-116165986
Issue: https://api.github.com/repos/pydata/xarray/issues/444
Author: user 1217238 (MEMBER)
Created: 2015-06-27T23:40:29Z

Of course, concurrent access to HDF5 files works fine on my laptop, using Anaconda's build of HDF5 (version 1.8.14). I have no idea what special flags they invoked when building it :).

That said, I have been unable to produce any benchmarks that show improved performance from simply doing multithreaded reads without any computation (e.g., `%time xray.open_dataset(..., chunks=...).load()`). Even when I'm reading multiple independent chunks that are compressed on disk, CPU seems to be pegged at 100% whether I use netCDF4-python or h5py (via h5netcdf) to read the data. For non-compressed data, reads appear to be limited by disk speed, so CPU is not the bottleneck there either.
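
Concretely, the pattern I'm timing looks something like this sketch (the file path and chunking are made-up stand-ins):

```python
# Hypothetical benchmark: time a multithreaded read with no computation.
import time

import xray  # the package's name at the time of this comment; now `import xarray`

start = time.time()
# chunks=... backs each variable with a dask array; .load() then reads
# every chunk eagerly, potentially from several threads at once.
ds = xray.open_dataset('data.nc', chunks={'time': 100})
ds.load()
print('read took %.2fs' % (time.time() - start))
```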

Given these considerations, it seems like we should use a lock when reading data into xray with dask. @mrocklin, we could just use `lock=True` with `da.from_array`, right? If we can find use cases for multi-threaded reads, we could also add an optional `lock` argument to `open_dataset`/`open_mfdataset`.
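
For reference, the `lock=True` idea would look roughly like this (the file and variable names are made up):

```python
import dask.array as da
import netCDF4

nc = netCDF4.Dataset('data.nc')    # hypothetical file
var = nc.variables['temperature']  # hypothetical variable

# lock=True tells dask to guard every chunk read with a single shared
# lock, so only one thread touches the netCDF/HDF5 layer at a time.
arr = da.from_array(var, chunks=(100,) + var.shape[1:], lock=True)

# Reads are serialized by the lock; the computation itself can still
# run in parallel across threads.
result = arr.mean().compute()
```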
