issue_comments: 116182511


html_url: https://github.com/pydata/xarray/issues/444#issuecomment-116182511
issue_url: https://api.github.com/repos/pydata/xarray/issues/444
id: 116182511
node_id: MDEyOklzc3VlQ29tbWVudDExNjE4MjUxMQ==
user: 306380
created_at: 2015-06-28T01:55:39Z
updated_at: 2015-06-28T01:55:39Z
author_association: MEMBER
issue: 91184107

body:

Oh, I didn't realize that was built in already. Sounds like you could handle this easily on the xray side.

On Jun 27, 2015, 4:40 PM, "Stephan Hoyer" <notifications@github.com> wrote:

Of course, concurrent access to HDF5 files works fine on my laptop, using Anaconda's build of HDF5 (version 1.8.14). I have no idea what special flags they invoked when building it :).

That said, I have been unable to produce any benchmarks showing improved performance from multithreaded reads alone, without any computation (e.g., %time xray.open_dataset(..., chunks=...).load()). Even when reading multiple independent chunks that are compressed on disk, CPU seems to be pegged at 100% with either netCDF4-python or h5py (via h5netcdf). For non-compressed data, reads seem to be limited by disk speed, so CPU is not the bottleneck there either.
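For concreteness, here is a minimal sketch of that kind of read-only benchmark as a standalone script. The file name, chunk sizes, and dimension name are hypothetical, and `xray` is the package's name at the time (it was later renamed xarray):

```python
import time

import xray  # the package later renamed xarray

# Hypothetical file and chunking -- substitute a real dataset.
# Passing `chunks=` wraps each variable in a dask array, so `.load()`
# pulls every chunk through dask's multithreaded scheduler.
start = time.time()
ds = xray.open_dataset("example.nc", chunks={"time": 100}).load()
print("read-only load took %.2f s" % (time.time() - start))
```

If the reads actually ran in parallel, wall time here should drop relative to a plain, unchunked open_dataset(...).load(); the observation above is that it does not.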

Given these considerations, it seems like we should use a lock when reading data into xray with dask. @mrocklin, we could just use lock=True with da.from_array, right? If we can find use cases for multi-threaded reads, we could also add an optional lock argument to open_dataset/open_mfdataset.

(Quoted from https://github.com/xray/xray/issues/444#issuecomment-116165986.)
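For reference, here is a minimal sketch of the `lock=True` pattern proposed above. The HDF5 file and dataset names are hypothetical:

```python
import dask.array as da
import h5py

# Hypothetical file/dataset names; an h5py Dataset supports the
# numpy-style slicing interface that da.from_array expects.
f = h5py.File("example.h5", mode="r")
x = da.from_array(f["data"], chunks=(1000, 1000), lock=True)

# With lock=True, every chunk read acquires a single shared lock, so
# calls into the (not-thread-safe-by-default) HDF5 library are
# serialized, while computation on chunks already in memory can still
# proceed in parallel across threads.
result = x.mean().compute()
```

An optional lock argument on open_dataset/open_mfdataset would expose the same switch at the xray level, so callers with a thread-safe HDF5 build could opt out of the serialization.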

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}