id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1536004355,I_kwDOAMm_X85bjZED,7446,Parallel + multi-threaded reading of NetCDF4 + HDF5: Hidefix!,56827,open,0,,,9,2023-01-17T08:56:03Z,2023-06-26T22:06:46Z,,NONE,,,,"### What is your issue?

Greetings,

I have developed a parallel, multi-threaded (and even async) reader for HDF5 and NetCDF4 files. It is still somewhat experimental and does not yet support every compression filter, but it has been tested a fair bit by now. The reader is written in Rust and has Python bindings:

https://github.com/gauteh/hidefix (pending conda package: https://github.com/conda-forge/staged-recipes/pull/21742)

The regular NetCDF4 and HDF5 libraries are _not_ thread-safe, and there is a global, _process-wide_ lock for reading files. Hidefix removes this lock, so datasets can be read in parallel within a single process instead of having to split the work across processes. Additionally, the reader reads directly into the target buffer and therefore needs no cache for decoded chunks, which reduces memory usage and avoids re-decoding chunks.

The reader works by indexing the chunks of a dataset so that chunks can be accessed independently.
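
To illustrate the idea, here is a minimal Python sketch. It assumes a hypothetical file layout of zlib-compressed chunks written back to back; this is neither the HDF5 on-disk format nor hidefix's actual index, and zlib only stands in for HDF5's deflate filter. Once each chunk's file offset, compressed size and output position are known, every chunk can be read and decoded by its own task and copied into its own slice of a pre-allocated output. The tasks share no reader state, and there is no cache of decoded chunks.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List, NamedTuple


class ChunkEntry(NamedTuple):
    '''Hypothetical index entry for one compressed chunk.'''
    file_offset: int  # where the compressed chunk starts in the file
    size: int         # compressed size in bytes
    out_offset: int   # where the decoded chunk belongs in the output


def write_chunks(path: str, chunks: List[bytes]) -> List[ChunkEntry]:
    '''Write zlib-compressed chunks back to back and return their index.'''
    index, out_offset = [], 0
    with open(path, 'wb') as f:
        for raw in chunks:
            data = zlib.compress(raw)
            index.append(ChunkEntry(f.tell(), len(data), out_offset))
            f.write(data)
            out_offset += len(raw)
    return index


def read_chunk_into(path: str, entry: ChunkEntry, target: bytearray) -> None:
    # Each task opens its own file handle, so threads share no reader state:
    # the index alone tells them where to seek.
    with open(path, 'rb') as f:
        f.seek(entry.file_offset)
        raw = zlib.decompress(f.read(entry.size))
    # Copy the decoded bytes into this chunk's disjoint slice of the output.
    target[entry.out_offset:entry.out_offset + len(raw)] = raw


def read_all(path: str, index: List[ChunkEntry], total_size: int,
             n_threads: int = 4) -> bytearray:
    target = bytearray(total_size)  # output allocated once, no chunk cache
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(read_chunk_into, path, e, target) for e in index]
        for fut in futures:
            fut.result()  # re-raise any error from a worker thread
    return target
```

In pure Python the decoded bytes pass through a short-lived temporary before being copied, and the GIL limits how much of the work overlaps; hidefix does the equivalent work in Rust.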

I have created a basic xarray backend that uses the NetCDF4 backend for reading attributes and other metadata: https://github.com/gauteh/hidefix/blob/main/python/hidefix/xarray.py. It works pretty well for reading:

![py_hidefix_bench](https://user-images.githubusercontent.com/56827/212849232-73f429bd-e165-4b4f-80e5-873ca9fbbfd9.png)

On my laptop with 8 CPUs we get a **_6x_** speed-up over the xarray NetCDF4 backend when reading a 380 MB variable! On larger machines the speed-up is even greater. To control the number of threads used, set the [`RAYON_NUM_THREADS`](https://github.com/rayon-rs/rayon/blob/master/FAQ.md) environment variable.
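
Rayon reads `RAYON_NUM_THREADS` when its global thread pool is first initialized, so changing the variable inside a running process has no effect once that pool exists. A minimal sketch for benchmarking several thread counts is therefore to start a fresh interpreter for each setting; the helper name `run_with_threads` is hypothetical.

```python
import os
import subprocess
import sys


def run_with_threads(script, n_threads):
    '''Run Python source in a fresh interpreter with RAYON_NUM_THREADS set.

    Rayon reads RAYON_NUM_THREADS when its global thread pool is first
    initialized, so each thread count gets its own process.
    '''
    env = dict(os.environ, RAYON_NUM_THREADS=str(n_threads))
    result = subprocess.run([sys.executable, '-c', script], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `for n in (1, 2, 4, 8): print(n, run_with_threads(bench_src, n))`, where `bench_src` is a hypothetical string holding a benchmark script such as the one below.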

Running benchmarks along the lines of:

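```python
import xarray as xr

# Requires hidefix, which provides the 'hidefix' engine used here.
i = xr.open_dataset('tests/data/barents_zdepth_m00_FC.nc', engine='hidefix')
d = i['v']
v = d[...].values  # forces the whole variable to be read into memory
print(v.shape, type(v))
```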

for the different backends (with or without xarray):

![Screenshot from 2023-01-17 09-48-44](https://user-images.githubusercontent.com/56827/212851587-f4249af7-0a15-4059-8fa0-e21ba9c46084.png)
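
A minimal timing harness for comparing backends might look like the sketch below. The helper names are hypothetical; `netcdf4` is xarray's built-in engine name, and `hidefix` assumes the hidefix backend is installed. It reports the median of several runs; after the first run the file is likely in the operating system's page cache, so the numbers reflect warm-cache reads.

```python
import statistics
import time


def time_read(read_fn, repeats=5):
    '''Call read_fn repeatedly; return the median wall-clock time in seconds.'''
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        read_fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def read_variable(engine, path='tests/data/barents_zdepth_m00_FC.nc', var='v'):
    '''Open path with the given xarray engine and load var into memory.'''
    import xarray as xr  # imported here so time_read works without xarray

    with xr.open_dataset(path, engine=engine) as ds:
        return ds[var].values
```

For example, compare `time_read(lambda: read_variable('netcdf4'))` with `time_read(lambda: read_variable('hidefix'))`.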

It turned out that a significant amount of time was spent setting the `_FillValue` for the returned array (this matters less with the NetCDF4 backend, since its reader takes much longer anyway). This step can also be done in parallel in Rust (see https://github.com/gauteh/hidefix/blob/main/src/python/mod.rs#L128), which reduces it to a negligible amount of time. The same approach could also be used with the existing xarray NetCDF4 backend.
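
Assuming that setting the `_FillValue` here means replacing every element equal to `_FillValue` with NaN (as xarray's CF decoding does for floating-point data), the sketch below shows the idea in Python with NumPy: the output array is split into contiguous blocks that a thread pool masks independently. It is a stand-in for the Rust implementation, not a port of it. NumPy releases the GIL in many element-wise loops, so the threads may overlap, but a speed-up is not guaranteed. The function name `mask_fill_value_parallel` is hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def mask_fill_value_parallel(arr, fill_value, n_threads=None):
    '''Replace every element equal to fill_value with NaN, in place.

    Assumes a C-contiguous floating-point array, so that the flattened
    blocks below are views into arr rather than copies.
    '''
    if not arr.flags.c_contiguous or arr.dtype.kind != 'f':
        raise ValueError('expected a C-contiguous floating-point array')
    n_threads = n_threads or os.cpu_count() or 1
    flat = arr.reshape(-1)  # a view, since arr is C-contiguous

    def mask_block(block):
        # Element-wise comparison, then a masked in-place copy of NaN.
        np.copyto(block, np.nan, where=(block == fill_value))

    blocks = np.array_split(flat, n_threads)  # views into flat
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # list() forces evaluation so any worker exception is raised here.
        list(pool.map(mask_block, blocks))
    return arr
```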

I hope this is of general interest. If there is interest in moving the hidefix xarray backend into xarray, that would be very cool.

Best regards, Gaute","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7446/reactions"", ""total_count"": 10, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 5, ""rocket"": 4, ""eyes"": 1}",,,13221727,issue