html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/7446#issuecomment-1396560033,https://api.github.com/repos/pydata/xarray/issues/7446,1396560033,IC_kwDOAMm_X85TPdCh,56827,2023-01-19T07:44:30Z,2023-01-19T07:44:30Z,NONE,"On Tue, Jan 17, 2023 at 5:23 PM Ryan Abernathey ***@***.***> wrote:

> Hi @gauteh ! This is very cool! Thanks for sharing. I'm really excited about the way that Rust can be used to optimize different parts of our stack.
>
> A couple of questions:
>
> - Can your reader read over HTTP / S3 protocol? Or is it just local files?

It is built to do this, but I haven't implemented it yet. I initially wrote it for an OpenDAP server (dars: https://github.com/gauteh/dars), where the plan is to also support files stored in the cloud. The hidefix reader can read from any interface that supports ReadAt or Read + Seek. It would probably be beneficial to index the files beforehand. I submitted a patch to HDF5 that allows it to iterate over the chunks quickly, so indexing a 5-6 GB file takes only a couple of hundred ms; because of that I no longer store the index for local files. It is still faster than native HDF5 even when the indexing is included.

> - Do you know about kerchunk? The approach you described:
>
>   > The reader works by indexing the chunks of a dataset so that chunks can be accessed independently.
>
>   ...is identical to the approach taken by kerchunk (although the implementation is different). I'm curious what specification you use to store your indexes. Could we make your implementation interoperable with kerchunk, such that a kerchunk reference specification could be read by your reader? It would be great to reach for some degree of alignment here.

The index is serializable using the Rust serde system, so it can be stored in any format supported by that.
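To make the chunk-index idea concrete, here is a hypothetical pure-Python sketch (not hidefix's actual index format or API): the index maps each chunk's coordinates to a byte offset and size, so any chunk can be fetched independently from anything that supports seek() + read() — a local file, an HTTP range request, or an S3 object.

```python
import io
import struct

# Hypothetical chunk index (illustration only, not hidefix's on-disk format):
# chunk coordinates -> (byte offset, byte size).
index = {
    (0, 0): (0, 8),
    (0, 1): (8, 8),
}

# A fake "file": two chunks of two little-endian f32 values each.
buf = io.BytesIO(struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))

def read_chunk(fh, coords):
    """Read one chunk independently of all others via the index."""
    offset, size = index[coords]
    fh.seek(offset)
    return struct.unpack("<2f", fh.read(size))

print(read_chunk(buf, (0, 1)))  # → (3.0, 4.0)
```

Because each lookup is just an (offset, size) pair, chunks can also be read concurrently without touching the HDF5 metadata again — which is what makes the indexed reader fast over both local and remote interfaces.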
A fair amount of effort went into making the deserialization _zero-copy_: the e.g. 10 MB index for a 5-6 GB file can be read very quickly, since the read buffers are memory-mapped directly onto the structures and require very little actual deserialization. I don't have a specific format at the moment, but I have used bincode a lot in e.g. dars.

> - Do you know about hdf5-coro - http://icesat2sliderule.org/h5coro/ - they have similar goals, but focused on cloud-based access

> > I hope this can be of general interest, and if it would be of interest to move the hidefix xarray backend into xarray that would be very cool.
>
> This is definitely of general interest! However, it is not necessary to add a new backend directly into xarray. We support entry points which allow packages to implement their own readers, as you have apparently already discovered: https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html
>
> Installing your package should be enough to enable the new engine.
>
> We would, however, welcome a documentation PR that describes how to use this package on the I/O page.

Great, the package should already register itself with xarray.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1536004355
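The zero-copy deserialization described above can be sketched in Python using a memoryview (a hypothetical illustration of the concept, not the actual serde/bincode layout): fields are decoded directly out of the original buffer, and slices remain views onto the same memory rather than copies.

```python
import struct

# Illustration only: a fake serialized index record consisting of a
# 16-byte little-endian (offset, size) header followed by payload bytes.
raw = struct.pack("<QQ", 4096, 1024) + b"\x00" * 16

view = memoryview(raw)                           # no copy of the underlying bytes
offset, size = struct.unpack_from("<QQ", view)   # decode fields straight from the view
payload = view[16:32]                            # slicing a memoryview is also zero-copy

print(offset, size)  # → 4096 1024
```

Mapping the read buffer straight onto the index structures like this is why loading even a multi-megabyte index adds very little overhead: the expensive step is the I/O, not the decoding.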