issue_comments

4 rows where author_association = "NONE" and user = 56827 sorted by updated_at descending

id: 1535931753 (node_id IC_kwDOAMm_X85bjHVp)
url: https://github.com/pydata/xarray/issues/7816#issuecomment-1535931753
issue_url: https://api.github.com/repos/pydata/xarray/issues/7816
issue: Backend registration does not match docs, and is no longer specifiable in maturin pyproject toml (1695809136)
user: gauteh (56827) · author_association: NONE
created_at / updated_at: 2023-05-05T08:46:42Z

Hi,

I forgot to rebuild the package after removing the BACKEND_... line. With only the line in pyproject.toml it works as it should! My mistake. Thanks for the patience.

Regards, Gaute

reactions: none
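A quick way to verify that a rebuilt and reinstalled package actually exposes its entry point is to list the "xarray.backends" group with the standard library's importlib.metadata. A sketch; the hidefix line shown is the expected output, not captured from a real run:

from importlib.metadata import entry_points

# Entry points registered under the "xarray.backends" group;
# the group= selection keyword requires Python 3.10+.
for ep in entry_points(group="xarray.backends"):
    print(ep.name, "->", ep.value)
# expected, e.g.: hidefix -> hidefix.xarray:HidefixBackendEntrypoint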
id: 1535716950 (node_id IC_kwDOAMm_X85biS5W)
url: https://github.com/pydata/xarray/issues/7816#issuecomment-1535716950
issue_url: https://api.github.com/repos/pydata/xarray/issues/7816
issue: Backend registration does not match docs, and is no longer specifiable in maturin pyproject toml (1695809136)
user: gauteh (56827) · author_association: NONE
created_at / updated_at: 2023-05-05T05:29:10Z

Hi,

Yes, I tried that, but I then got the same error as if I kept that line in the old format. I'll do a few tests and post the proper error here.

Gaute

reactions: +1 × 1 (total 1)
id: 1535432806 (node_id IC_kwDOAMm_X85bhNhm)
url: https://github.com/pydata/xarray/issues/7816#issuecomment-1535432806
issue_url: https://api.github.com/repos/pydata/xarray/issues/7816
issue: Backend registration does not match docs, and is no longer specifiable in maturin pyproject toml (1695809136)
user: gauteh (56827) · author_association: NONE
created_at / updated_at: 2023-05-04T21:23:31Z

If I do not manually register the backend with xarray, but only have this line in https://github.com/gauteh/hidefix/blob/main/pyproject.toml#L29:

[project.entry-points."xarray.backends"]
hidefix = "hidefix.xarray:HidefixBackendEntrypoint"

which is the only form supported by pyproject.toml/maturin, I get an error where xarray expects a tuple and cannot parse the entrypoint, instead of taking just the address of the entrypoint as it used to (back in January at least).

reactions: none
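For reference, the xarray backend guide (https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html) has the entry point target resolve to a BackendEntrypoint subclass. A minimal illustrative sketch of what "hidefix.xarray:HidefixBackendEntrypoint" could look like; this is not hidefix's actual implementation, and the open_dataset body is a stub:

from xarray.backends import BackendEntrypoint

class HidefixBackendEntrypoint(BackendEntrypoint):
    # Shown by xarray when listing engines.
    description = "Read NetCDF4/HDF5 files through a chunk index"
    url = "https://github.com/gauteh/hidefix"

    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        # A real backend would open the file, build or load its chunk
        # index, and return an xarray.Dataset of lazily loaded arrays.
        raise NotImplementedError

    def guess_can_open(self, filename_or_obj):
        # Lets xarray pick this engine automatically for common suffixes.
        return str(filename_or_obj).endswith((".nc", ".nc4", ".h5"))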
id: 1396560033 (node_id IC_kwDOAMm_X85TPdCh)
url: https://github.com/pydata/xarray/issues/7446#issuecomment-1396560033
issue_url: https://api.github.com/repos/pydata/xarray/issues/7446
issue: Parallel + multi-threaded reading of NetCDF4 + HDF5: Hidefix! (1536004355)
user: gauteh (56827) · author_association: NONE
created_at / updated_at: 2023-01-19T07:44:30Z

On Tue, Jan 17, 2023 at 5:23 PM Ryan Abernathey @.***> wrote:

> Hi @gauteh (https://github.com/gauteh)! This is very cool! Thanks for sharing. I'm really excited about the way that Rust can be used to optimize different parts of our stack.
>
> A couple of questions:
>
> - Can your reader read over the HTTP / S3 protocol? Or is it just local files?

It is built to do this, but I haven't implemented it. I initially wrote it for an OpenDAP server (dars: https://github.com/gauteh/dars), where the plan is to also support files stored in the cloud. So the hidefix reader can read from any interface that supports ReadAt or Read + Seek. It would probably be beneficial to index the files beforehand. I submitted a patch to HDF5 that allows it to iterate over the chunks quickly, so indexing a 5-6 GB file takes only a couple of hundred ms - so I no longer store the index for local files. It is still faster than native HDF5 even including the indexing.

> - Do you know about kerchunk (https://fsspec.github.io/kerchunk/)? The approach you described:
>
> > The reader works by indexing the chunks of a dataset so that chunks can be accessed independently.
>
> ...is identical to the approach taken by kerchunk (although the implementation is different). I'm curious what specification you use to store your indexes. Could we make your implementation interoperable with kerchunk, such that a kerchunk reference specification could be read by your reader? It would be great to reach for some degree of alignment here.

The index is serializable using the Rust serde system, so it can be stored in any format supported by that. A fair amount of effort went into making the deserialization zero-copy: I can read the e.g. 10 MB index for a 5-6 GB file very quickly, since the read buffers are memory-mapped directly onto the index structures and require very little deserialization work. I don't have a specific format at the moment, but I have used bincode a lot in e.g. dars.

> - Do you know about hdf5-coro (http://icesat2sliderule.org/h5coro/)? They have similar goals, but are focused on cloud-based access.
>
> > I hope this can be of general interest, and if it would be of interest to move the hidefix xarray backend into xarray that would be very cool.
>
> This is definitely of general interest! However, it is not necessary to add a new backend directly into xarray. We support entry points which allow packages to implement their own readers, as you have apparently already discovered: https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html
>
> Installing your package should be enough to enable the new engine.
>
> We would, however, welcome a documentation PR that described how to use this package on the I/O page.

Great, the package should already register itself with xarray.

reactions: none
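Once such a package is installed, no registration call should be needed: xarray discovers engines through the "xarray.backends" entry-point group, and the engine is selected by its entry-point name. A usage sketch, assuming an installed hidefix build; "data.nc" is a placeholder path:

import xarray as xr
from xarray.backends import list_engines

# Engines discovered via installed entry points (built-ins included).
print(sorted(list_engines()))

# Select the backend by its entry-point name.
ds = xr.open_dataset("data.nc", engine="hidefix")
print(ds)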


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette