

issue_comments: 1017298572


html_url: https://github.com/pydata/xarray/issues/6174#issuecomment-1017298572
issue_url: https://api.github.com/repos/pydata/xarray/issues/6174
id: 1017298572
node_id: IC_kwDOAMm_X848or6M
user: 57705593
created_at: 2022-01-20T09:53:16Z
updated_at: 2022-01-20T09:53:32Z
author_association: CONTRIBUTOR
body:

Thanks for your quick response, Tom!

I'm sure that DataTree is a really neat solution for most people working with hierarchically structured data.

In my case, we are talking about a very unusual application of the NetCDF4 groups feature: We store literally thousands of very small NetCDF datasets in a single file. A file containing 3000 datasets is typically not larger than 100 MB.

With that setup, I/O performance is critical. Opening and closing the file for every group read or write is very expensive: on our cluster, writing that 100 MB file takes 10 hours with your DataTree implementation, versus 30 minutes with my helper functions. For reading, the effect is smaller, but still noticeable.

So, my request is really about the I/O performance, and I don't need a full-fledged hierarchical data management API in xarray for that.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
performed_via_github_app:
issue: 1108138101