issue_comments: 1017298572
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/pydata/xarray/issues/6174#issuecomment-1017298572 | https://api.github.com/repos/pydata/xarray/issues/6174 | 1017298572 | IC_kwDOAMm_X848or6M | 57705593 | 2022-01-20T09:53:16Z | 2022-01-20T09:53:32Z | CONTRIBUTOR | Thanks for your quick response, Tom! I'm sure that DataTree is a really neat solution for most people working with hierarchically structured data. In my case, we are talking about a very unusual application of the NetCDF4 groups feature: We store literally thousands of very small NetCDF datasets in a single file. A file containing 3000 datasets is typically not larger than 100 MB. With that setup, the I/O performance is critical. Opening and closing the file on each group read/write is very, very bad. On our cluster this means that writing that 100 MB file takes 10 hours with your DataTree implementation, and 30 minutes with my helper functions. For reading, the effect is smaller, but still noticeable. So, my request is really about the I/O performance, and I don't need a full-fledged hierarchical data management API in xarray for that. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |  | 1108138101 |
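
To make the I/O pattern described in the comment concrete, below is a minimal sketch of the two write strategies being compared: calling `to_netcdf(..., mode="a", group=...)` once per group, which reopens and closes the file for every group, versus writing all groups through a single open `netCDF4.Dataset` handle. The file names, group names, and example data are hypothetical, and the single-handle loop is only a simplified stand-in for the commenter's helper functions (it ignores coordinates, attributes, and encodings).

```python
import netCDF4
import numpy as np
import xarray as xr

# Hypothetical example data: many small datasets, one per group.
datasets = {
    f"scene_{i:04d}": xr.Dataset({"var": ("x", np.random.rand(10))})
    for i in range(100)
}

# Slow pattern: one to_netcdf call per group; each call reopens and
# closes the file, which dominates runtime for thousands of groups.
for i, (name, ds) in enumerate(datasets.items()):
    ds.to_netcdf("per_group.nc", mode="w" if i == 0 else "a", group=name)

# Faster pattern (simplified sketch): keep a single netCDF4 handle open
# and create each group inside it; coords, attrs and encodings omitted.
with netCDF4.Dataset("single_handle.nc", mode="w") as nc:
    for name, ds in datasets.items():
        grp = nc.createGroup(name)
        for dim, size in ds.sizes.items():
            grp.createDimension(dim, size)
        for var_name, var in ds.data_vars.items():
            v = grp.createVariable(var_name, var.dtype, var.dims)
            v[:] = var.values
```

Reading has the analogous asymmetry: calling `xr.open_dataset(path, group=name)` once per group reopens the file each time, whereas a single `netCDF4.Dataset` handle can iterate over `nc.groups` directly.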