issue_comments: 1017782089

html_url: https://github.com/pydata/xarray/issues/6174#issuecomment-1017782089
issue_url: https://api.github.com/repos/pydata/xarray/issues/6174
id: 1017782089
node_id: IC_kwDOAMm_X848qh9J
user: 35968931
created_at: 2022-01-20T18:11:26Z
updated_at: 2022-01-20T18:12:32Z
author_association: MEMBER
issue: 1108138101

> In my case, we are talking about a very unusual application of the NetCDF4 groups feature: We store literally thousands of very small NetCDF datasets in a single file. A file containing 3000 datasets is typically not larger than 100 MB.

Ah - thanks for the clarification as to the context, @tovogt!
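
For anyone following along, the usual way to build such a file with plain xarray is one `to_netcdf` call per group, and each call opens and closes the file again - which is where the overhead comes from. A minimal sketch (the file name and toy datasets here are made up):

```python
import xarray as xr

# Hypothetical stand-in for thousands of small datasets
datasets = {f"group_{i}": xr.Dataset({"x": ("t", [float(i)])}) for i in range(3000)}

# One to_netcdf call per group: every iteration reopens and
# closes "combined.nc", so the cost scales with the group count.
for i, (name, ds) in enumerate(datasets.items()):
    ds.to_netcdf("combined.nc", group=name, mode="w" if i == 0 else "a")
```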

> So, my request is really about the I/O performance, and I don't need a full-fledged hierarchical data management API in xarray for that.

That's fair enough.

> On our cluster this means that writing that 100 MB file takes 10 hours with your DataTree implementation, and 30 minutes with my helper functions. For reading, the effect is smaller, but still noticeable.
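
For concreteness, the kind of trick I imagine your helpers use is to open the file once and write every group through the already-open handle. This is only a sketch, not your actual code - and it leans on xarray's semi-internal `NetCDF4DataStore` and `dump_to_store`, so no stability promises:

```python
import netCDF4
import xarray as xr
from xarray.backends import NetCDF4DataStore
from xarray.backends.api import dump_to_store

def write_groups_single_open(datasets, path):
    """Write many small in-memory datasets as groups of one netCDF file,
    opening and closing the file only once."""
    with netCDF4.Dataset(path, mode="w") as nc:
        for name, ds in datasets.items():
            # Wrap the already-open group so xarray writes into it
            # instead of reopening the file for every group.
            store = NetCDF4DataStore(nc.createGroup(name))
            dump_to_store(ds, store)
```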

So are you asking whether:

a) we should add a function to xarray which uses the same trick your helper functions do, for when people have a similar problem to yours?
b) we should use the same trick to rewrite the I/O implementation of DataTree so that it only requires one open/close? (It seems to me that this could be the best of both worlds, once implemented - see the read-side sketch below.)
c) there is some other way to do this even faster than your helper functions?
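
The read side of (b) could use the same single-open pattern - again just a sketch under the same assumptions:

```python
import netCDF4
import xarray as xr
from xarray.backends import NetCDF4DataStore

def read_groups_single_open(path):
    """Read every top-level group of a netCDF file with a single open."""
    out = {}
    with netCDF4.Dataset(path, mode="r") as nc:
        for name, group in nc.groups.items():
            store = NetCDF4DataStore(group)
            # .load() pulls the data into memory before the file is closed
            out[name] = xr.open_dataset(store).load()
    return out
```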

EDIT: Tagging @alexamici / @aurghs for their backends expertise + interest in DataTree
