github: issue_comments: 5 rows where author_association = "CONTRIBUTOR" and issue = 1108138101 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at	author_association	body	reactions	issue
1028730657	https://github.com/pydata/xarray/issues/6174#issuecomment-1028730657	https://api.github.com/repos/pydata/xarray/issues/6174	IC_kwDOAMm_X849US8h	tovogt 57705593	2022-02-03T08:39:45Z	2022-02-03T08:41:16Z	CONTRIBUTOR	"Have you seen `xarray.save_mfdataset`? In principle, it was designed for exactly this sort of thing." Thanks for the hint! Unfortunately, the docstring already says that "it is no different than calling to_netcdf repeatedly". And I explained in my OP that this would cause repeated file open/close operations, which is the whole point of this issue. Furthermore, when using `save_mfdataset` with my setup, it complains: `ValueError: cannot use mode='w' when writing multiple datasets to the same path` But when using `mode='a'` instead, it complains that the file doesn't exist. However, it might still be the way to go API-wise. So, when talking about a solution to this issue, we could aim at fixing `save_mfdataset`: 1) Writing to the same file should use a single open/close operation. 2) Support `mode='w'` (or `mode='w+'`) when writing several datasets to the same path.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	[FEATURE]: Read from/write to several NetCDF4 groups with a single file open/close operation 1108138101
1019879801	https://github.com/pydata/xarray/issues/6174#issuecomment-1019879801	https://api.github.com/repos/pydata/xarray/issues/6174	IC_kwDOAMm_X848yiF5	tovogt 57705593	2022-01-24T09:16:40Z	2022-01-24T09:16:40Z	CONTRIBUTOR	"That's good at least! Do you have any suggestions for where the docs should be improved? PRs are of course always welcome too" Here is my PR for the docstring improvements: https://github.com/pydata/xarray/pull/6187	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	[FEATURE]: Read from/write to several NetCDF4 groups with a single file open/close operation 1108138101
1019849836	https://github.com/pydata/xarray/issues/6174#issuecomment-1019849836	https://api.github.com/repos/pydata/xarray/issues/6174	IC_kwDOAMm_X848yaxs	tovogt 57705593	2022-01-24T08:43:36Z	2022-01-24T08:43:36Z	CONTRIBUTOR	It's not at all tricky to implement the listing of groups in a NETCDF4 file, at least not for the "netcdf4" engine. The code for that is in my OP above:
```python
def _xr_nc4_groups_from_store(store):
    """List all groups contained in the given NetCDF4 data store

    Parameters
    ----------
    store : xarray.backends.NetCDF4DataStore

    Returns
    -------
    list of str
    """
    def iter_groups(ds, prefix=""):
        # "" stands for the group itself; nested groups are appended as paths
        groups = [""]
        for group_name, group_ds in ds.groups.items():
            groups.extend([f"{prefix}{group_name}{subgroup}"
                           for subgroup in iter_groups(group_ds, prefix="/")])
        return groups
    with store._manager.acquire_context(False) as root:
        return iter_groups(root)
```
	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	[FEATURE]: Read from/write to several NetCDF4 groups with a single file open/close operation 1108138101
1018257806	https://github.com/pydata/xarray/issues/6174#issuecomment-1018257806	https://api.github.com/repos/pydata/xarray/issues/6174	IC_kwDOAMm_X848sWGO	tovogt 57705593	2022-01-21T07:40:55Z	2022-01-21T07:46:06Z	CONTRIBUTOR	When I first posted this issue, I thought the best solution was to just implement my proposed helper functions as part of the official xarray API. I don't think our project would add DataTree as a new dependency just for this as long as we have a very easy and viable solution of our own. But now I have a new idea. First, I noticed that `open_dataset` won't actually close the file handle, but will reuse it later if needed. So, at least there is no performance problem with the current read setup. For writing, there should be an option in `to_netcdf` that ensures that xarray does not close the file handle. xarray already uses a `CachingFileManager` to open NetCDF4 files: https://github.com/pydata/xarray/blob/0ffb0f42282a1b67c4950e90e1e4ecd146307aa8/xarray/backends/netCDF4_.py#L379-L381 That means that the manager already ensures that the same file handle is reused in subsequent `to_netcdf` calls on the same file, unless it's closed in the meantime. Closing is managed here: https://github.com/pydata/xarray/blob/0ffb0f42282a1b67c4950e90e1e4ecd146307aa8/xarray/backends/api.py#L1072-L1094 It's not very transparent when closing is actually triggered in practice, especially if you only look at the current docstrings.
I found that, in fact, setting `compute=False` in `to_netcdf` will prevent the closing until you explicitly call compute on the returned object: `python for name, ds in zip(ds_names, ds_list): delayed = ds.to_netcdf(path, group=name, compute=False) delayed.compute()` If this were communicated more transparently in the docstrings, it would bring us a big step closer to a solution to this issue :slightly_smiling_face: Apart from that, there is only one problem left: getting a full list of all groups contained in a NetCDF4 file so that we can read them all in. In DataTree, you fall back to directly using the NetCDF4 (or h5netcdf) API for that purpose: `_get_nc_dataset_class` and `_iter_nc_groups`. That's not the worst solution. However, I would insist that xarray should be able to do this. Maybe we need an `open_datasets_from_groups` function for that, or rather a `list_datasets` function. But it should somehow be solvable within the `xarray` API without requiring a two-year debate about the management and representation of hierarchical data structures.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	[FEATURE]: Read from/write to several NetCDF4 groups with a single file open/close operation 1108138101
1017298572	https://github.com/pydata/xarray/issues/6174#issuecomment-1017298572	https://api.github.com/repos/pydata/xarray/issues/6174	IC_kwDOAMm_X848or6M	tovogt 57705593	2022-01-20T09:53:16Z	2022-01-20T09:53:32Z	CONTRIBUTOR	Thanks for your quick response, Tom! I'm sure that DataTree is a really neat solution for most people working with hierarchically structured data. In my case, we are talking about a very unusual application of the NetCDF4 groups feature: We store literally thousands of very small NetCDF datasets in a single file. A file containing 3000 datasets is typically not larger than 100 MB. With that setup, the I/O performance is critical. Opening and closing the file on each group read/write is very, very bad. On our cluster this means that writing that 100 MB file takes 10 hours with your DataTree implementation, and 30 minutes with my helper functions. For reading, the effect is smaller, but still noticeable. So, my request is really about the I/O performance, and I don't need a full-fledged hierarchical data management API in xarray for that.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	[FEATURE]: Read from/write to several NetCDF4 groups with a single file open/close operation 1108138101
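
The recursive group listing described in the comments above can be tried without a NetCDF4 file. The following is only a minimal sketch of that recursion: `FakeGroup` and `list_group_paths` are hypothetical names introduced here, and `FakeGroup` merely imitates the `.groups` mapping that a `netCDF4.Dataset` or group exposes. It is not part of xarray or netCDF4.

```python
class FakeGroup:
    """Hypothetical stand-in for a netCDF4 dataset/group: it only has `.groups`."""

    def __init__(self, **children):
        # Maps child group name -> child group, like netCDF4's `.groups`.
        self.groups = dict(children)


def list_group_paths(node, prefix=""):
    """Return "" for `node` itself, followed by the path of every nested group."""
    paths = [""]
    for name, child in node.groups.items():
        # Nested levels are joined with "/", so sub-paths come back as "/b", "/b/c", ...
        paths.extend(
            f"{prefix}{name}{sub}" for sub in list_group_paths(child, prefix="/")
        )
    return paths


# Assumed example hierarchy: root -> a -> b, and root -> c
root = FakeGroup(a=FakeGroup(b=FakeGroup()), c=FakeGroup())
paths = list_group_paths(root)
print(paths)
```

Each returned entry has the form of a group path relative to the root, with the empty string standing for the root group itself.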

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
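
The filter and sort order in the page title can be reproduced against a table of this shape. The sketch below assumes an in-memory SQLite database with a trimmed copy of the schema (only the columns the query needs). The five `id` and `updated_at` values are taken from the rows above; the sixth row is made up to show that the filter excludes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed copy of the issue_comments schema: only the columns used by the query.
conn.execute(
    """
    CREATE TABLE issue_comments (
        id INTEGER PRIMARY KEY,
        updated_at TEXT,
        author_association TEXT,
        issue INTEGER
    )
    """
)
rows = [
    (1017298572, "2022-01-20T09:53:32Z", "CONTRIBUTOR", 1108138101),
    (1018257806, "2022-01-21T07:46:06Z", "CONTRIBUTOR", 1108138101),
    (1019849836, "2022-01-24T08:43:36Z", "CONTRIBUTOR", 1108138101),
    (1019879801, "2022-01-24T09:16:40Z", "CONTRIBUTOR", 1108138101),
    (1028730657, "2022-02-03T08:41:16Z", "CONTRIBUTOR", 1108138101),
    # Made-up row (not in the export) to show that the filter excludes it.
    (999, "2022-03-01T00:00:00Z", "MEMBER", 1108138101),
]
conn.executemany("INSERT INTO issue_comments VALUES (?, ?, ?, ?)", rows)

# ISO 8601 timestamps stored as TEXT sort correctly as plain strings.
query = """
    SELECT id FROM issue_comments
    WHERE author_association = ? AND issue = ?
    ORDER BY updated_at DESC
"""
ids = [row[0] for row in conn.execute(query, ("CONTRIBUTOR", 1108138101))]
print(ids)
```

The `idx_issue_comments_issue` index defined above would let SQLite narrow the lookup by `issue` before applying the `author_association` filter.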