issue_comments
11 rows where author_association = "MEMBER" and issue = 187859705 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
873227866 | https://github.com/pydata/xarray/issues/1092#issuecomment-873227866 | https://api.github.com/repos/pydata/xarray/issues/1092 | MDEyOklzc3VlQ29tbWVudDg3MzIyNzg2Ng== | shoyer 1217238 | 2021-07-02T19:56:49Z | 2021-07-02T19:56:49Z | MEMBER | There's a parallel discussion about hierarchical storage going on over in https://github.com/pydata/xarray/issues/4118. I'm going to close this issue in favor of the other one just to keep the ongoing discussion in one place. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Dataset groups 187859705 | |
290256766 | https://github.com/pydata/xarray/issues/1092#issuecomment-290256766 | https://api.github.com/repos/pydata/xarray/issues/1092 | MDEyOklzc3VlQ29tbWVudDI5MDI1Njc2Ng== | benbovy 4160723 | 2017-03-29T23:26:50Z | 2017-03-29T23:26:50Z | MEMBER |
I would be +1 for allowing tuples for data variable names but not for dimension/coordinate names. It does look like using tuples for the latter would be a greater source of confusion and would add too much complexity for little (or no real?) benefit. I'd be fine with raising an error when loading a netCDF4 file that has groups with conflicting dimensions or when assigning an incompatible Dataset as a new group (e.g., For groups that share common dimensions/coordinates with some differences, a data structure built on top of |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Dataset groups 187859705 | |
290250866 | https://github.com/pydata/xarray/issues/1092#issuecomment-290250866 | https://api.github.com/repos/pydata/xarray/issues/1092 | MDEyOklzc3VlQ29tbWVudDI5MDI1MDg2Ng== | shoyer 1217238 | 2017-03-29T22:54:17Z | 2017-03-29T22:54:17Z | MEMBER |
Yes, this is correct. But note that
Yes, this is true. We would possibly want to make another Dataset subclass for the sub-datasets to ensure that their variables are linked to the parent, e.g., But I'm also not convinced this is actually worth the trouble given how easy it is to write NumPy has similar issues, e.g., |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Dataset groups 187859705 | |
290165548 | https://github.com/pydata/xarray/issues/1092#issuecomment-290165548 | https://api.github.com/repos/pydata/xarray/issues/1092 | MDEyOklzc3VlQ29tbWVudDI5MDE2NTU0OA== | shoyer 1217238 | 2017-03-29T17:38:03Z | 2017-03-29T17:38:49Z | MEMBER |
Yes, totally agreed, and I've encountered similar cases in my own work. These sorts of "ragged" arrays are a great use case for groups.
Yes, it's a little confusing because it looks similar to
Yes, it would create a new dataset, which could take ~1 ms. That's slow for inner loops (though we could add caching to help), but plenty fast for interactive use. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Dataset groups 187859705 | |
290142369 | https://github.com/pydata/xarray/issues/1092#issuecomment-290142369 | https://api.github.com/repos/pydata/xarray/issues/1092 | MDEyOklzc3VlQ29tbWVudDI5MDE0MjM2OQ== | shoyer 1217238 | 2017-03-29T16:18:12Z | 2017-03-29T16:18:12Z | MEMBER |
With netCDF4, we could potentially just use groups. Or we could use some sort of naming convention for strings, e.g., joining together the parts of the tuple with One challenge here is that unless we also let dimensions be group-specific, not every netCDF4 file with groups corresponds to a valid xarray Dataset: dimensions with the same name can have conflicting sizes in different groups of a netCDF4 file. In principle, it could be OK to use tuples for dimension names, but we already have lots of logic that distinguishes between single and multiple dimensions by looking for non-strings or tuples. So you would probably have to write How to handle dimension and coordinate names when assigning groups is clearly one of the important design decisions here. It's obvious that data variables should be grouped but less clear how to handle dimensions/coordinates.
Some sort of further indentation seems natural, possibly with truncation like This is another case where an HTML repr could be powerful, allowing for clearer visual links and potentially interactive expanding/contracting of the tree.
From xarray's perspective, there isn't really a distinction between multiple files and groups in one netCDF file -- it's just a matter of creating a Dataset with data organized in a different way. Presumably we could write helper methods for converting a dimension into a group level (and vice-versa). But it's worth noting that there are still limitations to opening large numbers of files in a single dataset, even with groups, because xarray reads all the metadata for every variable into memory at once, and that metadata is copied in every xarray operation. For this reason, you will still probably want a different data structure (convertible into an xarray.Dataset) when navigating very large datasets like CMIP, which consists of many thousands of files. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Dataset groups 187859705 | |
290130241 | https://github.com/pydata/xarray/issues/1092#issuecomment-290130241 | https://api.github.com/repos/pydata/xarray/issues/1092 | MDEyOklzc3VlQ29tbWVudDI5MDEzMDI0MQ== | benbovy 4160723 | 2017-03-29T15:38:46Z | 2017-03-29T15:38:46Z | MEMBER | @darothen you might be interested in the discussion we had here, although it doesn't solve anything related to selection across similar Dataset objects. I think that the collection of Both approaches may co-exist, though. I can imagine the case where we have (1) a set of, e.g., grid-search or Monte Carlo model runs and (2) for each model run we have diagnostic variables defined in different places on the grid (e.g., nodes, edges...). The tuple-defined groups within a Dataset are useful for 2 and the collection of Dataset objects is useful for 1. As pointed out by @shoyer, such a collection of Dataset objects might (preferably) be implemented outside of xarray. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Dataset groups 187859705 | |
290065632 | https://github.com/pydata/xarray/issues/1092#issuecomment-290065632 | https://api.github.com/repos/pydata/xarray/issues/1092 | MDEyOklzc3VlQ29tbWVudDI5MDA2NTYzMg== | benbovy 4160723 | 2017-03-29T11:48:12Z | 2017-03-29T11:48:12Z | MEMBER | Just want to say that I'm very enthusiastic about this! Like @lamorton, I also find myself having a lot of variables with names containing the name(s) of their "group(s)". My initial idea was also to keep flat datasets and add some logic to get/set groups, but it wasn't very clear or well explained.
Makes perfect sense! I also find the idea of using tuples very clever! @shoyer do you have an idea on how it would work with serialization to netCDF? We would also have to decide how to display groups in the repr of the flat dataset... @lamorton @shoyer unless you want to open a PR, I'd be willing to start working on this. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Dataset groups 187859705 | |
289923078 | https://github.com/pydata/xarray/issues/1092#issuecomment-289923078 | https://api.github.com/repos/pydata/xarray/issues/1092 | MDEyOklzc3VlQ29tbWVudDI4OTkyMzA3OA== | shoyer 1217238 | 2017-03-28T22:22:31Z | 2017-03-28T22:24:31Z | MEMBER | @lamorton Thanks for explaining the use case here. This makes more sense to me now. I like your idea of groups as syntactic sugar around flat datasets with named keys. With an appropriate naming convention, we might even be able to put this into
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Dataset groups 187859705 | |
259390660 | https://github.com/pydata/xarray/issues/1092#issuecomment-259390660 | https://api.github.com/repos/pydata/xarray/issues/1092 | MDEyOklzc3VlQ29tbWVudDI1OTM5MDY2MA== | benbovy 4160723 | 2016-11-09T11:15:01Z | 2016-11-09T11:24:51Z | MEMBER |
Yep, once again I hadn't thought about all the implications this would have! This would indeed add a lot of complexity in the end. I'll try to follow your suggestion of building another data structure, for example - correct me if it's a wrong approach too - a |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Dataset groups 187859705 | |
259208431 | https://github.com/pydata/xarray/issues/1092#issuecomment-259208431 | https://api.github.com/repos/pydata/xarray/issues/1092 | MDEyOklzc3VlQ29tbWVudDI1OTIwODQzMQ== | rabernat 1197350 | 2016-11-08T17:51:00Z | 2016-11-08T17:51:00Z | MEMBER | This suggestion has some significant overlap with the data store / data discovery discussion from last weekend: https://aospy.hackpad.com/Data-StorageDiscovery-Design-Document-fM6LgfwrJ2K |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Dataset groups 187859705 | |
259206339 | https://github.com/pydata/xarray/issues/1092#issuecomment-259206339 | https://api.github.com/repos/pydata/xarray/issues/1092 | MDEyOklzc3VlQ29tbWVudDI1OTIwNjMzOQ== | shoyer 1217238 | 2016-11-08T17:43:22Z | 2016-11-08T17:43:22Z | MEMBER | I am reluctant to add the additional complexity of groups directly into the I would rather see this living in another data structure built on top of |
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Dataset groups 187859705 |
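Several comments in this thread (shoyer, 2017-03-28 and 2017-03-29; benbovy, 2017-03-29) discuss keeping a flat Dataset whose data variable names are tuples, treating groups as syntactic sugar over shared name prefixes, and serializing either to real netCDF4 groups or by joining the tuple parts into strings. The following is a minimal sketch of that idea in plain Python, not xarray's API: it assumes variables are held in an ordinary dict, and `FlatGroups`, `group()`, `tree()`, `to_flat_names()`, `from_flat_names()` and the `"/"` separator are all hypothetical (the separator proposed in the original comment is not preserved in this export).

```python
from typing import Any, Dict, Tuple

# Hypothetical separator used only for string serialization; the separator
# proposed in the original comment is not preserved in this export.
SEP = "/"

Name = Tuple[str, ...]


class FlatGroups:
    """Hypothetical flat store of variables keyed by tuple names.

    All parts of a name except the last are group levels; the last part
    is the variable name, e.g. ("nodes", "elevation").
    """

    def __init__(self, variables: Dict[Name, Any]):
        self.variables = dict(variables)

    def group(self, *prefix: str) -> "FlatGroups":
        """Return a new store with the variables under `prefix`, prefix stripped.

        A new object is built on every call: fine interactively, but a
        candidate for caching if used in inner loops.
        """
        n = len(prefix)
        sub = {
            name[n:]: value
            for name, value in self.variables.items()
            if len(name) > n and name[:n] == prefix
        }
        if not sub:
            raise KeyError(prefix)
        return FlatGroups(sub)

    def tree(self) -> str:
        """Indented listing with one level of indentation per group."""
        lines = []
        seen = set()
        for name in sorted(self.variables):
            # Emit each group header the first time it is encountered.
            for depth in range(len(name) - 1):
                key = name[: depth + 1]
                if key not in seen:
                    seen.add(key)
                    lines.append("    " * depth + key[-1] + "/")
            lines.append("    " * (len(name) - 1) + name[-1])
        return "\n".join(lines)

    def to_flat_names(self, sep: str = SEP) -> Dict[str, Any]:
        """Join tuple parts into plain strings, e.g. for formats without groups."""
        for name in self.variables:
            if any(sep in part for part in name):
                raise ValueError(f"{name!r} contains the separator {sep!r}")
        return {sep.join(name): value for name, value in self.variables.items()}

    @classmethod
    def from_flat_names(cls, flat: Dict[str, Any], sep: str = SEP) -> "FlatGroups":
        """Inverse of to_flat_names: split string names back into tuples."""
        return cls({tuple(key.split(sep)): value for key, value in flat.items()})
```

For example, with variables keyed `("nodes", "elevation")`, `("edges", "flux")` and `("time",)`, `group("nodes")` yields a store containing only `("elevation",)`, and `to_flat_names()` yields the keys `"nodes/elevation"`, `"edges/flux"` and `"time"`. The sketch only groups data variable names; how dimensions and coordinates should be grouped, which the thread identifies as the harder design decision, is left open.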
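benbovy's 2017-03-29T23:26:50Z comment suggests raising an error when a netCDF4 file's groups have conflicting dimensions, and shoyer's earlier comment explains why: unless dimensions are group-specific, a Dataset has a single namespace of dimension sizes, while netCDF4 groups can each define a same-named dimension with a different size. Below is a minimal sketch of that check in plain Python, assuming each group's dimensions have already been read into a `{name: size}` mapping; `merge_group_dims` is a hypothetical helper and does not open any files.

```python
from typing import Dict, Mapping


def merge_group_dims(groups: Mapping[str, Mapping[str, int]]) -> Dict[str, int]:
    """Combine per-group dimension sizes into one shared mapping (hypothetical helper).

    Raises ValueError if the same dimension name has different sizes in
    different groups, since a single flat namespace of dimensions cannot
    hold both sizes.
    """
    merged: Dict[str, int] = {}
    origin: Dict[str, str] = {}  # which group first defined each dimension
    for group_name, dims in groups.items():
        for dim, size in dims.items():
            if dim in merged and merged[dim] != size:
                raise ValueError(
                    f"dimension {dim!r} has size {size} in group {group_name!r} "
                    f"but size {merged[dim]} in group {origin[dim]!r}"
                )
            merged.setdefault(dim, size)
            origin.setdefault(dim, group_name)
    return merged
```

For instance, groups `{"nodes": {"time": 10, "node": 100}, "edges": {"time": 10, "edge": 250}}` merge to `{"time": 10, "node": 100, "edge": 250}`, whereas `{"a": {"x": 3}, "b": {"x": 4}}` raises `ValueError`.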