id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
295959111,MDU6SXNzdWUyOTU5NTkxMTE=,1900,Representing & checking Dataset schemas ,5635139,open,0,,,15,2018-02-09T18:06:08Z,2022-07-14T11:28:37Z,,MEMBER,,,,"What would be the best way to canonically describe a dataset, which could be read by both humans and machines?

For example, frequently in our code we have docstrings which look something like:

```
def get_returns(security_ids):
    """"""
    Returns mega-dimensional dataset which gives recent returns for a set of securities by:
    - Date
    - Return (raw / economic / smoothed / etc)
    - Scaling (constant / risk_scaled)
    - Span
    - Hedged vs Unhedged

    Dataset keys are security ids. All dimensions have coords.
    """"""
```

This helps when attempting to understand what code is doing while only reading it. But this isn't consistent between docstrings and can't be read or checked by a machine.

Has anyone solved this problem / have any suggestions for resources out there?

Tangentially related to https://github.com/python/typing/issues/513 (but our issues are less about the type, dimension sizes, and more about the arrays within a dataset, their dimensions, and their names)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1900/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue