issues: 32932632

id: 32932632
node_id: MDU6SXNzdWUzMjkzMjYzMg==
number: 117
title: Restructure DataArray internals to not use a Dataset?
user: 1217238
state: closed
locked: 0
comments: 1
created_at: 2014-05-06T21:02:26Z
updated_at: 2014-09-02T03:46:15Z
closed_at: 2014-09-02T03:46:15Z
author_association: MEMBER

It would be nice to allow DataArray objects without named dimensions (#116). But it doesn't make much sense to put arrays without named dimensions into a Dataset.

This suggests that we should change the current model for the internals of DataArray, which works by applying operations to an internal Dataset and keeping track of the name of the array of interest.

An alternate representation would use a fixed-size, list-like attribute, coordinates, to keep track of the coordinates for each axis. Putting a DataArray without named dimensions into a Dataset would raise an error.
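To make the proposal concrete, here is a minimal pure-Python sketch of that representation. It is not xray's actual code: SketchArray, SketchDataset, and every attribute name below are hypothetical, and nested lists stand in for real arrays.

```python
from dataclasses import dataclass


@dataclass
class SketchArray:
    """Hypothetical stand-in for a DataArray that owns its coordinates directly."""

    values: list            # nested lists stand in for an n-dimensional array
    dims: tuple             # one entry per axis; None marks an unnamed dimension
    coordinates: tuple = () # fixed size: exactly one slot per axis

    def __post_init__(self):
        self.dims = tuple(self.dims)
        if not self.coordinates:
            # No coordinates supplied: fill every slot with None.
            self.coordinates = (None,) * len(self.dims)
        self.coordinates = tuple(self.coordinates)
        if len(self.coordinates) != len(self.dims):
            raise ValueError("need exactly one coordinate slot per axis")


class SketchDataset:
    """Hypothetical container that only accepts arrays with named dimensions."""

    def __init__(self):
        self.variables = {}

    def __setitem__(self, name, array):
        if any(dim is None for dim in array.dims):
            raise ValueError(
                f"cannot add {name!r}: all dimensions must be named"
            )
        self.variables[name] = array


named = SketchArray([[1, 2], [3, 4]], dims=("x", "y"),
                    coordinates=([10, 20], [0, 1]))
unnamed = SketchArray([1, 2, 3], dims=(None,))

ds = SketchDataset()
ds["foo"] = named        # fine: both dimensions are named
try:
    ds["bar"] = unnamed  # rejected: the only dimension is unnamed
    rejected = False
except ValueError:
    rejected = True
```

The array carries everything it needs, so operations on it never have to consult a surrounding dataset.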

Positives:
1. This is a more transparent and obvious model for working directly with DataArray objects.
2. It will simplify making DataArrays without named dimensions.
3. It will make choices such as when to drop other dataset variables in a data array operation more obvious: other variables will always be dropped, because we won't bother keeping track of a dataset anymore.
4. Related to point 1, this will have positive performance implications for array indexing, since it will be more obvious exactly which arrays you are indexing (currently, indexing a DataArray indexes every array in its dataset).

Negatives:
1. This will certainly add lines of code and complexity. Making an operation work for both Datasets and DataArrays will no longer be quite so simple.
2. It will no longer be as straightforward to access other related variables from a DataArray. In particular, ds['foo'].groupby('bar') won't work if "bar" is not a dimension in ds['foo'], unless we keep some sort of reference to the dataset in the array. Perhaps this tradeoff is worth it: ds['foo'].groupby(ds['bar']) isn't so terrible.
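The groupby tradeoff in the second negative can be illustrated with a small pure-Python sketch. It does not use xray's groupby; group_by_labels is a hypothetical helper, and a plain dict stands in for the dataset. The point is only that a standalone array cannot look up "bar" by name, so the caller passes the labels explicitly.

```python
from collections import defaultdict


def group_by_labels(values, labels):
    """Group values by explicitly supplied labels (hypothetical helper)."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return dict(groups)


# A plain dict stands in for the dataset; "bar" labels each element of "foo".
dataset = {
    "foo": [1.0, 2.0, 3.0, 4.0],
    "bar": ["a", "b", "a", "b"],
}

# The standalone array has no reference back to the dataset,
# so the grouping variable is fetched from the dataset and passed in.
foo = dataset["foo"]
grouped = group_by_labels(foo, dataset["bar"])
means = {label: sum(vals) / len(vals) for label, vals in grouped.items()}
```

This mirrors the ds['foo'].groupby(ds['bar']) spelling: slightly more verbose, but the array no longer needs to hold on to its parent.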

CC @mrocklin, I brought this up briefly in the context of #116 during PyData.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/117/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
