id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
179052741,MDExOlB1bGxSZXF1ZXN0ODY2MzEwNTE=,1017,WIP: Optional indexes (no more default coordinates given by range(n)),1217238,closed,0,,,35,2016-09-24T21:24:39Z,2017-01-16T01:27:34Z,2016-12-15T02:40:35Z,MEMBER,,0,pydata/xarray/pulls/1017,"Fixes #283 ## Motivation Currently, when a Dataset or DataArray is created without explicit coordinate labels for a dimension, we insert a coordinate with the values given by range(n). This is problematic for two main reasons: 1. There aren't always meaningful dimension labels. For example, an RGB image might be represented by a DataArray with three dimensions ('row', 'column', 'channel'). 'row' and 'column' each have a fixed size, but only 'channel' has meaningful labels ['red', 'green', 'blue']. 2. Default labels lead to bad default alignment behavior. In the RGB image example, when I combine a 200x200 pixel image with a 300x300 pixel image, xarray would currently align rows and columns into a 200x200 image. This isn't desirable -- you'd rather get an error than use default labels for alignment. As is, xarray isn't a good fit for users who don't have meaningful coordinate labels over one or more dimensions. So making labels optional would also increase the audience for the project. ## Design decisions In general, I have followed the alignment rules I suggested for pandas in https://github.com/pydata/pandas-design/issues/17, but there are still some xarray-specific design decisions to resolve: - [x] ~~How to handle `stack(z=['x', 'y'])` when one or more of the original dimensions do not have labels. If we don't insert dummy indexes for MultiIndex levels, then we can't unstack properly anymore.~~ Decision: insert dummy dimensions with `stack()` as necessary. 
- [x] ~~How to handle missing indexes in `.sel`, e.g., `array.sel(x=0)` when `x` is not in `array.coords`. In the current version of this PR, this errors, but my current inclination is to pass `.sel` indexers directly on to `.isel`, without remapping labels. This has the bonus of preserving the current behavior for indexing.~~ Decision: if a dimension does not have an associated coordinate, indexing along that dimension with `.sel` or `.loc` is positional (like `.isel`). - [x] ~~Should we create dummy/virtual coordinates like `range(n)` on demand when indexing a dimension without labels? e.g., `array.coords['x']` would return a DataArray with values `range(n)` (importantly, this would not change the original `array`).~~ Decision: yes, this is the maximally backwards-compatible thing to do. - [x] ~~What should the new behavior for `reindex_like` be, if the argument has dimensions of different sizes but no dimension labels? Should we raise an error, or simply ignore these dimensions?~~ Decision: users expect `reindex_like` to work like `align`. We will raise an error if dimension sizes do not match. - [x] ~~What does the transition to this behavior look like? 
Do we simply add it to v0.9.0 with a big warning about backwards compatibility, or do we need to go through some sort of deprecation cycle with the current behavior?~~ Decision: not doing a deprecation cycle, that would be too cumbersome ## Examples of new behavior ``` In [1]: import xarray as xr In [2]: a = xr.DataArray([1, 2, 3], dims='x') In [3]: b = xr.DataArray([[1, 2], [3, 4], [5, 6]], dims=['x', 'y'], coords={'y': ['a', 'b']}) In [4]: a Out[4]: array([1, 2, 3]) In [5]: b Out[5]: array([[1, 2], [3, 4], [5, 6]]) Coordinates: * y (y) array([[2, 3], [5, 6], [8, 9]]) Coordinates: * y (y) array([2, 4, 6]) Coordinates: * x (x) int64 10 20 30 ``` ## New doc sections ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1017/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull