
issues: 179052741


id: 179052741
node_id: MDExOlB1bGxSZXF1ZXN0ODY2MzEwNTE=
number: 1017
title: WIP: Optional indexes (no more default coordinates given by range(n))
user: 1217238
state: closed
locked: 0
comments: 35
created_at: 2016-09-24T21:24:39Z
updated_at: 2017-01-16T01:27:34Z
closed_at: 2016-12-15T02:40:35Z
author_association: MEMBER
draft: 0
pull_request: pydata/xarray/pulls/1017

Fixes #283

Motivation

Currently, when a Dataset or DataArray is created without explicit coordinate labels for a dimension, we insert a coordinate with the values given by range(n).

This is problematic, for two main reasons:

1. There aren't always meaningful dimension labels. For example, an RGB image might be represented by a DataArray with three dimensions ('row', 'column', 'channel'). 'row' and 'column' each have fixed size, but only 'channel' has meaningful labels ['red', 'green', 'blue'].
2. Default labels lead to bad default alignment behavior. In the RGB image example, when I combine a 200x200 pixel image with a 300x300 pixel image, xarray would currently align rows and columns into a 200x200 image. This isn't desirable -- you'd rather get an error than use default labels for alignment.

As is, xarray isn't a good fit for users who don't have meaningful coordinate labels over one or more dimensions. So making labels optional would also increase the audience for the project.

Design decisions

In general, I have followed the alignment rules I suggested for pandas in https://github.com/pydata/pandas-design/issues/17, but there are still some xarray-specific design decisions to resolve:

- [x] ~~How to handle stack(z=['x', 'y']) when one or more of the original dimensions do not have labels. If we don't insert dummy indexes for MultiIndex levels, then we can't unstack properly anymore.~~ Decision: insert dummy dimensions with stack() as necessary.
- [x] ~~How to handle missing indexes in .sel, e.g., array.sel(x=0) when x is not in array.coords. In the current version of this PR, this errors, but my current inclination is to pass .sel indexers directly on to .isel, without remapping labels. This has the bonus of preserving the current behavior for indexing.~~ Decision: if a dimension does not have an associated coordinate, indexing along that dimension with .sel or .loc is positional (like .isel).
- [x] ~~Should we create dummy/virtual coordinates like range(n) on demand when indexing a dimension without labels? e.g., array.coords['x'] would return a DataArray with values range(n) (importantly, this would not change the original array).~~ Decision: yes, this is the maximally backwards-compatible thing to do.
- [x] ~~What should the new behavior for reindex_like be, if the argument has dimensions of different sizes but no dimension labels? Should we raise an error, or simply ignore these dimensions?~~ Decision: users expect reindex_like to work like align. We will raise an error if dimension sizes do not match.
- [x] ~~What does the transition to this behavior look like? Do we simply add it to v0.9.0 with a big warning about backwards compatibility, or do we need to go through some sort of deprecation cycle with the current behavior?~~ Decision: no deprecation cycle; that would be too cumbersome.
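To make the second and third decisions concrete, here is a toy model in plain Python. It is only a sketch of the rules as described above, not xarray's implementation: the helper names `sel_to_position` and `coord_values`, and the use of plain dicts for `coords` and `sizes`, are hypothetical.

```python
def sel_to_position(dim, key, coords, sizes):
    """Translate a .sel-style key into an integer position along `dim`.

    `coords` maps dimension names to label lists; `sizes` maps every
    dimension name to its length. Both are stand-ins for a real array.
    """
    if dim not in sizes:
        raise KeyError(f"{dim!r} is not a dimension")
    if dim in coords:
        # Labeled dimension: look the key up among the labels.
        try:
            return list(coords[dim]).index(key)
        except ValueError:
            raise KeyError(f"{key!r} not found along {dim!r}") from None
    # Unlabeled dimension: the key is treated as a position (like .isel).
    if not isinstance(key, int) or not -sizes[dim] <= key < sizes[dim]:
        raise IndexError(f"position {key!r} is out of bounds for {dim!r}")
    return key


def coord_values(dim, coords, sizes):
    """Return labels for `dim`, building range(n) on demand if it has none.

    The virtual labels are created fresh on each call; `coords` is not modified.
    """
    if dim in coords:
        return list(coords[dim])
    return list(range(sizes[dim]))


# A 3x2 array where only 'y' carries labels.
sizes = {"x": 3, "y": 2}
coords = {"y": ["a", "b"]}

label_pos = sel_to_position("y", "b", coords, sizes)   # label lookup
plain_pos = sel_to_position("x", 2, coords, sizes)     # positional fallback
virtual_x = coord_values("x", coords, sizes)           # built on demand
```

The point of the second helper is that asking for the labels of an unlabeled dimension never adds an entry to `coords`, which mirrors the "this would not change the original array" requirement.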

Examples of new behavior

```
In [1]: import xarray as xr

In [2]: a = xr.DataArray([1, 2, 3], dims='x')

In [3]: b = xr.DataArray([[1, 2], [3, 4], [5, 6]], dims=['x', 'y'], coords={'y': ['a', 'b']})

In [4]: a
Out[4]:
<xarray.DataArray (x: 3)>
array([1, 2, 3])

In [5]: b
Out[5]:
<xarray.DataArray (x: 3, y: 2)>
array([[1, 2],
       [3, 4],
       [5, 6]])
Coordinates:
  * y        (y) <U1 'a' 'b'

In [6]: a + b
Out[6]:
<xarray.DataArray (x: 3, y: 2)>
array([[2, 3],
       [5, 6],
       [8, 9]])
Coordinates:
  * y        (y) <U1 'a' 'b'

In [7]: c = xr.DataArray([1, 2], dims='x')

In [8]: a + c
ValueError: dimension 'x' without indexes cannot be aligned because it has different sizes: {2, 3}

In [9]: d = xr.DataArray([1, 2, 3], coords={'x': [10, 20, 30]}, dims='x')

# indexes are copied from the argument with labels if they have the same size
In [10]: a + d
Out[10]:
<xarray.DataArray (x: 3)>
array([2, 4, 6])
Coordinates:
  * x        (x) int64 10 20 30
```
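The alignment rule these examples exercise can be sketched for a single dimension in plain Python. This is a toy model under stated assumptions, not the code in this PR: `align_dimension` is a hypothetical helper, each operand is reduced to a `(size, labels)` pair with `labels=None` meaning "no index", and the inner join between labeled operands is an assumption that the examples above do not cover.

```python
def align_dimension(name, operands):
    """Return the aligned labels for one dimension, or None if unlabeled.

    `operands` is a list of (size, labels) pairs; labels is a list, or
    None when that operand has no index along this dimension.
    """
    labeled = [labels for _, labels in operands if labels is not None]
    unlabeled_sizes = {size for size, labels in operands if labels is None}

    if not labeled:
        # No operand has labels: the sizes must simply agree.
        if len(unlabeled_sizes) > 1:
            raise ValueError(
                f"dimension {name!r} without indexes cannot be aligned "
                f"because it has different sizes: {sorted(unlabeled_sizes)}"
            )
        return None

    # Assumption: labeled operands are combined with an inner join.
    joined = [v for v in labeled[0] if all(v in other for other in labeled[1:])]

    # Unlabeled operands adopt the joined labels only if their size matches.
    if unlabeled_sizes - {len(joined)}:
        raise ValueError(
            f"operands without labels along {name!r} do not match "
            f"the size of the aligned labels ({len(joined)})"
        )
    return joined


# a + b along 'x': both unlabeled with size 3, so the result stays unlabeled.
result_ab = align_dimension("x", [(3, None), (3, None)])

# a + d along 'x': the labels of d are copied because the sizes match.
result_ad = align_dimension("x", [(3, None), (3, [10, 20, 30])])
```

Calling `align_dimension("x", [(3, None), (2, None)])`, the analogue of `a + c`, raises `ValueError` instead of silently truncating to the shorter operand.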

New doc sections

