id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
39264845,MDU6SXNzdWUzOTI2NDg0NQ==,197,We need some way to identify non-index coordinates,1217238,closed,0,,740776,3,2014-08-01T06:36:13Z,2014-12-19T07:16:14Z,2014-09-10T06:07:15Z,MEMBER,,,,"I am currently working with station data. In order to keep around latitude and longitude (I use station_id as the coordinate variable), I need to resort to some ridiculous contortions:

``` python
residuals = results['y'] - observations['y']
residuals.dataset.update(results.select_vars('longitude', 'latitude'))
```

There has got to be an easier way to handle this.

---

I don't want to revert to some primitive guessing strategy (e.g., looking at `attrs['coordinates']`) to figure out which extra variables can be safely kept after mathematical operations. Another approach would be to try to preserve _everything_ in the dataset linked to a DataArray when doing math. But I don't really like this option, either, because it would lead to serious propagation of ""linked dataset variables"", which are rather surprising and can have unexpected performance consequences (though at least they appear in repr as of #128).

---

This leaves me with a final alternative: restructuring xray's internals to provide first-class support for coordinates that are not indexes. For example, this would mean promoting `ds.coordinates` to an actual dictionary stored on a dataset, and allowing it to hold objects that aren't an `xray.Coordinate`. Making this change transparent to users would likely require changing the `Dataset` signature to something like `Dataset(variables, coords, attrs)`. We might (yet again) want to rename `Coordinate` to something like `IndexVar`, to emphasize the notion of ""index"" and ""non-index"" coordinates.
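To make this concrete, here is a minimal stand-in for what the proposed `Dataset(variables, coords, attrs)` call could look like for the station data above (plain Python containers rather than xray objects; the names and layout are hypothetical):

``` python
# Hypothetical layout for the proposed Dataset(variables, coords, attrs)
# signature -- plain dicts stand in for xray objects so this is runnable:
variables = {'y': ('station', [1.2, 0.7, 3.4])}
coords = {
    'station': ['a', 'b', 'c'],                   # index coordinate (a dimension)
    'latitude': ('station', [40.0, 41.0, 42.0]),  # non-index coordinate
    'longitude': ('station', [-105.0, -104.0, -103.0]),
}
attrs = {'title': 'station data'}

# Under the proposal, math like results['y'] - observations['y'] would carry
# the non-index coordinates along automatically -- no update() contortions.
non_index = {k: v for k, v in coords.items() if k != 'station'}
```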
And we could get rid of the terrible ""linked dataset variable"". Once we have non-index coordinates, we need a policy for what to do when doing math with two DataArrays for which they differ. I think my preferred approach is to not enforce that they be found on both arrays, but to raise an exception if there are any conflicting values -- unless they are scalar valued, in which case they are dropped, turned into a tuple, or given different names. (Otherwise there would be cases where you couldn't calculate `x[1] - x[0]`.) We might even be able to keep around multi-dimensional coordinates this way (e.g., 2D lat/lon arrays for projected data)... I'll need to think about that one some more. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/197/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
43098072,MDU6SXNzdWU0MzA5ODA3Mg==,234,"remove the notion of ""Index coordinates"", especially from the Dataset repr?",1217238,closed,0,,740776,1,2014-09-18T06:16:38Z,2014-09-22T02:17:31Z,2014-09-22T02:17:31Z,MEMBER,,,,"@perrette mentioned that he found the distinction between ""index"" and ""other"" coordinates in the dev version of xray confusing (see the [dev build of the docs](http://xray.readthedocs.org/en/latest/data-structures.html#coordinates)). I agree -- the differences are subtle, and difficult to convey. On the whole, they're both mostly just ""Coordinates"", although coordinates with the same name as a dimension are special because they're also used like indexes. So I would like to revisit `repr(Dataset)` to make this less confusing. Here are 5 options:

1.
Current implementation (on master):

```
Dimensions:     (time: 3, x: 2, y: 2)
Index Coordinates:
    time            (time) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08
    x               (x) int64 0 1
    y               (y) int64 0 1
Other Coordinates:
    lat             (x, y) float64 42.25 42.21 42.63 42.59
    lon             (x, y) float64 -99.83 -99.32 -99.79 -99.23
    reference_time  datetime64[ns] 2014-09-05
Variables:
    temp            (x, y, time) float64 11.04 23.57 20.77 9.346 6.683 17.17 11.6 19.54 ...
    precip          (x, y, time) float64 5.904 2.453 3.404 9.847 9.195 0.3777 8.615 7.536 ...
```

2. Switch ""Index Coordinates"" to ""Coordinates/Indexes"" (to emphasize ""Coordinates""):

```
Dimensions:     (time: 3, x: 2, y: 2)
Coordinates/Indexes:
    time            (time) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08
    x               (x) int64 0 1
    y               (y) int64 0 1
Coordinates/Other:
    lat             (x, y) float64 42.25 42.21 42.63 42.59
    lon             (x, y) float64 -99.83 -99.32 -99.79 -99.23
    reference_time  datetime64[ns] 2014-09-05
Variables:
    temp            (x, y, time) float64 11.04 23.57 20.77 9.346 6.683 17.17 11.6 19.54 ...
    precip          (x, y, time) float64 5.904 2.453 3.404 9.847 9.195 0.3777 8.615 7.536 ...
```

3. Rename ""Other Coordinates"" to ""Non-index Coordinates"":

```
Dimensions:     (time: 3, x: 2, y: 2)
Index Coordinates:
    time            (time) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08
    x               (x) int64 0 1
    y               (y) int64 0 1
Non-index Coordinates:
    lat             (x, y) float64 42.25 42.21 42.63 42.59
    lon             (x, y) float64 -99.83 -99.32 -99.79 -99.23
    reference_time  datetime64[ns] 2014-09-05
Variables:
    temp            (x, y, time) float64 11.04 23.57 20.77 9.346 6.683 17.17 11.6 19.54 ...
    precip          (x, y, time) float64 5.904 2.453 3.404 9.847 9.195 0.3777 8.615 7.536 ...
```

4.
Consolidate ""Index"" and ""Other"" coordinates (the info about indexing is implicit in the dimension names): ``` Dimensions: (time: 3, x: 2, y: 2) Coordinates: time (time) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08 x (x) int64 0 1 y (y) int64 0 1 lat (x, y) float64 42.25 42.21 42.63 42.59 lon (x, y) float64 -99.83 -99.32 -99.79 -99.23 reference_time datetime64[ns] 2014-09-05 Variables: temp (x, y, time) float64 11.04 23.57 20.77 9.346 6.683 17.17 11.6 19.54 ... precip (x, y, time) float64 5.904 2.453 3.404 9.847 9.195 0.3777 8.615 7.536 ... ``` 5. Consolidate coordinates, but mark indexes with `*` (indexes could still be all grouped at the top, but wouldn't need to be): ``` Dimensions: (time: 3, x: 2, y: 2) Coordinates: * time (time) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08 * x (x) int64 0 1 * y (y) int64 0 1 lat (x, y) float64 42.25 42.21 42.63 42.59 lon (x, y) float64 -99.83 -99.32 -99.79 -99.23 reference_time datetime64[ns] 2014-09-05 Variables: temp (x, y, time) float64 11.04 23.57 20.77 9.346 6.683 17.17 11.6 19.54 ... precip (x, y, time) float64 5.904 2.453 3.404 9.847 9.195 0.3777 8.615 7.536 ... ``` I am leaning towards option (5). It introduces less terminology and is easier to scan / count at a glance than separate categories of coordinates. The asterisk is still there as a reminder that these coordinates are special, and the distinctions will be highlighted under ""Coordinates"" in the docs for anyone who wants more details. @ToddSmall @akleeman @jhamman Any opinions? (by the way, it's worth checking out @perrette's [dimarray](https://github.com/perrette/dimarray) project... 
lots of nice ideas and overlap with xray) ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/234/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
40225000,MDU6SXNzdWU0MDIyNTAwMA==,212,"Get rid of ""noncoordinates"" as a name?",1217238,closed,0,,740776,8,2014-08-14T05:52:30Z,2014-09-22T00:55:22Z,2014-09-22T00:55:22Z,MEMBER,,,,"As @ToddSmall has pointed out (in #202), ""noncoordinates"" is a confusing name -- it's something defined by what it isn't, not what it is. Unfortunately, our best alternative is ""variables"", which already has a lot of meaning from the netCDF world (and which we already use). Related: #211 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/212/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
39385095,MDU6SXNzdWUzOTM4NTA5NQ==,203,"Support mathematical operators (+-*/, etc) for GroupBy objects",1217238,closed,0,,740776,0,2014-08-04T01:40:11Z,2014-09-12T01:18:04Z,2014-09-12T01:18:04Z,MEMBER,,,,"Building on #200, we could add support for mathematical operations to GroupBy objects.
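For comparison, pandas already supports equivalent semantics through `groupby(...).transform`; a minimal sketch in plain pandas, with made-up data (the index and values are only illustrative):

``` python
import numpy as np
import pandas as pd

# Two years of monthly values; subtracting the monthly climatology from each
# value is what ds.groupby('time.month') - climatology would express in xray.
s = pd.Series(np.arange(24.0),
              index=pd.date_range('2000-01-01', periods=24, freq='MS'))
climatology = s.groupby(s.index.month).mean()
anomalies = s.groupby(s.index.month).transform(lambda x: x - x.mean())
```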
Math with groupby objects should automatically ""broadcast"" across group labels, so we can write something like:

```
climatology = ds.groupby('time.month').mean('time')
anomalies = ds.groupby('time.month') - climatology
```
 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/203/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
36625519,MDU6SXNzdWUzNjYyNTUxOQ==,176,Proposal: Dataset transpose or order_dims method,1217238,closed,0,,740776,0,2014-06-26T23:49:43Z,2014-09-07T04:18:06Z,2014-09-07T04:18:06Z,MEMBER,,,,"It should transpose all variables so that they have dimensions in the same given order, ignoring any dimensions that are not used by a variable. E.g., `ds.transpose('x', 'y', 'z')` should give me a dataset with all data in the same order. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/176/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
39919261,MDU6SXNzdWUzOTkxOTI2MQ==,211,Should iterating over a Dataset include coordinates?,1217238,closed,0,,740776,0,2014-08-10T23:13:47Z,2014-09-05T03:16:53Z,2014-09-05T03:16:53Z,MEMBER,,,,"My inclination is **no**: the contents of a Dataset (e.g., `list(ds)`, `ds.keys()` and `ds.values()`) should only include non-coordinates. `__contains__` checks for a coordinate (e.g., `'time'`) would need to look in `ds.dimensions` or `ds.coordinates` instead of `ds`, but I see no need to change `__getitem__`: `ds['time']` can still work.

Pluses:

1. This change would more closely align `xray.Dataset` with `pandas.DataFrame`, which also does not include any elements of the index in the contents of the frame.
2. It would eliminate the need for using `ds.noncoordinates` -- which, as @ToddSmall has pointed out, is not very intuitive.
3.
In my experience, I have been using `ds.noncoordinates.items()` more often than `ds.items()` (which contains redundant information, as coordinates are repeated). The only time I really want to iterate over all variables in a dataset is when I'm using the lower-level `Variable` API.

Negatives:

1. This would break the existing API. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/211/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
39875825,MDU6SXNzdWUzOTg3NTgyNQ==,208,Don't require variable dimensions in Dataset.__init__ for scalar or 1d arrays,1217238,closed,0,,740776,1,2014-08-09T01:55:45Z,2014-09-03T18:17:12Z,2014-09-03T18:17:12Z,MEMBER,,,,"The coerce-to-variable logic should only be performed if the argument is a tuple. For scalars, there is no ambiguity since their dimensions are empty. For 1-d arrays, we should default to creating a new coordinate variable. E.g., I should be able to write `xray.Dataset({'x': np.arange(10), 'y': 0})` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/208/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
40760695,MDU6SXNzdWU0MDc2MDY5NQ==,218,Support apply with DatasetGroupby returning a DataArray (and vice-versa),1217238,closed,0,,740776,0,2014-08-21T00:28:57Z,2014-09-03T05:24:26Z,2014-09-03T05:24:26Z,MEMBER,,,,"E.g., I should be able to write:

```
dataset.groupby('state').apply(lambda ds: (ds['tmin'] > ds['tmax']).mean('station'))
```

This will be very simple once we write a generic `xray.concat` function which can handle either type of argument.
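A rough sketch of the dispatch such a generic function might perform (purely illustrative: the stub classes below stand in for `xray.Dataset` and `xray.DataArray`, whose real concat signatures may differ):

``` python
# Illustrative only -- stubs stand in for xray.Dataset / xray.DataArray:
class Dataset:
    @staticmethod
    def concat(objects, dimension):
        return 'Dataset.concat'

class DataArray:
    @staticmethod
    def concat(objects, dimension):
        return 'DataArray.concat'

def concat(objects, dimension):
    # Dispatch on the type of the first object (assumes a homogeneous list),
    # so apply() can return either type and still be concatenated correctly.
    objects = list(objects)
    return type(objects[0]).concat(objects, dimension)
```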
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/218/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 40536963,MDU6SXNzdWU0MDUzNjk2Mw==,217,Strings are truncated when concatenating Datasets.,2002703,closed,0,,740776,0,2014-08-18T21:58:36Z,2014-08-21T05:17:28Z,2014-08-21T05:17:28Z,CONTRIBUTOR,,,,"When concatenating Datasets, a variable's string length is limited to the length in the first of the Datasets being concatenated. ``` >>> import xray >>> first = xray.Dataset({'animal': ('animal', ['horse'])}) >>> second = xray.Dataset( {'animal': ('animal', ['aardvark_0'])}) >>> xray.Dataset.concat([first, second], dimension='animal')['animal'] array(['horse', 'aardv'], dtype='|S5') Coordinates: animal: Index([u'horse', u'aardv'], dtype='object') Attributes: Empty ``` (Note the `|S5` dtype and the truncated `aardv`) I think this is the offending line: https://github.com/xray/xray/blob/master/xray/core/variable.py#L623 May want to use `dtype=object` for strings to avoid this issue. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/217/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 32926274,MDU6SXNzdWUzMjkyNjI3NA==,114,Fix circular imports,1217238,closed,0,,740776,0,2014-05-06T19:47:12Z,2014-08-17T00:52:38Z,2014-08-17T00:52:38Z,MEMBER,,,,"Thanks @takluyver for pointing this out in #113. We really should have resolved this some time ago. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/114/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue