html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/issues/967#issuecomment-298117210,https://api.github.com/repos/pydata/xarray/issues/967,298117210,MDEyOklzc3VlQ29tbWVudDI5ODExNzIxMA==,1217238,2017-04-28T22:04:15Z,2017-04-28T22:04:15Z,MEMBER,"> Why would I have a xarray.core.dataset.DataVariables object as input? Indeed, you would not. I think my earlier comment was a little confusing here. I meant you could have name(s) of variables in a Dataset (which means either coords or data_vars) or coords on a DataArray. > In my mind it should only be 1.) name(s) of existing index coords, or 2.) 1D DataArray(s) with dim in self.dims Yes, agreed.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,171077425 https://github.com/pydata/xarray/issues/967#issuecomment-298109506,https://api.github.com/repos/pydata/xarray/issues/967,298109506,MDEyOklzc3VlQ29tbWVudDI5ODEwOTUwNg==,5572303,2017-04-28T21:20:57Z,2017-04-28T21:20:57Z,CONTRIBUTOR,"Sounds good. As I'm writing the type-checking code I run into this question: Why would I have a `xarray.core.dataset.DataVariables` object as input? A DataVariables object could contain multiple DataArrays, which makes the interpretation a bit unclear. In my mind it should only be 1.) name(s) of existing index coords, or 2.) 1D DataArray(s) with dim in `self.dims`","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,171077425 https://github.com/pydata/xarray/issues/967#issuecomment-297875833,https://api.github.com/repos/pydata/xarray/issues/967,297875833,MDEyOklzc3VlQ29tbWVudDI5Nzg3NTgzMw==,1217238,2017-04-28T00:33:55Z,2017-04-28T00:40:43Z,MEMBER,"> What would the signature of sortby() be then? 
Maybe something like: `sortby(variables, ascending=True)`, where `variables` can be any of: - name of a 1D variable in `coords` (on a DataArray) or `coords`/`data_vars` (on a Dataset): these get converted into a DataArray like `self[name]`. - a 1D DataArray, with a dimension found in `self.dims` - list of either of the above, either along the same or different dimensions (this could be added later) So I think this covers all the use cases of `sort_index()`, but is slightly more general. If you really want to sort a 1D DataArray by its own values, you would write `da.sortby(da)`, but I agree that that will be rare.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,171077425 https://github.com/pydata/xarray/issues/967#issuecomment-297875052,https://api.github.com/repos/pydata/xarray/issues/967,297875052,MDEyOklzc3VlQ29tbWVudDI5Nzg3NTA1Mg==,5572303,2017-04-28T00:27:46Z,2017-04-28T00:27:46Z,CONTRIBUTOR,"What would the signature of `sortby()` be then? On our end we just want a more intuitive way to sort by dimension labels, so now I have `sort_index(self, dims, ascending=True)`. `sortby()`, based on your description, seems like a separate method. Or any suggestion on how we can marry the two into something coherent?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,171077425 https://github.com/pydata/xarray/issues/967#issuecomment-297871870,https://api.github.com/repos/pydata/xarray/issues/967,297871870,MDEyOklzc3VlQ29tbWVudDI5Nzg3MTg3MA==,1217238,2017-04-28T00:03:57Z,2017-04-28T00:03:57Z,MEMBER,"@chunweiyuan I would skip `inplace` -- it's just not worth the complexity. It certainly does not make things any faster, so there is little gain from it. If you really want it, search for `inplace` in `dataset.py` and `dataarray.py` for examples. 
When you assign to `self`, it creates a local variable: it doesn't override the object instance (Python doesn't support that). I actually like the name `sortby()`, allowing any 1D variables as the argument (not just coordinates), as long as they have distinct dimensions. This works better for xarray than it does for pandas because we always have axis/dimension names.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,171077425 https://github.com/pydata/xarray/issues/967#issuecomment-297868909,https://api.github.com/repos/pydata/xarray/issues/967,297868909,MDEyOklzc3VlQ29tbWVudDI5Nzg2ODkwOQ==,5572303,2017-04-27T23:43:12Z,2017-04-27T23:43:12Z,CONTRIBUTOR,"A couple of things: 1.) Upon a little thinking I believe `sort_values()` doesn't make much sense, so I'm only working on `sort_index()`. 2.) the way I handle the `inplace` kwarg is by ``` if inplace: self = self.isel(**{d: self.indexes[d].argsort() if ascending else self.indexes[d].argsort()[::-1] for d in dimensions}) else: return self.isel(**{d: self.indexes[d].argsort() if ascending else self.indexes[d].argsort()[::-1] for d in dimensions}) ``` But when I run ``` ds.sort_index(dims=['x', 'y'], inplace=True) ``` nothing changes. If I put a `pdb.set_trace()` right below the self = self*** I can evaluate self and see that it's what I want it to be. But somehow that assignment is not realized to the higher level. 
Any quick pointer?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,171077425 https://github.com/pydata/xarray/issues/967#issuecomment-296409111,https://api.github.com/repos/pydata/xarray/issues/967,296409111,MDEyOklzc3VlQ29tbWVudDI5NjQwOTExMQ==,1217238,2017-04-22T23:50:05Z,2017-04-22T23:50:05Z,MEMBER,"If you pass in a list or array to `isel`, xarray copies the underlying data like numpy does: http://xarray.pydata.org/en/stable/indexing.html#copies-vs-views These approaches do go down slightly different code paths, but they would have equivalent performance in most cases because the indexing cost will dominate over sorting. I would prefer using `np.argsort()` only because it's slightly more general, insofar as it doesn't rely on the index having unique labels. (If you have duplicate labels, `reindex` fails.) It also avoids needing to build the hash table for index based lookups, which has a small amount of overhead. Also, as a side note `np.sort` and `np.argsort` are slightly faster than Python's `sorted()`, because they can rely on homogeneous data dtypes.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,171077425 https://github.com/pydata/xarray/issues/967#issuecomment-296394094,https://api.github.com/repos/pydata/xarray/issues/967,296394094,MDEyOklzc3VlQ29tbWVudDI5NjM5NDA5NA==,5572303,2017-04-22T18:57:07Z,2017-04-22T18:57:07Z,CONTRIBUTOR,"On our end, we currently do the following when we need to sort by axis label (lat/lon in this case): ``` da.reindex(indexers={'lat':sorted(da.coords['lat'].values), 'lon':sorted(da.coords['lon'].values)}) ``` Upon first glance of the source code I think our approach goes down a different code path than your `.isel()` approach. 
The most obvious difference, from a user's standpoint, is probably that `.reindex()` returns a new object, whereas `.isel()` returns a view (typically). In Pandas, both `sort_index()` and `sort_values()` seem to return new objects. We'd be happy to contribute to an xarray version of `sort_index()` and `sort_values()`. The first question is, which one would be the more robust and computationally efficient code path to take?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,171077425