
issue_comments


8 rows where issue = 171077425 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
298117210 https://github.com/pydata/xarray/issues/967#issuecomment-298117210 https://api.github.com/repos/pydata/xarray/issues/967 MDEyOklzc3VlQ29tbWVudDI5ODExNzIxMA== shoyer 1217238 2017-04-28T22:04:15Z 2017-04-28T22:04:15Z MEMBER

> Why would I have a xarray.core.dataset.DataVariables object as input?

Indeed, you would not. I think my earlier comment was a little confusing here. I meant you could have name(s) of variables in a Dataset (which means either coords or data_vars) or coords on a DataArray.

> In my mind it should only be 1.) name(s) of existing index coords, or 2.) 1D DataArray(s) with dim in self.dims

Yes, agreed.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sortby() or sort_index() method for Dataset and DataArray 171077425
298109506 https://github.com/pydata/xarray/issues/967#issuecomment-298109506 https://api.github.com/repos/pydata/xarray/issues/967 MDEyOklzc3VlQ29tbWVudDI5ODEwOTUwNg== chunweiyuan 5572303 2017-04-28T21:20:57Z 2017-04-28T21:20:57Z CONTRIBUTOR

Sounds good. As I'm writing the type-checking code I run into this question: Why would I have a xarray.core.dataset.DataVariables object as input? A DataVariables object could contain multiple DataArrays, which makes the interpretation a bit unclear. In my mind it should only be 1.) name(s) of existing index coords, or 2.) 1D DataArray(s) with dim in self.dims

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sortby() or sort_index() method for Dataset and DataArray 171077425
297875833 https://github.com/pydata/xarray/issues/967#issuecomment-297875833 https://api.github.com/repos/pydata/xarray/issues/967 MDEyOklzc3VlQ29tbWVudDI5Nzg3NTgzMw== shoyer 1217238 2017-04-28T00:33:55Z 2017-04-28T00:40:43Z MEMBER

> What would the signature of sortby() be then?

Maybe something like: sortby(variables, ascending=True), where variables can be any of:

  • the name of a 1D variable in coords (on a DataArray) or coords/data_vars (on a Dataset): these get converted into a DataArray like self[name].
  • a 1D DataArray, with a dimension found in self.dims
  • a list of either of the above, either along the same or different dimensions (this could be added later)

So I think this covers all the use cases of sort_index(), but is slightly more general.

If you really want to sort a 1D DataArray by its own values, you would write da.sortby(da), but I agree that that will be rare.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sortby() or sort_index() method for Dataset and DataArray 171077425
297875052 https://github.com/pydata/xarray/issues/967#issuecomment-297875052 https://api.github.com/repos/pydata/xarray/issues/967 MDEyOklzc3VlQ29tbWVudDI5Nzg3NTA1Mg== chunweiyuan 5572303 2017-04-28T00:27:46Z 2017-04-28T00:27:46Z CONTRIBUTOR

What would the signature of sortby() be then? On our end we just want a more intuitive way to sort by dimension labels, so now I have sort_index(self, dims, ascending=True). sortby(), based on your description, seems like a separate method. Or any suggestion on how we can marry the two into something coherent?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sortby() or sort_index() method for Dataset and DataArray 171077425
297871870 https://github.com/pydata/xarray/issues/967#issuecomment-297871870 https://api.github.com/repos/pydata/xarray/issues/967 MDEyOklzc3VlQ29tbWVudDI5Nzg3MTg3MA== shoyer 1217238 2017-04-28T00:03:57Z 2017-04-28T00:03:57Z MEMBER

@chunweiyuan I would skip inplace -- it's just not worth the complexity. It certainly does not make things any faster, so there is little gain from it. If you really want it, search for inplace in dataset.py and dataarray.py for examples.

When you assign to self, it creates a local variable: it doesn't override the object instance (Python doesn't support that).
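To make the point about assigning to self concrete, here is a minimal sketch in plain Python. The Box class and its method names are hypothetical stand-ins invented for illustration; they are not xarray code.

```python
class Box:
    """Hypothetical container used only to illustrate name rebinding."""

    def __init__(self, items):
        self.items = list(items)

    def sort_rebind(self):
        # Rebinds the local name `self` to a new object. The caller's
        # reference still points at the original, unsorted instance.
        self = Box(sorted(self.items))

    def sort_return(self):
        # Returns a new sorted object and leaves the original untouched.
        return Box(sorted(self.items))

    def sort_mutate(self):
        # Mutates state held by the existing instance, so the caller sees it.
        self.items.sort()


box = Box([3, 1, 2])

box.sort_rebind()
after_rebind = list(box.items)  # copy taken before any real mutation

new_box = box.sort_return()

box.sort_mutate()
after_mutate = list(box.items)
```

Only the last two methods have a visible effect. Returning a new object is the simpler of the two, which fits the advice above to skip inplace.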

I actually like the name sortby(), allowing any 1D variables as the argument (not just coordinates), as long as they have distinct dimensions. This works better for xarray than it does for pandas because we always have axis/dimension names.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sortby() or sort_index() method for Dataset and DataArray 171077425
297868909 https://github.com/pydata/xarray/issues/967#issuecomment-297868909 https://api.github.com/repos/pydata/xarray/issues/967 MDEyOklzc3VlQ29tbWVudDI5Nzg2ODkwOQ== chunweiyuan 5572303 2017-04-27T23:43:12Z 2017-04-27T23:43:12Z CONTRIBUTOR

A couple of things:

1.) Upon a little thinking I believe sort_values() doesn't make much sense, so I'm only working on sort_index().

2.) The way I handle the inplace kwarg is by

```
if inplace:
    self = self.isel(**{d: self.indexes[d].argsort() if ascending
                        else self.indexes[d].argsort()[::-1]
                        for d in dimensions})
else:
    return self.isel(**{d: self.indexes[d].argsort() if ascending
                        else self.indexes[d].argsort()[::-1]
                        for d in dimensions})
```

But when I run

```
ds.sort_index(dims=['x', 'y'], inplace=True)
```

nothing changes. If I put a pdb.set_trace() right below the self = self.isel(...) assignment I can evaluate self and see that it's what I want it to be. But somehow that assignment is not realized to the higher level. Any quick pointer?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sortby() or sort_index() method for Dataset and DataArray 171077425
296409111 https://github.com/pydata/xarray/issues/967#issuecomment-296409111 https://api.github.com/repos/pydata/xarray/issues/967 MDEyOklzc3VlQ29tbWVudDI5NjQwOTExMQ== shoyer 1217238 2017-04-22T23:50:05Z 2017-04-22T23:50:05Z MEMBER

If you pass in a list or array to isel, xarray copies the underlying data like numpy does: http://xarray.pydata.org/en/stable/indexing.html#copies-vs-views

These approaches do go down slightly different code paths, but they would have equivalent performance in most cases because the indexing cost will dominate over sorting. I would prefer using np.argsort() only because it's slightly more general, insofar as it doesn't rely on the index having unique labels. (If you have duplicate labels, reindex fails.) It also avoids needing to build the hash table for index-based lookups, which has a small amount of overhead.

Also, as a side note, np.sort and np.argsort are slightly faster than Python's sorted(), because they can rely on homogeneous data dtypes.
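The difference between the two code paths can be sketched in plain Python. This is a stand-in written for illustration, not xarray or numpy code: the argsort helper below is a hypothetical pure-Python analogue of np.argsort, and the dict plays the role of the hash table used for label-based lookups.

```python
labels = [30, 10, 20, 10]        # note the duplicate label 10
values = ["a", "b", "c", "d"]    # data aligned with labels by position


def argsort(seq):
    """Return the positions that would sort seq (stable)."""
    return sorted(range(len(seq)), key=seq.__getitem__)


# Positional path: sort the positions, then index by position.
# Duplicate labels are not a problem because positions are always unique.
order = argsort(labels)
sorted_labels = [labels[i] for i in order]
sorted_values = [values[i] for i in order]

# Label-based path: needs a label -> position table first.
# With duplicate labels the table cannot represent every row.
lookup = {label: pos for pos, label in enumerate(labels)}
lookup_is_ambiguous = len(lookup) != len(labels)
```

With four rows but only three distinct labels, the lookup table drops one row, which is the kind of ambiguity that makes a label-based reindex unusable here, while the positional path sorts all four rows.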

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sortby() or sort_index() method for Dataset and DataArray 171077425
296394094 https://github.com/pydata/xarray/issues/967#issuecomment-296394094 https://api.github.com/repos/pydata/xarray/issues/967 MDEyOklzc3VlQ29tbWVudDI5NjM5NDA5NA== chunweiyuan 5572303 2017-04-22T18:57:07Z 2017-04-22T18:57:07Z CONTRIBUTOR

On our end, we currently do the following when we need to sort by axis label (lat/lon in this case): da.reindex(indexers={'lat': sorted(da.coords['lat'].values), 'lon': sorted(da.coords['lon'].values)}). At first glance at the source code, I think our approach goes down a different code path than your .isel() approach. The most obvious difference, from a user's standpoint, is probably that .reindex() returns a new object, whereas .isel() returns a view (typically). In Pandas, both sort_index() and sort_values() seem to return new objects.

We'd be happy to contribute to an xarray version of sort_index() and sort_values(). The first question is, which one would be the more robust and computationally efficient code path to take?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sortby() or sort_index() method for Dataset and DataArray 171077425


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 12.317ms · About: xarray-datasette