
issue_comments


4 rows where issue = 171077425 and user = 1217238 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
298117210 https://github.com/pydata/xarray/issues/967#issuecomment-298117210 https://api.github.com/repos/pydata/xarray/issues/967 MDEyOklzc3VlQ29tbWVudDI5ODExNzIxMA== shoyer 1217238 2017-04-28T22:04:15Z 2017-04-28T22:04:15Z MEMBER

Why would I have a xarray.core.dataset.DataVariables object as input?

Indeed, you would not. I think my earlier comment was a little confusing here. I meant you could have name(s) of variables in a Dataset (which means either coords or data_vars) or coords on a DataArray.

In my mind it should only be 1.) name(s) of existing index coords, or 2.) 1D DataArray(s) with dim in self.dims

Yes, agreed.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sortby() or sort_index() method for Dataset and DataArray 171077425
297875833 https://github.com/pydata/xarray/issues/967#issuecomment-297875833 https://api.github.com/repos/pydata/xarray/issues/967 MDEyOklzc3VlQ29tbWVudDI5Nzg3NTgzMw== shoyer 1217238 2017-04-28T00:33:55Z 2017-04-28T00:40:43Z MEMBER

What would the signature of sortby() be then?

Maybe something like: sortby(variables, ascending=True), where variables can be any of:

- name of a 1D variable in coords (on a DataArray) or coords/data_vars (on a Dataset): these get converted into a DataArray like self[name].
- a 1D DataArray, with a dimension found in self.dims
- list of either of the above, either along the same or different dimensions (this could be added later)

So I think this covers all the use cases of sort_index(), but is slightly more general.

If you really want to sort a 1D DataArray by its own values, you would write da.sortby(da), but I agree that that will be rare.
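The proposed semantics can be sketched with plain NumPy. This is only an illustration under stated assumptions: `sortby_sketch`, its parameters, and the sample arrays are hypothetical names invented here, not xarray's implementation, and a tuple of dimension names stands in for the labeled dimensions of a DataArray.

```python
import numpy as np

def sortby_sketch(values, dims, key, key_dim, ascending=True):
    """Reorder `values` along `key_dim` using the 1D array `key`.

    Hypothetical helper: `dims` is a tuple of dimension names that
    plays the role of DataArray.dims.
    """
    values = np.asarray(values)
    key = np.asarray(key)
    if key.ndim != 1:
        raise ValueError("key must be 1D")
    if key_dim not in dims:
        raise ValueError(f"{key_dim!r} is not one of the dimensions {dims}")
    axis = dims.index(key_dim)
    if values.shape[axis] != key.shape[0]:
        raise ValueError("key length must match the dimension it sorts")
    # argsort gives the positions that would sort the key
    order = np.argsort(key, kind="stable")
    if not ascending:
        # Reversing also reverses the relative order of tied keys
        order = order[::-1]
    # Apply the same positional reordering along the matching axis
    return np.take(values, order, axis=axis)

# A 2D array with dims ("time", "station"), sorted by a 1D variable on "station"
temps = np.array([[10, 30, 20],
                  [11, 31, 21]])
elevation = np.array([500, 100, 300])
by_elevation = sortby_sketch(temps, ("time", "station"), elevation, "station")

# Sorting a 1D array by its own values, the da.sortby(da) case
da = np.array([3, 1, 2])
by_self = sortby_sketch(da, ("x",), da, "x")
```

The key never has to be an index of the array being sorted. It only needs to be 1D and aligned with one named dimension, which is what makes this more general than a sort_index() that looks only at existing index coordinates.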

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sortby() or sort_index() method for Dataset and DataArray 171077425
297871870 https://github.com/pydata/xarray/issues/967#issuecomment-297871870 https://api.github.com/repos/pydata/xarray/issues/967 MDEyOklzc3VlQ29tbWVudDI5Nzg3MTg3MA== shoyer 1217238 2017-04-28T00:03:57Z 2017-04-28T00:03:57Z MEMBER

@chunweiyuan I would skip inplace -- it's just not worth the complexity. It certainly does not make things any faster, so there is little gain from it. If you really want it, search for inplace in dataset.py and dataarray.py for examples.

When you assign to self, it creates a local variable: it doesn't override the object instance (Python doesn't support that).
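A small plain-Python illustration of that point follows. The `Box` class and its method names are hypothetical, chosen only to show the name-binding behavior.

```python
class Box:
    def __init__(self, items):
        self.items = list(items)

    def sort_rebind(self):
        # Rebinds the local name `self` to a new object.
        # The caller's object is left untouched.
        self = Box(sorted(self.items))

    def sorted_copy(self):
        # Returns a new object instead of trying to replace the old one.
        return Box(sorted(self.items))

box = Box([3, 1, 2])
box.sort_rebind()
after_rebind = list(box.items)   # the original order is unchanged

new_box = box.sorted_copy()      # the caller decides what to do with the result
```

Returning a new object, as `sorted_copy` does, matches the suggestion to skip inplace entirely.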

I actually like the name sortby(), allowing any 1D variables as the argument (not just coordinates), as long as they have distinct dimensions. This works better for xarray than it does for pandas because we always have axis/dimension names.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sortby() or sort_index() method for Dataset and DataArray 171077425
296409111 https://github.com/pydata/xarray/issues/967#issuecomment-296409111 https://api.github.com/repos/pydata/xarray/issues/967 MDEyOklzc3VlQ29tbWVudDI5NjQwOTExMQ== shoyer 1217238 2017-04-22T23:50:05Z 2017-04-22T23:50:05Z MEMBER

If you pass in a list or array to isel, xarray copies the underlying data like numpy does: http://xarray.pydata.org/en/stable/indexing.html#copies-vs-views
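The NumPy behavior referred to here can be checked directly. This sketch uses only NumPy, with made-up sample data, and does not exercise xarray's isel itself.

```python
import numpy as np

base = np.arange(6)

sliced = base[1:4]          # basic slicing returns a view
picked = base[[1, 2, 3]]    # indexing with a list returns a copy

sliced_shares = np.shares_memory(base, sliced)
picked_shares = np.shares_memory(base, picked)

# Writing to the copy leaves the original array unchanged
picked[0] = 99
```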

These approaches do go down slightly different code paths, but they would have equivalent performance in most cases because the indexing cost will dominate over sorting. I would prefer using np.argsort() only because it's slightly more general, insofar as it doesn't rely on the index having unique labels. (If you have duplicate labels, reindex fails.) It also avoids needing to build the hash table for index-based lookups, which has a small amount of overhead.
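A minimal sketch of the positional approach, assuming made-up labels that include a duplicate. It uses only NumPy; the variable names are illustrative.

```python
import numpy as np

labels = np.array(["b", "a", "c", "a"])   # note the duplicate label "a"
values = np.array([20, 10, 30, 11])

# argsort returns integer positions, so duplicate labels are not a problem;
# kind="stable" keeps tied labels in their original relative order
order = np.argsort(labels, kind="stable")

sorted_labels = labels[order]
sorted_values = values[order]
```

Because the result is a set of positions rather than a label lookup, no hash table over the labels is needed, and the same `order` array could be passed to any positional indexer.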

Also, as a side note, np.sort and np.argsort are slightly faster than Python's sorted(), because they can rely on homogeneous data dtypes.
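One rough way to check this on a given machine is a quick timeit comparison. The array size and repeat count below are arbitrary assumptions, and the measured timings will vary by machine and NumPy version.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
data = rng.random(100_000)     # homogeneous float64 array
as_list = data.tolist()        # the same values as Python floats

numpy_sorted = np.sort(data)
python_sorted = sorted(as_list)

# Time each approach on its natural container
numpy_time = timeit.timeit(lambda: np.sort(data), number=5)
python_time = timeit.timeit(lambda: sorted(as_list), number=5)
print(f"np.sort: {numpy_time:.4f}s, sorted(): {python_time:.4f}s")
```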

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sortby() or sort_index() method for Dataset and DataArray 171077425


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);