issue_comments

3 rows where author_association = "NONE" and issue = 771382653 ("Allow sel's method and tolerance to vary per-dimension"), sorted by updated_at descending

748486801 · batterseapower (user 18488) · NONE · created 2020-12-19T15:13:36Z · updated 2020-12-19T15:14:59Z
https://github.com/pydata/xarray/issues/4714#issuecomment-748486801

Thanks for the response. I think reindex would need to be changed as well, because this code:

```python
sensor_data.reindex({'time': [1], 'sensor': ['A', 'B']}, method='ffill')
```

is not equivalent to this code:

```python
sensor_data.reindex({'time': [1], 'sensor': ['A', 'B']}).ffill(dim='time').ffill(dim='sensor')
```
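
This non-equivalence is easy to check with a small stand-in for sensor_data (the array below is invented for illustration; the thread does not define one):

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for sensor_data; any array whose coordinates have
# gaps relative to the requested labels shows the difference.
sensor_data = xr.DataArray(
    np.arange(4.0).reshape(2, 2),
    coords={'time': [0, 2], 'sensor': ['A', 'C']},
    dims=['time', 'sensor'],
)

# reindex with method='ffill' fills each requested label from the original data
a = sensor_data.reindex({'time': [1], 'sensor': ['A', 'B']}, method='ffill')

# reindex without a method inserts NaNs first, so the subsequent ffills have
# nothing earlier along a size-1 dimension to fill from
b = (
    sensor_data.reindex({'time': [1], 'sensor': ['A', 'B']})
    .ffill(dim='time')
    .ffill(dim='sensor')
)

print(a.values)     # [[0. 0.]] -- both labels forward-filled from the original
print(b.values)     # [[nan nan]]
print(a.equals(b))  # False
```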

So if I understand your to_dataset idea correctly, you are proposing:

```python
ds = sensor_data.to_dataset(dim='sensor')
xr.concat(
    [
        ds[sensor].sel({'time': time}, method='ffill', drop=True)
        for sensor, time in zip(['A', 'A', 'A', 'B', 'C'], [0, 1, 2, 0, 0])
    ],
    dim='sample',
)
```

I guess this works, but it's a bit cumbersome and unlikely to be fast. I think there must be something I'm not understanding here; I'm not familiar with all the nuances of the xarray API.

Your idea of reindex followed by sel is an interesting one, but it does something slightly different from what I was asking for: it does not fail if one of the sensors in the query list is missing, but rather inserts a NaN. I suppose you could fix this by doing an extra check afterwards, assuming that your original pre-reindex data contained no NaNs.
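
For what it's worth, that extra check might look like the sketch below (result is a hypothetical name for the output of the reindex-then-sel approach; the check is only valid under the stated no-NaNs assumption):

```python
def assert_all_labels_found(result):
    """Fail loudly if reindex silently inserted NaNs for missing labels.

    Only valid under the assumption that the original, pre-reindex data
    contained no NaNs.
    """
    if result.isnull().any():
        raise KeyError("query referenced labels absent from the original data")
```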

In general min(S*N,T*N) could be much larger than S*T, so for big queries it's quite possible that you wouldn't have enough space to allocate the intermediate even if you could fit 100s of copies of the original S*T matrix. Using a dask cluster would make this situation less likely of course, but it seems like it would be better to avoid all this copying (even on a beefy cluster) even if just for performance reasons.
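
To put rough numbers on this, a hypothetical back-of-the-envelope calculation (all sizes invented for illustration):

```python
# Hypothetical sizes, chosen only to illustrate the scaling argument above
S, T, N = 1_000, 100_000, 50_000_000  # sensors, times, query pairs

original = S * T                      # 1e8 elements in the S*T matrix
intermediate = min(S * N, T * N)      # 5e10 elements for the cheaper ordering
result = N                            # 5e7 elements actually requested

print(intermediate // original)       # 500 -- hundreds of copies of the original
print(intermediate // result)         # 1000 -- 1000x larger than the final answer
```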

748479287 · batterseapower (user 18488) · NONE · created 2020-12-19T14:06:36Z · updated 2020-12-19T14:06:36Z
https://github.com/pydata/xarray/issues/4714#issuecomment-748479287

Thanks for the suggestion. One issue with this alternative is that it creates a potentially large intermediate object.

If you have T times and S sensors, and want to sample them at N (time, sensor) pairs, then the intermediate object with your approach has size T*N (if you index sensors first) or S*N (if you index time first). If you can index both dimensions in one sel call then we should only need to allocate memory for the result of size N, which is considerably better.
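
For comparison, xarray's existing vectorized (pointwise) indexing already returns a size-N result when the indexers share a dimension; what it cannot express is a per-dimension method or tolerance, which is what this issue asks for. A minimal sketch with made-up data:

```python
import numpy as np
import xarray as xr

# Hypothetical S x T array of readings
data = xr.DataArray(
    np.random.rand(3, 4),
    coords={'sensor': ['A', 'B', 'C'], 'time': [0, 1, 2, 3]},
    dims=['sensor', 'time'],
)

# N (sensor, time) query pairs, expressed as indexers over a shared 'sample' dim
sensors = xr.DataArray(['A', 'A', 'B', 'C'], dims='sample')
times = xr.DataArray([0, 2, 1, 3], dims='sample')

# Pointwise selection allocates only the size-N result -- no S*N or T*N
# intermediate -- but sel's single `method` argument would apply to every
# dimension at once, hence this issue.
result = data.sel(sensor=sensors, time=times)
print(result.sizes)  # Frozen({'sample': 4})
```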

748477889 · batterseapower (user 18488) · NONE · created 2020-12-19T13:53:53Z · updated 2020-12-19T13:53:53Z
https://github.com/pydata/xarray/issues/4714#issuecomment-748477889

I guess it would also make sense to have this in reindex if you did decide to add it.



The underlying table schema:

```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
```