issue_comments

6 rows where issue = 842940980 and user = 5635139 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
830237579 https://github.com/pydata/xarray/pull/5089#issuecomment-830237579 https://api.github.com/repos/pydata/xarray/issues/5089 MDEyOklzc3VlQ29tbWVudDgzMDIzNzU3OQ== max-sixty 5635139 2021-04-30T17:12:02Z 2021-04-30T17:12:02Z MEMBER

This is great work and it would be good to get this in for the upcoming release https://github.com/pydata/xarray/issues/5232.

I think there are two paths:

1. Narrow: merge the functionality which works along 1D dimensioned coords
2. Full: ensure we're at consensus on how we handle >1D coords

I would mildly vote for narrow. While I would also vote to merge it as-is, I think it's not a huge task to move the full (>1D) handling onto a new branch.

@ahuang11 what are your thoughts?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add drop duplicates 842940980
822098673 https://github.com/pydata/xarray/pull/5089#issuecomment-822098673 https://api.github.com/repos/pydata/xarray/issues/5089 MDEyOklzc3VlQ29tbWVudDgyMjA5ODY3Mw== max-sixty 5635139 2021-04-19T00:41:47Z 2021-04-19T00:41:47Z MEMBER

@max-sixty is there a case where you don't think we could do a single isel? I'd love to do the single isel() call if possible, because that should have the best performance by far.

IIUC there are two broad cases here:

- where every supplied coord is a dimensioned coord: it's very simple, just isel non-duplicates for each dimension*
- where there's a non-dimensioned coord with ndim > 1: then it requires stacking, e.g. the example above.

Is there a different way of doing this?

```python
In [12]: da
Out[12]:
<xarray.DataArray (init: 2, tau: 3)>
array([[1, 2, 3],
       [4, 5, 6]])
Coordinates:
  * init     (init) int64 0 1
  * tau      (tau) int64 1 2 3
    valid    (init, tau) int64 8 6 6 7 7 7

In [13]: da.drop_duplicate_coords("valid")
Out[13]:
<xarray.DataArray (valid: 3)>
array([1, 2, 4])
Coordinates:
  * valid    (valid) int64 8 6 7
    init     (valid) int64 0 0 1
    tau      (valid) int64 1 2 1
```

* very close to this is a 1D non-dimensioned coord, in which case we can either turn it into a dimensioned coord or retain the existing dimensioned coords — I think probably the former if we allow the stacking case, for the sake of consistency.
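The stacking case can be sketched with existing xarray calls. This is a minimal illustration, not the PR's implementation: the temporary dimension name `points` and the use of `np.unique` to find first occurrences are assumptions made for the example.

```python
import numpy as np
import xarray as xr

# Rebuild the example array with a 2D non-dimensioned coord "valid".
da = xr.DataArray(
    [[1, 2, 3], [4, 5, 6]],
    dims=("init", "tau"),
    coords={
        "init": [0, 1],
        "tau": [1, 2, 3],
        "valid": (("init", "tau"), [[8, 6, 6], [7, 7, 7]]),
    },
)

# Flatten (init, tau) into one temporary dimension; "points" is an arbitrary name.
stacked = da.stack(points=("init", "tau"))

# np.unique reports the position of the first occurrence of each value;
# sorting those positions keeps the values in their original order.
_, first = np.unique(stacked["valid"].values, return_index=True)
deduped = stacked.isel(points=np.sort(first))

print(deduped.values)           # data at the first occurrence of each "valid" value
print(deduped["valid"].values)  # the de-duplicated "valid" values
```

The sketch stops at the stacked result. Promoting `valid` to the dimension coordinate, as in the `Out[13]` repr, would be a further step.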

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add drop duplicates 842940980
822089198 https://github.com/pydata/xarray/pull/5089#issuecomment-822089198 https://api.github.com/repos/pydata/xarray/issues/5089 MDEyOklzc3VlQ29tbWVudDgyMjA4OTE5OA== max-sixty 5635139 2021-04-18T23:57:20Z 2021-04-18T23:57:20Z MEMBER

@ahuang11 IIUC, this is only using .stack where it needs to actually stack the array, is that correct? So if a list of dims is passed (rather than non-dim coords), then it's not stacking.

I agree with @shoyer that we could do it in a single isel in the basic case. One option is to have a fast path for non-dim coords only, and call isel once with those.
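A single-`isel` path for dimensioned coords might look like the sketch below. The helper name `drop_duplicates_along_dims` is hypothetical and this is not the code in the PR; it only combines `get_index`, `pandas.Index.duplicated`, and one `isel` call.

```python
import numpy as np
import xarray as xr


def drop_duplicates_along_dims(obj, dims, keep="first"):
    """Hypothetical helper: drop duplicated index values along each dim with one isel."""
    indexers = {}
    for dim in dims:
        # pandas.Index.duplicated flags repeated labels; keep the positions not flagged.
        is_dup = obj.get_index(dim).duplicated(keep=keep)
        indexers[dim] = np.flatnonzero(~is_dup)
    # One isel call; 1D integer arrays are applied orthogonally, one per dimension.
    return obj.isel(indexers)


# A 5x5 array with duplicated lat and lon labels, as in the thread's lat/lon example.
da = xr.DataArray(
    np.outer(np.arange(5), np.arange(5)),
    dims=("lat", "lon"),
    coords={"lat": [0, 1, 2, 2, 3], "lon": [0, 1, 3, 3, 4]},
)
result = drop_duplicates_along_dims(da, ["lat", "lon"])
print(result)
```

Because every indexer is built before indexing, the data is sliced once regardless of how many dimensions are passed.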

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add drop duplicates 842940980
821902582 https://github.com/pydata/xarray/pull/5089#issuecomment-821902582 https://api.github.com/repos/pydata/xarray/issues/5089 MDEyOklzc3VlQ29tbWVudDgyMTkwMjU4Mg== max-sixty 5635139 2021-04-17T23:37:07Z 2021-04-17T23:37:07Z MEMBER

Hi @ahuang11 — forgive the delay. We discussed this with the team on our call and think it would be a welcome addition, so thank you for contributing.

I took another look through the tests and the behavior looks ideal when dimensioned coords are passed:

```python
In [6]: da
Out[6]:
<xarray.DataArray (lat: 5, lon: 5)>
array([[ 0,  0,  0,  0,  0],
       [ 0,  1,  2,  3,  4],
       [ 0,  2,  4,  6,  8],
       [ 0,  3,  6,  9, 12],
       [ 0,  4,  8, 12, 16]])
Coordinates:
  * lat      (lat) int64 0 1 2 2 3
  * lon      (lon) int64 0 1 3 3 4

In [7]: result = da.drop_duplicate_coords(["lat", "lon"], keep='first')

In [8]: result
Out[8]:
<xarray.DataArray (lat: 4, lon: 4)>
array([[ 0,  0,  0,  0],
       [ 0,  1,  2,  4],
       [ 0,  2,  4,  8],
       [ 0,  4,  8, 16]])
Coordinates:
  * lat      (lat) int64 0 1 2 3
  * lon      (lon) int64 0 1 3 4
```

And I think this is also the best we can do for non-dimensioned coords. Two things I'd call out:

a. The array is stacked for any non-dim coord with more than one dimension
b. The supplied coord becomes the new dimensioned coord

e.g. Stacking:

```python
In [12]: da
Out[12]:
<xarray.DataArray (init: 2, tau: 3)>
array([[1, 2, 3],
       [4, 5, 6]])
Coordinates:
  * init     (init) int64 0 1
  * tau      (tau) int64 1 2 3
    valid    (init, tau) int64 8 6 6 7 7 7

In [13]: da.drop_duplicate_coords("valid")
Out[13]:
<xarray.DataArray (valid: 3)>
array([1, 2, 4])
Coordinates:
  * valid    (valid) int64 8 6 7
    init     (valid) int64 0 0 1
    tau      (valid) int64 1 2 1
```

Changing the dimensions: zeta becoming the new dimension, from tau:

```python
In [16]: (
    ...:     da
    ...:     .assign_coords(dict(zeta=(('tau'),[4,4,6])))
    ...:     .drop_duplicate_coords('zeta')
    ...: )
Out[16]:
<xarray.DataArray (init: 2, zeta: 2)>
array([[1, 3],
       [4, 6]])
Coordinates:
  * init     (init) int64 0 1
    valid    (init, zeta) int64 8 6 7 7
  * zeta     (zeta) int64 4 6
    tau      (zeta) int64 1 3
```

One peculiarity — though I think a necessary one — is that the order matters in some cases:

```python
In [17]: (
    ...:     da
    ...:     .assign_coords(dict(zeta=(('tau'),[4,4,6])))
    ...:     .drop_duplicate_coords(['zeta','valid'])
    ...: )
Out[17]:
<xarray.DataArray (valid: 3)>
array([1, 3, 4])
Coordinates:
  * valid    (valid) int64 8 6 7
    tau      (valid) int64 1 3 1
    init     (valid) int64 0 0 1
    zeta     (valid) int64 4 6 4

In [18]: (
    ...:     da
    ...:     .assign_coords(dict(zeta=(('tau'),[4,4,6])))
    ...:     .drop_duplicate_coords(['valid','zeta'])
    ...: )
Out[18]:
<xarray.DataArray (zeta: 1)>
array([1])
Coordinates:
  * zeta     (zeta) int64 4
    init     (zeta) int64 0
    tau      (zeta) int64 1
    valid    (zeta) int64 8
```

Unless anyone has any more thoughts, let's plan to merge this over the next few days. Thanks again @ahuang11 !

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add drop duplicates 842940980
813109553 https://github.com/pydata/xarray/pull/5089#issuecomment-813109553 https://api.github.com/repos/pydata/xarray/issues/5089 MDEyOklzc3VlQ29tbWVudDgxMzEwOTU1Mw== max-sixty 5635139 2021-04-04T22:35:15Z 2021-04-04T22:35:15Z MEMBER

If we don't hear anything, let's add this to the top of the list for the next dev call in ten days

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add drop duplicates 842940980
811203549 https://github.com/pydata/xarray/pull/5089#issuecomment-811203549 https://api.github.com/repos/pydata/xarray/issues/5089 MDEyOklzc3VlQ29tbWVudDgxMTIwMzU0OQ== max-sixty 5635139 2021-03-31T16:23:22Z 2021-03-31T16:23:22Z MEMBER

@pydata/xarray we didn't get to this on the call today. Two questions from @mathause:

- Should we have dims=None default to all dims? Or are we gradually transitioning to dims=... for all dims?
- Is drop_duplicates a good name? Or should it explicitly refer to dropping duplicates on the index?
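To make the first question concrete, here is one way the two spellings could resolve to the same thing. The function `resolve_dims` is purely hypothetical and only illustrates the design choice; it does not describe what the PR or xarray does.

```python
def resolve_dims(all_dims, dims=None):
    """Hypothetical resolver: treat both None and ... as "every dimension"."""
    if dims is None or dims is ...:
        return list(all_dims)
    if isinstance(dims, str):
        # A single name is accepted as shorthand for a one-element list.
        return [dims]
    unknown = [d for d in dims if d not in all_dims]
    if unknown:
        raise ValueError(f"dimensions not found: {unknown}")
    return list(dims)


print(resolve_dims(("lat", "lon")))          # default: all dims
print(resolve_dims(("lat", "lon"), ...))     # explicit Ellipsis: all dims
print(resolve_dims(("lat", "lon"), "lat"))   # a single named dim
```

Accepting both spellings would keep existing `dims=None` call sites working during any transition to `dims=...`.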

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add drop duplicates 842940980

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 726.071ms · About: xarray-datasette