html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/5089#issuecomment-830237579,https://api.github.com/repos/pydata/xarray/issues/5089,830237579,MDEyOklzc3VlQ29tbWVudDgzMDIzNzU3OQ==,5635139,2021-04-30T17:12:02Z,2021-04-30T17:12:02Z,MEMBER,"This is great work and it would be good to get this in for the upcoming release https://github.com/pydata/xarray/issues/5232.
I think there are two paths:
1. Narrow: merge the functionality which works along 1D dimensioned coords
2. Full: Ensure we're at consensus on how we handle >1D coords
I would mildly vote for narrow. I'd also be fine merging it as-is, but I think it's not a huge task to move the full (>1D) handling onto a new branch.
@ahuang11 what are your thoughts?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,842940980
https://github.com/pydata/xarray/pull/5089#issuecomment-822098673,https://api.github.com/repos/pydata/xarray/issues/5089,822098673,MDEyOklzc3VlQ29tbWVudDgyMjA5ODY3Mw==,5635139,2021-04-19T00:41:47Z,2021-04-19T00:41:47Z,MEMBER,"> @max-sixty is there a case where you don't think we could do a single `isel`? I'd love to do the single `isel()` call if possible, because that should have the best performance by far.
IIUC there are two broad cases here:
- where every supplied coord is a dimensioned coord: it's very simple, just `isel` the non-duplicates along each dimension*
- where there's a non-dimensioned coord with ndim > 1: it requires stacking, e.g. the example below. Is there a different way of doing this?
```python
In [12]: da
Out[12]:
array([[1, 2, 3],
       [4, 5, 6]])
Coordinates:
  * init     (init) int64 0 1
  * tau      (tau) int64 1 2 3
    valid    (init, tau) int64 8 6 6 7 7 7

In [13]: da.drop_duplicate_coords(""valid"")
Out[13]:
array([1, 2, 4])
Coordinates:
  * valid    (valid) int64 8 6 7
    init     (valid) int64 0 0 1
    tau      (valid) int64 1 2 1
```
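For the first case, a single `isel` call is enough. Below is a minimal sketch, assuming a hypothetical helper `drop_duplicate_dim_coords` (not the PR's implementation) that uses `pandas.Index.duplicated` to find the repeated labels on each dimension and then selects the survivors in one call:

```python
import numpy as np
import xarray as xr


def drop_duplicate_dim_coords(da, dims, keep='first'):
    # Hypothetical helper, not xarray API: drop repeated labels along each
    # dimension coordinate in `dims` with one isel call.
    indexers = {}
    for dim in dims:
        # pandas.Index.duplicated flags every repeat except the kept one
        dup = da.get_index(dim).duplicated(keep=keep)
        # Integer positions of the labels we keep along this dimension
        indexers[dim] = np.flatnonzero(~dup)
    # Each indexer is a 1D array on its own dim, so isel applies them
    # orthogonally and no stacking is needed
    return da.isel(indexers)


i = np.arange(5)
da = xr.DataArray(
    i[:, None] * i[None, :],
    dims=('lat', 'lon'),
    coords={'lat': [0, 1, 2, 2, 3], 'lon': [0, 1, 3, 3, 4]},
)
result = drop_duplicate_dim_coords(da, ['lat', 'lon'])
```

Because the indexers are independent per dimension, the cost is one pass over each index plus a single selection.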
\* A closely related case is a 1D non-dimensioned coord, where we can either turn it into a dimensioned coord or retain the existing dimensioned coords. I think probably the former if we allow the stacking case, for the sake of consistency.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,842940980
https://github.com/pydata/xarray/pull/5089#issuecomment-822089198,https://api.github.com/repos/pydata/xarray/issues/5089,822089198,MDEyOklzc3VlQ29tbWVudDgyMjA4OTE5OA==,5635139,2021-04-18T23:57:20Z,2021-04-18T23:57:20Z,MEMBER,"@ahuang11 IIUC, this is only using `.stack` where it needs to actually stack the array, is that correct? So if a list of dims is passed (rather than non-dim coords), then it's not stacking.
I agree with @shoyer that we could do it in a single `isel` in the basic case. One option is to have a fast path for when only dimensioned coords are passed, and call `isel` once with those.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,842940980
https://github.com/pydata/xarray/pull/5089#issuecomment-821902582,https://api.github.com/repos/pydata/xarray/issues/5089,821902582,MDEyOklzc3VlQ29tbWVudDgyMTkwMjU4Mg==,5635139,2021-04-17T23:37:07Z,2021-04-17T23:37:07Z,MEMBER,"Hi @ahuang11 — forgive the delay. We discussed this with the team on our call and think it would be a welcome addition, so thank you for contributing.
I took another look through the tests and the behavior looks ideal when dimensioned coords are passed:
```python
In [6]: da
Out[6]:
array([[ 0,  0,  0,  0,  0],
       [ 0,  1,  2,  3,  4],
       [ 0,  2,  4,  6,  8],
       [ 0,  3,  6,  9, 12],
       [ 0,  4,  8, 12, 16]])
Coordinates:
  * lat      (lat) int64 0 1 2 2 3
  * lon      (lon) int64 0 1 3 3 4

In [7]: result = da.drop_duplicate_coords([""lat"", ""lon""], keep='first')

In [8]: result
Out[8]:
array([[ 0,  0,  0,  0],
       [ 0,  1,  2,  4],
       [ 0,  2,  4,  8],
       [ 0,  4,  8, 16]])
Coordinates:
  * lat      (lat) int64 0 1 2 3
  * lon      (lon) int64 0 1 3 4
```
And I _think_ this is also the best we can do for non-dimensioned coords. Two things I'd call out:
a. The array is stacked for any non-dim coord with more than one dimension
b. The supplied coord becomes the new dimensioned coord
e.g. Stacking:
```python
In [12]: da
Out[12]:
array([[1, 2, 3],
       [4, 5, 6]])
Coordinates:
  * init     (init) int64 0 1
  * tau      (tau) int64 1 2 3
    valid    (init, tau) int64 8 6 6 7 7 7

In [13]: da.drop_duplicate_coords(""valid"")
Out[13]:
array([1, 2, 4])
Coordinates:
  * valid    (valid) int64 8 6 7
    init     (valid) int64 0 0 1
    tau      (valid) int64 1 2 1
```
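One alternative to `.stack` for the multi-dimensional case is pointwise (vectorized) indexing: flatten the coord, find the first occurrences, then pass one integer array per original dimension, all sharing a temporary dimension. This is a sketch under those assumptions and swaps in vectorized indexing for the PR's `.stack` approach; `drop_duplicate_nd_coord` and the `_points` dimension name are hypothetical:

```python
import numpy as np
import pandas as pd
import xarray as xr


def drop_duplicate_nd_coord(da, name, keep='first'):
    # Hypothetical helper, not xarray API: uses pointwise (vectorized)
    # indexing instead of .stack to collapse the coord's dims into one.
    coord = da[name]
    # Flatten in C order so flat positions line up with coord.dims
    dup = pd.Index(coord.values.ravel()).duplicated(keep=keep)
    flat_pos = np.flatnonzero(~dup)
    # Recover one integer position array per dimension of the coord
    positions = np.unravel_index(flat_pos, coord.shape)
    # All indexers share the temporary '_points' dim, so isel picks
    # elements pointwise rather than taking an outer product
    indexers = {
        dim: xr.DataArray(pos, dims='_points')
        for dim, pos in zip(coord.dims, positions)
    }
    picked = da.isel(indexers)
    # Promote the de-duplicated coord to be the new dimension coordinate
    return picked.swap_dims({'_points': name})


da = xr.DataArray(
    [[1, 2, 3], [4, 5, 6]],
    dims=('init', 'tau'),
    coords={
        'init': [0, 1],
        'tau': [1, 2, 3],
        'valid': (('init', 'tau'), [[8, 6, 6], [7, 7, 7]]),
    },
)
result = drop_duplicate_nd_coord(da, 'valid')
```

The result has `valid` as its dimension coordinate, with `init` and `tau` carried along it, matching the values in `Out[13]` above.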
Changing the dimensions: `zeta` replaces `tau` as the dimension:
```python
In [16]: (
    ...:     da
    ...:     .assign_coords(dict(zeta=(('tau'),[4,4,6])))
    ...:     .drop_duplicate_coords('zeta')
    ...: )
Out[16]:
array([[1, 3],
       [4, 6]])
Coordinates:
  * init     (init) int64 0 1
    valid    (init, zeta) int64 8 6 7 7
  * zeta     (zeta) int64 4 6
    tau      (zeta) int64 1 3
```
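The 1D non-dimension case needs no stacking: select the first occurrences along the coord's own dimension, then promote it with `swap_dims`. A sketch using only existing xarray/pandas calls; the variable names are illustrative and this is not the PR's code:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    [[1, 2, 3], [4, 5, 6]],
    dims=('init', 'tau'),
    coords={
        'init': [0, 1],
        'tau': [1, 2, 3],
        'valid': (('init', 'tau'), [[8, 6, 6], [7, 7, 7]]),
    },
).assign_coords(zeta=('tau', [4, 4, 6]))

# zeta is a 1D non-dimension coord along tau
(dim,) = da['zeta'].dims
# Positions of the first occurrence of each zeta value
keep = np.flatnonzero(~da['zeta'].to_index().duplicated(keep='first'))
# Select along tau, then make zeta the dimension coordinate in place of tau
result = da.isel({dim: keep}).swap_dims({dim: 'zeta'})
```

This reproduces the values in `Out[16]`: `tau` survives as a non-dimension coord along `zeta`.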
One peculiarity — though I think a necessary one — is that the order matters in some cases:
```python
In [17]: (
    ...:     da
    ...:     .assign_coords(dict(zeta=(('tau'),[4,4,6])))
    ...:     .drop_duplicate_coords(['zeta','valid'])
    ...: )
Out[17]:
array([1, 3, 4])
Coordinates:
  * valid    (valid) int64 8 6 7
    tau      (valid) int64 1 3 1
    init     (valid) int64 0 0 1
    zeta     (valid) int64 4 6 4

In [18]: (
    ...:     da
    ...:     .assign_coords(dict(zeta=(('tau'),[4,4,6])))
    ...:     .drop_duplicate_coords(['valid','zeta'])
    ...: )
Out[18]:
array([1])
Coordinates:
  * zeta     (zeta) int64 4
    init     (zeta) int64 0
    tau      (zeta) int64 1
    valid    (zeta) int64 8
```
Unless anyone has any more thoughts, let's plan to merge this over the next few days. Thanks again @ahuang11!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,842940980
https://github.com/pydata/xarray/pull/5089#issuecomment-813109553,https://api.github.com/repos/pydata/xarray/issues/5089,813109553,MDEyOklzc3VlQ29tbWVudDgxMzEwOTU1Mw==,5635139,2021-04-04T22:35:15Z,2021-04-04T22:35:15Z,MEMBER,"If we don't hear anything, let's add this to the top of the list for the next dev call in ten days.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,842940980
https://github.com/pydata/xarray/pull/5089#issuecomment-811203549,https://api.github.com/repos/pydata/xarray/issues/5089,811203549,MDEyOklzc3VlQ29tbWVudDgxMTIwMzU0OQ==,5635139,2021-03-31T16:23:22Z,2021-03-31T16:23:22Z,MEMBER,"@pydata/xarray we didn't get to this on the call today. Two questions from @mathause:
- should we have `dims=None` default to all dims? Or are we gradually transitioning to `dims=...` for all dims?
- Is `drop_duplicates` a good name? Or should it explicitly refer to dropping duplicates on the _index_?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,842940980