issue_comments
2 rows where issue = 569176457 and user = 6130352 sorted by updated_at descending
| id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 604580582 | https://github.com/pydata/xarray/issues/3791#issuecomment-604580582 | https://api.github.com/repos/pydata/xarray/issues/3791 | MDEyOklzc3VlQ29tbWVudDYwNDU4MDU4Mg== | eric-czech 6130352 | 2020-03-26T17:51:34Z | 2020-03-26T17:51:34Z | NONE | That'll work, thanks @keewis! fwiw the number of use cases I've found concerning my initial question, where there are repeated index values on both sides of the join, is way lower. | {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} |  | Self joins with non-unique indexes 569176457 |
| 604464873 | https://github.com/pydata/xarray/issues/3791#issuecomment-604464873 | https://api.github.com/repos/pydata/xarray/issues/3791 | MDEyOklzc3VlQ29tbWVudDYwNDQ2NDg3Mw== | eric-czech 6130352 | 2020-03-26T14:32:40Z | 2020-03-26T14:34:34Z | NONE | Hey @mrocklin (cc @max-sixty), sure thing. My original question was about how to implement a join in the typical relational-algebra sense, where rows with identical values in the join clause are repeated, but I think I have an even simpler problem that is much more common in our workflows (and touches on how duplicated index values are supported). For example, I'd like to do something like this:

```python
import xarray as xr
import numpy as np
import pandas as pd

# Assume we have a dataset of 3 individuals, one of African
# ancestry and two of European ancestry
a = pd.DataFrame({'pop_name': ['AFR', 'EUR', 'EUR'], 'sample_id': [1, 2, 3]})

# Join on ancestry to get population size
b = pd.DataFrame({'pop_name': ['AFR', 'EUR'], 'pop_size': [10, 100]})

pd.merge(a, b, on='pop_name')
```

|    | pop_name   |   sample_id |   pop_size |
|----|------------|-------------|------------|
|  0 | AFR        |           1 |         10 |
|  1 | EUR        |           2 |        100 |
|  2 | EUR        |           3 |        100 |

With xarray, the closest equivalent to this I can find is:

```python
a = xr.DataArray(
    data=[1, 2, 3], dims='x',
    coords=dict(pop_name=('x', ['AFR', 'EUR', 'EUR'])),
    name='sample_id'
).set_index(dict(x='pop_name'))
# <xarray.DataArray 'sample_id' (x: 3)>
# array([1, 2, 3])
# Coordinates:
#   * x        (x) object 'AFR' 'EUR' 'EUR'

b = xr.DataArray(
    data=[10, 100], dims='x',
    coords=dict(pop_name=('x', ['AFR', 'EUR'])),
    name='pop_size'
).set_index(dict(x='pop_name'))
# <xarray.DataArray 'pop_size' (x: 2)>
# array([ 10, 100])
# Coordinates:
#   * x        (x) object 'AFR' 'EUR'

xr.merge([a, b])
# InvalidIndexError: Reindexing only valid with uniquely valued Index objects
```

The above does exactly what I want as long as the population names used as the coordinate to merge on are unique, but that obviously doesn't make sense when those names correspond to many individuals drawn from a small number of populations.

The larger context is that genetic data is typically an array with two or more dimensions, the first two corresponding to genomic sites and people. Xarray is perfect for carrying around the extra information about those dimensions as coordinates, but being able to attach new coordinate values by joining against external tables is important. Am I missing something obvious in the API that will do this? Or am I likely better off converting DataArrays to DataFrames, doing my operations with a DataFrame API, and then converting back? | {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} |  | Self joins with non-unique indexes 569176457 |
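For the many-to-one case in the second comment (many samples, few populations), one possible workaround is sketched below. It is an illustration under stated assumptions, not the approach settled on in the thread. Instead of `xr.merge`, it does a label lookup with `.sel`: the small table is indexed by its unique population labels and queried with the repeated labels from the per-sample array. The result is then attached with `assign_coords`. The dimension name `sample` is a hypothetical choice for this sketch. The population label stays a plain (non-index) coordinate on the per-sample array, so no index ever needs to be unique.

```python
import xarray as xr

# Per-sample data; "sample" is a hypothetical dimension name for this sketch.
# The repeated population label is a plain, non-index coordinate.
a = xr.DataArray(
    data=[1, 2, 3],
    dims="sample",
    coords={"pop_name": ("sample", ["AFR", "EUR", "EUR"])},
    name="sample_id",
)

# Lookup table: one entry per population, indexed by unique labels.
b = xr.DataArray(
    data=[10, 100],
    dims="pop_name",
    coords={"pop_name": ["AFR", "EUR"]},
    name="pop_size",
)

# Many-to-one lookup: selecting with a list of (possibly repeated) labels
# returns one value per label, in order, so it lines up with a's "sample" axis.
pop_size = b.sel(pop_name=a["pop_name"].values).values

# Attach the looked-up values as a new coordinate along "sample".
joined = a.assign_coords(pop_size=("sample", pop_size))
print(joined)
```

This behaves like an inner/left join only when every sample's label exists in the lookup table. A label missing from `b` makes `.sel` raise a `KeyError` rather than producing a missing value.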
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
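To show how the filter at the top of the page ("2 rows where issue = 569176457 and user = 6130352 sorted by updated_at descending") maps onto this schema, here is a sketch using Python's built-in `sqlite3` module. It is an illustration only, not the SQL that Datasette actually generates. The `users` and `issues` tables are omitted. SQLite accepts `REFERENCES` to tables that don't exist as long as foreign-key enforcement is off, which is the default. Only the columns the query touches are filled in.

```python
import sqlite3

# In-memory database using the issue_comments schema above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
""")

# The two rows listed above, trimmed to the columns the query uses.
conn.executemany(
    "INSERT INTO issue_comments (id, [user], issue, created_at, updated_at)"
    " VALUES (?, ?, ?, ?, ?)",
    [
        (604464873, 6130352, 569176457, "2020-03-26T14:32:40Z", "2020-03-26T14:34:34Z"),
        (604580582, 6130352, 569176457, "2020-03-26T17:51:34Z", "2020-03-26T17:51:34Z"),
    ],
)

# Same-format ISO 8601 timestamps sort chronologically as plain TEXT.
rows = conn.execute(
    "SELECT id, updated_at FROM issue_comments"
    " WHERE issue = ? AND [user] = ?"
    " ORDER BY updated_at DESC",
    (569176457, 6130352),
).fetchall()
print(rows)
# [(604580582, '2020-03-26T17:51:34Z'), (604464873, '2020-03-26T14:34:34Z')]
```

The two secondary indexes on `issue` and `user` exist for filters like this one: either can be used to narrow the scan before sorting by `updated_at`.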