github: issue_comments: 5 rows where issue = 569176457 sorted by updated

5 rows where issue = 569176457 sorted by updated_at descending


id	html_url	issue_url	node_id	user	created_at	updated_at ▲	author_association	body	reactions	issue
604580582	https://github.com/pydata/xarray/issues/3791#issuecomment-604580582	https://api.github.com/repos/pydata/xarray/issues/3791	MDEyOklzc3VlQ29tbWVudDYwNDU4MDU4Mg==	eric-czech 6130352	2020-03-26T17:51:34Z	2020-03-26T17:51:34Z	NONE	That'll work, thanks @keewis! fwiw the number of use cases I've found concerning my initial question, where there are repeated index values on both sides of the join, is way lower.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Self joins with non-unique indexes 569176457
604537050	https://github.com/pydata/xarray/issues/3791#issuecomment-604537050	https://api.github.com/repos/pydata/xarray/issues/3791	MDEyOklzc3VlQ29tbWVudDYwNDUzNzA1MA==	keewis 14808389	2020-03-26T16:39:38Z	2020-03-26T16:39:38Z	MEMBER	The only way I could come up with is: python In [2]: a = xr.DataArray( ...: name="sample_id", ...: data=[1, 2, 3], ...: dims="population_name", ...: coords={"population_name": ["AFR", "EUR", "EUR"]}, ...: ) ...: b = xr.DataArray( ...: name="population_size", ...: data=[10, 100], ...: dims="population_name", ...: coords={"population_name": ["AFR", "EUR"]}, ...: ) ...: a.to_dataset().assign({b.name: b.sel(population_name=a.population_name)}) Out[2]: <xarray.Dataset> Dimensions: (population_name: 3) Coordinates: * population_name (population_name) <U3 'AFR' 'EUR' 'EUR' Data variables: sample_id (population_name) int64 1 2 3 population_size (population_name) int64 10 100 100 which is a manual join?	{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Self joins with non-unique indexes 569176457
604464873	https://github.com/pydata/xarray/issues/3791#issuecomment-604464873	https://api.github.com/repos/pydata/xarray/issues/3791	MDEyOklzc3VlQ29tbWVudDYwNDQ2NDg3Mw==	eric-czech 6130352	2020-03-26T14:32:40Z	2020-03-26T14:34:34Z	NONE	Hey @mrocklin (cc @max-sixty), sure thing. My original question was about how to implement a join in a typical relational algebra sense, where rows with identical values in the join clause are repeated, but I think I have an even simpler problem that is much more common in our workflows (and touches on how duplicated index values are supported). For example, I'd like to do something like this: ```python import xarray as xr import numpy as np import pandas as pd # Assume we have a dataset of 3 individuals, one of African # ancestry and two of European ancestry a = pd.DataFrame({'pop_name': ['AFR', 'EUR', 'EUR'], 'sample_id': [1, 2, 3]}) # Join on ancestry to get population size b = pd.DataFrame({'pop_name': ['AFR', 'EUR'], 'pop_size': [10, 100]}) pd.merge(a, b, on='pop_name') ``` \| \| pop_name \| sample_id \| pop_size \| \|----\|------------\|-------------\|------------\| \| 0 \| AFR \| 1 \| 10 \| \| 1 \| EUR \| 2 \| 100 \| \| 2 \| EUR \| 3 \| 100 \| With xarray, the closest equivalent to this I can find is: ```python a = xr.DataArray( data=[1, 2, 3], dims='x', coords=dict(pop_name=('x', ['AFR', 'EUR', 'EUR'])), name='sample_id' ).set_index(dict(x='pop_name')) # <xarray.DataArray 'sample_id' (x: 3)> # array([1, 2, 3]) # Coordinates: # * x (x) object 'AFR' 'EUR' 'EUR' b = xr.DataArray( data=[10, 100], dims='x', coords=dict(pop_name=('x', ['AFR', 'EUR'])), name='pop_size' ).set_index(dict(x='pop_name')) # <xarray.DataArray 'pop_size' (x: 2)> # array([100, 10]) # Coordinates: # * x (x) object 'EUR' 'AFR' xr.merge([a, b]) # InvalidIndexError: Reindexing only valid with uniquely valued Index objects ``` The above does exactly what I want as long as the population names being used as the coordinate to merge on are unique, but that obviously doesn't make sense if those names correspond to a bunch of individuals in one of a small number of populations. The larger context for this is that genetic data itself is typically some 2+ dimensional array with the first two dimensions corresponding to genomic sites and people. Xarray is perfect for carrying around the extra information relating to those dimensions as coordinates, but being able to attach new coordinate values by joins to external tables is important. Am I missing something obvious in the API that will do this? Or am I likely better off converting DataArrays to DFs, doing my operations with some DF api, and then converting back?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Self joins with non-unique indexes 569176457
598800439	https://github.com/pydata/xarray/issues/3791#issuecomment-598800439	https://api.github.com/repos/pydata/xarray/issues/3791	MDEyOklzc3VlQ29tbWVudDU5ODgwMDQzOQ==	mrocklin 306380	2020-03-13T16:12:53Z	2020-03-13T16:12:53Z	MEMBER	I wonder if there are multi-dimensional analogs that might be interesting. @eric-czech , if you have time to say a bit more about the data and operation that you're trying to do I think it would be an interesting exercise to see how to do that operation with Xarray's current functionality. I wouldn't be surprised to learn that there was some way to do what you wanted that went under a different name here.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Self joins with non-unique indexes 569176457
595406486	https://github.com/pydata/xarray/issues/3791#issuecomment-595406486	https://api.github.com/repos/pydata/xarray/issues/3791	MDEyOklzc3VlQ29tbWVudDU5NTQwNjQ4Ng==	max-sixty 5635139	2020-03-05T19:32:38Z	2020-03-05T19:32:38Z	MEMBER	Hi @eric-czech -- thanks for the issue. Unfortunately xarray isn't strong at these sorts of relational joins, and I don't think there's a way of doing that specific operation. Relational algebra generally depends on data on a single dimension, which fits into xarray's model less well. Feel free to post back here with contiguous questions, though	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Self joins with non-unique indexes 569176457


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
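
The listing above ("rows where issue = 569176457 sorted by updated_at descending") can be reproduced against this schema with a plain SQL query. The following is a minimal sketch using Python's standard `sqlite3` module and an in-memory database. The helper name `comments_for_issue` is hypothetical, the `REFERENCES` clauses are dropped because the `users` and `issues` tables are not shown here, and only three of the five rows are loaded, with their text columns left empty.

```python
import sqlite3

# In-memory database with the issue_comments columns from the schema above.
# Assumption: foreign-key REFERENCES are omitted, since the [users] and
# [issues] tables are not part of this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE [issue_comments] (
       [html_url] TEXT,
       [issue_url] TEXT,
       [id] INTEGER PRIMARY KEY,
       [node_id] TEXT,
       [user] INTEGER,
       [created_at] TEXT,
       [updated_at] TEXT,
       [author_association] TEXT,
       [body] TEXT,
       [reactions] TEXT,
       [performed_via_github_app] TEXT,
       [issue] INTEGER
    );
    CREATE INDEX [idx_issue_comments_issue]
        ON [issue_comments] ([issue]);
    """
)

# Three of the five rows listed above: id, user id, timestamps,
# author association, and issue id. Other columns stay NULL.
rows = [
    (595406486, 5635139, "2020-03-05T19:32:38Z", "2020-03-05T19:32:38Z", "MEMBER", 569176457),
    (604464873, 6130352, "2020-03-26T14:32:40Z", "2020-03-26T14:34:34Z", "NONE", 569176457),
    (604580582, 6130352, "2020-03-26T17:51:34Z", "2020-03-26T17:51:34Z", "NONE", 569176457),
]
conn.executemany(
    "INSERT INTO issue_comments "
    "(id, user, created_at, updated_at, author_association, issue) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)


def comments_for_issue(conn, issue_id):
    """Return (id, updated_at) pairs for one issue, newest update first."""
    cur = conn.execute(
        "SELECT id, updated_at FROM issue_comments "
        "WHERE issue = ? ORDER BY updated_at DESC",
        (issue_id,),
    )
    return cur.fetchall()


result = comments_for_issue(conn, 569176457)
for comment_id, updated_at in result:
    print(comment_id, updated_at)
```

Sorting on the `TEXT` column works here because the timestamps are ISO 8601 strings in UTC, which order correctly as plain text. The `WHERE issue = ?` filter is the lookup that `idx_issue_comments_issue` is defined on.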
