
issue_comments: 419166914


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/2399#issuecomment-419166914 https://api.github.com/repos/pydata/xarray/issues/2399 419166914 MDEyOklzc3VlQ29tbWVudDQxOTE2NjkxNA== 514522 2018-09-06T16:56:44Z 2018-09-06T16:56:44Z CONTRIBUTOR

Thanks for the feedback!

  1. You can rely on indexing provided the `is_unique` flag is checked beforehand. The way pandas handles indexing seems both clear to the user and powerful. It is clear because the result is the Cartesian product of the two sets of labels, filtered for matching values. It is powerful because it supports indexing as general as an SQL INNER JOIN, of which unique labels are the trivial case. For example, the following operation

```python
import pandas as pd

df = pd.DataFrame(data=[0, 1, 2], index=list("aab"))
print(df.loc[list("ab")])
# Output:
#    0
# a  0
# a  1
# b  2
```

is an INNER JOIN between the two indexes

INNER((a, b) x (a, a, b)) = INNER(aa, aa, ab, ba, ba, bb) = (aa, aa, bb)

Another example:

```python
import pandas as pd

df = pd.DataFrame(data=[0, 1], index=list("aa"))
print(df.loc[list("aa")])
# Output:
#    0
# a  0
# a  1
# a  0
# a  1
```

is again an INNER JOIN between the two indexes

INNER((a, a) x (a, a)) = INNER(aa, aa, aa, aa) = (aa, aa, aa, aa)
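The "Cartesian product, then filter" reading of both examples can be sketched in plain Python. This is only an illustration of the idea, not how pandas implements `.loc`; the helper name `inner_join_positions` is hypothetical.

```python
from itertools import product


def inner_join_positions(query, index):
    """Positions in `index` selected by `query`.

    Builds the Cartesian product query x index and keeps only the
    pairs whose labels match, i.e. an INNER JOIN on the label.
    """
    pairs = product(query, enumerate(index))
    return [pos for q, (pos, label) in pairs if q == label]


# First example: index "aab" queried with "ab".
data = [0, 1, 2]
positions = inner_join_positions(list("ab"), list("aab"))
selected = [data[p] for p in positions]
print(positions, selected)

# Second example: index "aa" queried with "aa" gives four matches.
repeated = inner_join_positions(list("aa"), list("aa"))
print(repeated)
```

The positions come out in query-major order, which is the same row order as in the two pandas outputs above.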

  2. Assume a two-dimensional array with the following labels:

|   | 0 | 1 |
|---|---|---|
| a | ! | @ |
| a | # | $ |

This translates into a one-dimensional index: (a, 0), (a, 1), (a, 0), (a, 1). As such, it can be treated as usual. Assume you index the above matrix using [('a', 0), ('a', 0)]. This implies

INNER( ((a, 0), (a, 0)) x ((a, 0), (a, 1), (a, 0), (a, 1)) )

= INNER( (a,0)(a,0), (a,0)(a,1), (a,0)(a,0), (a,0)(a,1), (a,0)(a,0), (a,0)(a,1), (a,0)(a,0), (a,0)(a,1) )

= ( (a,0)(a,0), (a,0)(a,0), (a,0)(a,0), (a,0)(a,0) )

Converting it back to the matrix representation:

|   | 0 | 0 |
|---|---|---|
| a | ! | ! |
| a | # | # |

In summary, my suggestion is to consider defining the indexing of B by A (i.e., `B.loc[A]`) as a Cartesian product followed by match filtering or, in SQL terms, an INNER JOIN.

Multi-dimensional indexing, as far as I can see, can always be transformed into the one-dimensional case and treated as such.
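As a rough sketch of that reduction, the two-dimensional example can be flattened into a one-dimensional index of (row, column) tuples and then joined exactly as before. The variable names and the row-major flattening order are assumptions made for illustration; nothing here reflects an existing xarray or pandas API.

```python
# Labels and values of the 2-D example; both rows are labelled "a".
row_labels = ["a", "a"]
col_labels = [0, 1]
values = [["!", "@"], ["#", "$"]]

# Flatten row by row into a 1-D index of (row, col) tuples.
flat_index = [(r, c) for r in row_labels for c in col_labels]
flat_values = [v for row in values for v in row]

# Index with [('a', 0), ('a', 0)]: Cartesian product, then keep matches.
query = [("a", 0), ("a", 0)]
selected = [
    flat_values[pos]
    for q in query
    for pos, label in enumerate(flat_index)
    if q == label
]
print(flat_index)
print(selected)
```

Each query tuple matches two entries of the flattened index, so the result holds four values in query-major order. Arranging them back into a 2-D layout is a separate choice that this sketch leaves open.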

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  357156174